Search information systems & search sites by Matt Ottewill

The internet provides a wealth of information, but finding it can be problematic. A number of sites and technologies exist to help us. When we enter phrases and keywords into a search site, the list of pages and sites returned is the result of 2 processes ...

  1. The search technology which powers the site has attempted to produce what it thinks are the most relevant results for us.
  2. A site owner has paid for their site to be listed in the results.

It is not always clear which technologies a given search site uses. This matters little to the average surfer, but it is important to web developers and site builders.

Search engine

The term "search engine" has become somewhat generic and is used to describe a search site or search technology. Because it is non-specific I will avoid it here.

Search Information Systems

A Search Information System (my term) is the people, hardware and software that together comprise an index or directory. Its aim is to index and categorise web sites and their pages, and to provide the most accurate and useful results to someone using a Search Site to search the Search Technology System. It is important to distinguish between a Search Site and a Search Information System because some search site companies use other companies' Search Technology Systems. More on this later.

There are 2 primary types of search technology system ...

  • Indexes
  • Directories

Search index

A search index comprises several elements ...

  • A database of information about web sites and their content
  • Pro-active software, called spiders (or robots or bots), which searches out and retrieves information contained in web pages
  • Software algorithms which analyse, index and categorise the information the spiders retrieve, and crucially "rank" it in an attempt to provide the most accurate and relevant results for a search
  • An interface (a Search Site) to search the database and display results in the form of brief descriptions and links
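The elements listed above can be illustrated with a minimal sketch. It assumes the spider has already retrieved three hypothetical pages (the `example.com` URLs and their text are invented for illustration), and it ranks results by simple word counts. Real indexes use far more sophisticated ranking, so treat the scoring here as a stand-in rather than a description of how any actual search index works.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a spider would retrieve.
PAGES = {
    "example.com/guitars": "Guitars for sale: acoustic guitars and electric guitars",
    "example.com/drums": "Drum kits and percussion for sale",
    "example.com/lessons": "Guitar lessons for beginners",
}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """The database: map each word to the pages containing it, with a count per page."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, query):
    """The ranking step: order pages by how often the query words occur (naive score)."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


index = build_index(PAGES)
results = search(index, "guitars for sale")
```

Here `results` lists the guitars page first because it matches all three query words and repeats "guitars" several times. The Search Site itself would simply present that ordered list as descriptions and links.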

In July 2006 the most popular indexes were ...

www.google.com (49.2%)

www.yahoo.com (23.8%)

www.msn.com (9.6%)

In March 2010 ...

www.google.com (85%)

www.yahoo.com (7%)

www.bing.com (3.39%)

These statistics can be found at searchenginewatch.com/

Search directory

Search directories contain information about sites but not the actual content and information itself. They describe and categorise sites. These organisations and companies do not pro-actively find sites to index; they rely on site owners submitting their sites for consideration. People, not spiders, oversee entries in the directory. The important ones are ...

Open Directory Project ... www.dmoz.org

dir.yahoo.com
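To contrast a directory with an index, here is a rough sketch of the submit-and-review workflow described above. The class names, the category path and the `example.com` entries are all hypothetical, and the `approve` function merely stands in for a human editor's judgement. It is not how the Open Directory Project or Yahoo actually implement their directories.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    url: str
    description: str  # written by a person, not extracted from the page


@dataclass
class Directory:
    categories: dict = field(default_factory=dict)  # category path -> approved entries
    pending: list = field(default_factory=list)     # submissions awaiting review

    def submit(self, category, entry):
        """A site owner proposes their site for a category."""
        self.pending.append((category, entry))

    def review(self, approve):
        """An editor accepts or rejects each pending submission."""
        for category, entry in self.pending:
            if approve(category, entry):
                self.categories.setdefault(category, []).append(entry)
        self.pending = []

    def browse(self, category):
        """List the URLs approved for a category."""
        return [entry.url for entry in self.categories.get(category, [])]


directory = Directory()
directory.submit("Arts/Music/Instruments", Entry("example.com/guitars", "Guitar retailer"))
directory.submit("Arts/Music/Instruments", Entry("example.com/spam", ""))
# Stand-in for the editor: reject submissions with no description.
directory.review(lambda category, entry: bool(entry.description))
listed = directory.browse("Arts/Music/Instruments")
```

Note that nothing in this sketch reads page content: the directory only knows what the site owner submitted and what the editor approved.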

Search sites

A Search Site provides a web interface to allow you to search a Search Technology System (an index or directory or both). Some Search Sites search the indexes and/or directories they own (such as www.google.com), whilst others are simply interfaces for searching another company's or organisation's index and/or directory.

Search site                             Search technology systems used
aol.com                                 Google & Open Directory Project
www.google.com                          Google & Open Directory Project
www.dir.google.com                      Open Directory Project
www.earthlink.com                       Google & Open Directory Project
www.yahoo.com                           Yahoo
dir.yahoo.com                           Yahoo
www.dmoz.org (Open Directory Project)   Open Directory Project
www.msn.com                             MSN
www.ask.com / www.askjeeves.com         Ask
www.lycos.com                           Ask
www.excite.com                          Ask
www.netscape.com                        Google & Open Directory Project
www.altavista.com                       Yahoo
www.alexa.com                           Google
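For a site builder, the practical reading of this table is that a handful of underlying systems feed many search sites. The sketch below inverts a subset of the rows above to show which sites depend on a given system. The dictionary is copied from the table, while the function name `sites_powered_by` is just an illustrative choice.

```python
# Search site -> search technology systems used (a subset of the table above).
BACKENDS = {
    "www.google.com": ["Google", "Open Directory Project"],
    "www.netscape.com": ["Google", "Open Directory Project"],
    "www.yahoo.com": ["Yahoo"],
    "www.altavista.com": ["Yahoo"],
    "www.msn.com": ["MSN"],
    "www.ask.com": ["Ask"],
    "www.lycos.com": ["Ask"],
    "www.excite.com": ["Ask"],
}


def sites_powered_by(system):
    """Return the search sites that draw on the given search technology system."""
    return sorted(site for site, systems in BACKENDS.items() if system in systems)


ask_sites = sites_powered_by("Ask")
yahoo_sites = sites_powered_by("Yahoo")
```

With this subset, `ask_sites` contains three search sites that all draw on the single Ask system, so being indexed by Ask would affect results on all three.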

 

Tips on searching search sites