
  PC-Net's PC News - June, 2001
 
Don Watkins

Dig the Deep Web - Part 1

By Don Watkins

 
 
 

 

June 2001--As far back as I can remember, I've been interested in machines that could provide answers. After playing with mechanical calculators (sans subtract functions!) and building a binary computer in the 1950s, I've looked forward to the day when you could punch in a question and get an answer.

Evolution

Early search engines were crude, basically databases with keywords. Full-text search just wasn't possible given the limits of the hardware. Without a global public network, results were limited to the local computer or network.

Then along came the Internet, the development of Gopher and the ability to view text files from all over the world on your own machine. A new day had dawned.

Search Engines

Search engines were among the first tools to hit the public Internet. They used "bots" or "spiders" to crawl the web, built a catalog of what the crawl found and displayed matching entries from that catalog in response to a search.
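
The crawl, catalog and display steps can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the PAGES dictionary is a made-up, in-memory stand-in for the web, and the function names (crawl, build_index, search) are hypothetical, not taken from any real search engine.

```python
from collections import deque

# A made-up, in-memory "web": page name -> (page text, links to other pages).
PAGES = {
    "home": ("welcome to the shoe store", ["shoes", "about"]),
    "shoes": ("running shoes and hiking boots", ["home"]),
    "about": ("about our store and staff", ["home"]),
    "orphan": ("secret shoes sale", []),  # nothing links here, so the spider never finds it
}


def crawl(start):
    """Spider step: follow links breadth-first and return pages in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in PAGES[page][1]:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def build_index(pages):
    """Catalog step: map each word to the set of pages that contain it."""
    index = {}
    for page in pages:
        for word in PAGES[page][0].split():
            index.setdefault(word, set()).add(page)
    return index


def search(index, query):
    """Display step: return the pages containing every query word, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


visited = crawl("home")
index = build_index(visited)
print(search(index, "shoes"))
print(search(index, "store"))
```

Note that the "orphan" page never makes it into the catalog, because the spider only reaches pages that something else links to.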

Different search engines developed different approaches to the task. Yahoo, AltaVista and Google crawl and index a large portion of the web and respond to key search terms.

Specialized search engines such as www.searchnt.com target specific sites and subjectively index contents to attempt to provide topical results.

Ask Jeeves provides a general search of the Web but allows questions to be asked in a natural language format, such as "Where do I find shoes?"

And most sites provide a local search similar to the article index here.

Search Engine for Search Engines

With the explosion of search engines and the uneven content between them (see "All Search Engines Are Not Created Equal"), shell programs that query several search engines at once were developed. My favorite is Copernic, which has a free version available. I'm also aware of WebFerret, which frankly never impressed me.
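
To give a feel for what such a shell program has to do, here is a minimal Python sketch of merging several engines' result lists into one. Everything in it is an assumption for illustration: the engine names and URLs are invented, and the scoring rule (add 1/position for each engine that returns a page) is just one simple choice, not a description of how Copernic or WebFerret actually rank results.

```python
from collections import defaultdict

# Hypothetical ranked results from three engines for the same query.
RESULTS = {
    "engine_a": ["example.com/a", "example.com/b", "example.com/c"],
    "engine_b": ["example.com/b", "example.com/d"],
    "engine_c": ["example.com/b", "example.com/a", "example.com/e"],
}


def merge(results):
    """Combine ranked lists: each engine adds 1/position to a page's score."""
    scores = defaultdict(float)
    for ranked in results.values():
        for position, url in enumerate(ranked, start=1):
            scores[url] += 1.0 / position
    # Highest score first; ties are broken alphabetically so output is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


merged = merge(RESULTS)
for url in merged:
    print(url)
```

A page that several engines rank highly rises to the top, while duplicates collapse into a single entry.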

I find the results from these multiple-engine searches better than what I might get from a single search engine, even my favorites, Google and Northern Light, but I was still often disappointed.

There Must Be More

I stumbled onto some research by BrightPlanet (a search technology company, of course) claiming that a "deep Web" exists. The deep Web is the hidden part of the Web, containing content that is inaccessible to conventional search engines and thus to most users. Their research indicates the deep Web may contain 550 billion documents, up to 500 times the content of the surface Web.

Google has indexed 1.3 billion documents and Northern Light only indexes about 16% of the surface Web's content. Ouch, there's a lot of stuff out there that's not being included.
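
To put those figures side by side, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above; the variable names are mine, and the results are only as good as BrightPlanet's estimates.

```python
# Figures quoted in the article (BrightPlanet's estimates and Google's index size).
DEEP_WEB_DOCS = 550_000_000_000      # estimated documents in the deep Web
GOOGLE_INDEXED_DOCS = 1_300_000_000  # documents Google has indexed
MAX_DEEP_TO_SURFACE_RATIO = 500      # deep Web is "up to 500 times" the surface Web

# If the deep Web is at most 500 times the surface Web, the surface Web
# must hold at least this many documents.
surface_lower_bound = DEEP_WEB_DOCS / MAX_DEEP_TO_SURFACE_RATIO

# Google's index measured against the deep Web estimate.
google_share = GOOGLE_INDEXED_DOCS / DEEP_WEB_DOCS

print(f"Surface Web: at least {surface_lower_bound:,.0f} documents")
print(f"Google's index vs. deep Web estimate: {google_share:.2%}")
```

In other words, Google's entire index amounts to roughly a quarter of one percent of the estimated deep Web.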

An article in The Standard points out that the deep Web includes data that people might want if only they knew the URL: Securities and Exchange Commission filings, yellow pages, IBM's patent database, the Merriam-Webster Dictionary and Kelley Blue Book information on automobiles.

Can You Get There From Here?

As in any efficient market, there are companies poised to take advantage of the deep Web. Intelliseek has created BullsEye and BrightPlanet has created LexiBot, both designed to search the deep Web.

Next time we'll look at how these guys stack up against a regular search engine and against each other.

Until then, best in computing.

Go to part 2.
