June 2001--As far back as I can
remember, I've been interested in machines that could provide
answers. After playing with mechanical calculators (sans subtract
functions!) and building a binary computer in the '50s, I've looked
forward to the day when you could punch in a question and get an
answer.
Evolution
Early search engines were
crude, basically databases with keywords. Full text search just
wasn't possible given the limits of hardware. Without a global
public network, results were limited to the local computer or
network.
Along came the Internet and the
development of Gopher and the ability to locally view text files
from all over the world. A new day had dawned.
Search Engines
Search engines were one of the
first tools to hit the public Internet. Search engines created
"bots" or "spiders" that crawled the web, built a catalog of what
they found and displayed the results.
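
The catalog step can be pictured with a small sketch. This is only
an illustration, not how any particular engine works: the pages are
made-up stand-ins for what a spider might fetch, and the names
build_catalog and search are hypothetical.

```python
from collections import defaultdict

# Hypothetical pages standing in for what a spider might have fetched.
PAGES = {
    "http://example.com/a": "Gopher lets you view text files",
    "http://example.com/b": "Search engines crawl the web",
    "http://example.com/c": "Spiders crawl and catalog text",
}


def build_catalog(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    catalog = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            catalog[word].add(url)
    return catalog


def search(catalog, query):
    """Return the sorted URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # Start with the pages for the first word, then narrow word by word.
    hits = set(catalog.get(words[0], set()))
    for word in words[1:]:
        hits &= catalog.get(word, set())
    return sorted(hits)


catalog = build_catalog(PAGES)
print(search(catalog, "crawl"))
print(search(catalog, "crawl text"))
```

A real engine would also have to rank the matches, which this sketch
skips entirely.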
Different search engines
developed different approaches to the task. Yahoo, AltaVista and
Google crawl and index a large portion of the web and respond to key
search terms.
Specialized search engines such
as www.searchnt.com target
specific sites and subjectively index contents to attempt to provide
topical results.
Ask Jeeves provides a general
search of the Web but allows questions to be asked in a natural
language format such as "Where do I find shoes?".
And most sites provide a local
search similar to the article index here.
Search Engine for Search Engines
With the explosion of multiple
search engines and the uneven content between them (see "All
Search Engines Are Not Created Equal"), shell programs that
search several search engines at once were developed. My favorite
is Copernic, which has a free version available. I'm also aware of
WebFerret, which frankly never impressed me.
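
The idea behind searching several engines at once can be sketched
roughly as follows. This is not how Copernic or WebFerret actually
work. The scoring rule (one point divided by rank, summed across
engines) is an assumption chosen for simplicity, and merge_results
and the sample URLs are hypothetical.

```python
def merge_results(result_lists):
    """Combine ranked URL lists from several engines into one list.

    Each URL earns 1 / rank from every engine that returned it, so URLs
    found by more engines, or ranked higher, move toward the top.
    """
    scores = {}
    for results in result_lists:
        seen = set()  # count each URL only once per engine
        for rank, url in enumerate(results, start=1):
            if url in seen:
                continue
            seen.add(url)
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Made-up result lists standing in for two different search engines.
engine_a = ["http://example.com/1", "http://example.com/2", "http://example.com/3"]
engine_b = ["http://example.com/2", "http://example.com/4"]

merged = merge_results([engine_a, engine_b])
for url in merged:
    print(url)
```

The point of the merge is that a page only one engine knows about
still shows up, while a page both engines return gets a boost.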
I find the results of these
multiple search engines better than what I might get on a single
search engine, even my favorites, Google and Northern Light, but
often I am still disappointed.
There Must Be More
I stumbled onto some research
by BrightPlanet (a search technology company, of course) that
claimed that a "deep Web" exists. The deep Web is the hidden part of
the Web, containing content that is inaccessible to conventional
search engines and thus to most users. Their research indicates the
deep Web may contain 550 billion documents, up to 500 times the
content of the surface Web.
Google has indexed 1.3 billion
documents and Northern Light
indexes only about 16% of the surface Web's content. Ouch, there's a
lot of stuff out there that's not being included.
An article in
The Standard points out that the deep Web includes data that
people might want if only they knew the URL: Securities and Exchange
Commission filings, yellow pages, IBM's patent database, the
Merriam-Webster Dictionary and Kelley Blue Book information on
automobiles.
Can You Get There From Here?
As in an efficient market, there
are companies poised to take advantage of the deep Web.
Intelliseek has created
BullsEye and BrightPlanet
has created LexiBot to search the deep Web.
Next time we'll take a look at how these guys
stack up against a regular search engine and against each other.
Until then, best in computing.
Go to part 2.