Issues in Science and Technology Librarianship Spring 1998

Precision Among Internet Search Engines: An Earth Sciences Case Study

Lisa Wishard
Earth and Mineral Sciences Library
105 Deike Bldg.
The Pennsylvania State University
University Park, PA 16802
lar14@psu.edu

Abstract

Three representative queries related to the earth sciences were used to evaluate the precision of 37 Internet search engines. The structure of the three main types of Internet search engines and strategies for improving search results are discussed. The sample queries indicated that Go2, InfoMine, and Argus Clearinghouse had the highest precision among catalog-type search engines, and that Excite, Infoseek and Northern Light had the highest precision among keyword-type search engines. No single multiple-threaded search engine performed best. Searchers are urged to use multiple tools when using Internet search engines to locate subject-specific information.

Introduction

Earth science information is widely available on the Internet from many state, national, regional, and international institutions and agencies, as well as from numerous commercial and personal sources. Many of the agencies, such as the United States Geological Survey and the World Meteorological Organization, maintain their own web sites with search tools for locating information within the site. If, however, one does not know with whom a researcher or research group is associated, or does not know that an institution is studying a particular phenomenon, a site-specific search tool is not the best approach. In these cases general web-based search engines are needed. This article presents the results of sample earth science-related queries in 37 web-based search engines. Information is presented on search engine database size and the availability of earth science information, along with an assessment of search engine precision based on three representative queries.

The first part of the paper provides an overview of search engine structure. The second part presents the methodology used in evaluating the search engines. The third section explores the results of some of the sample searches, includes a table that compiles the evaluative information, and discusses strategies which may be useful in locating earth sciences information when using Internet search engines.

I. Search Engine Structure

Web search engines are analogous to an index, directing searchers to occurrences of search terms. Web-based search engines, however, do not point toward terms within a text or a controlled database, but rather to occurrences on the Internet. By virtue of the complexity of web-space, comparing search engines is a challenging affair.

Search engines have developed into three main categories. The first category contains catalog or directory search engines, arranged by subject or material type. Examples include Yahoo!, a subject-based catalog with a keyword search aid; the Argus Clearinghouse, a set of subject-based search engines; DejaNews, a search engine devoted to Usenet information; and Magellan, a subject-based catalog of reviewed web sites. The second category is keyword or crawler search engines. These are indices of Internet material compiled by robot or spider programs, which regularly navigate through the data tags, links and text of web pages looking for new and updated information. Examples of web crawlers include HotBot, which uses a program that indexes web pages word for word, and Infoseek, which culls information through data tags and links. The third category is multi-threaded search engines, or meta-crawlers, which search multiple search engine databases concurrently and present the combined results. Examples include MetaCrawler, which uses keywords to search six indices concurrently, and Ask Jeeves, which uses natural language queries and an expert system to search five keyword search engines concurrently.

Within the three main categories of search engines are cross-over technologies. For instance, most catalog and directory search engines, such as Yahoo!, Galaxy and the Internet Sleuth, have keyword-searchable indices in addition to browsable subject trees. Likewise, many keyword or crawler search engines, such as Excite, Lycos and Infoseek, provide hierarchical subject channels to the material in their databases.

For the most relevant and precise results, searchers should be aware of several important criteria. The "help", "how to search" or "about" links on a search engine's homepage should help answer these questions:

  1. How is the database built? Some search engines rely on web page developers to register their sites; others crawl portions of the web to gather and update information. Likewise, the crawler may seek only data tags and hyperlinks or may actually peruse the contents of the page.

  2. How large is the database? Size of the database will affect the recall and precision of a search. Also, some search engines, such as Yahoo! and most catalog sites, count only the primary homepage but actually index deeper pages. Other engines count every page.

  3. How current is the database and how often is it updated? Programmers have designed crawlers to work automatically on a regular schedule. The lag time between gathering new information and its inclusion into the search engine may be significant, especially if the search topic is very current.

  4. What search parameters does the engine support? Some engines offer sophisticated search capabilities such as Boolean logic, phrase searching, and proximity searching, while others do not. One of the major drawbacks of web-based search engines is the inability to search by fields such as author and title. Nor do many search engines allow searchers to combine sets of results.

  5. How are the search results ranked and displayed? Some engines, like Excite, use concept-based searching, returning results not only for the specified terms, but also for related concepts. Other engines, HotBot for example, return results based on the number of times the search term occurs on a page. Some search engines list only the hyperlinks of the ranked results. Others present brief abstracts or annotations, page size, related links, date of indexing, reviews or author information.
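
To illustrate the count-based ranking described in item 5, here is a minimal Python sketch, assuming a page's score is simply the total number of times the query terms occur in its text. It is not HotBot's actual algorithm; the function names, example pages and placeholder URLs are hypothetical. Note that the page about a company named Enso still scores as a hit, the same kind of false hit the "ENSO" query produced.

```python
import re
from collections import Counter


def term_count_score(text: str, terms: list[str]) -> int:
    """Total number of times any query term occurs in the page text (case-insensitive)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(counts[term.lower()] for term in terms)


def rank_by_term_count(pages: dict[str, str], query: str) -> list[tuple[str, int]]:
    """Rank pages by raw term count, highest first; pages with no occurrences are dropped."""
    terms = query.split()
    scored = [(url, term_count_score(text, terms)) for url, text in pages.items()]
    return sorted((pair for pair in scored if pair[1] > 0), key=lambda pair: pair[1], reverse=True)


# Hypothetical pages; the URLs are placeholders, not real sites.
pages = {
    "http://example.org/enso-faq": "ENSO FAQ: what is ENSO? ENSO links El Nino and the Southern Oscillation.",
    "http://example.com/enso-corp": "Enso Corp. annual report",
    "http://example.org/weather": "Daily weather summary",
}
ranking = rank_by_term_count(pages, "ENSO")
print(ranking)  # [('http://example.org/enso-faq', 3), ('http://example.com/enso-corp', 1)]
```

A raw count rewards repetition rather than meaning, which is one reason concept-based engines such as Excite took a different approach.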

Although a common command language appears to be developing, many variations still exist. For more control over the query and results, searchers should take advantage of the "power" or "advanced" searching capabilities of the various engines. Three web sites that provide excellent comparisons of the major search engines are the Web Matrix prepared by Matt Slot (http://janus.ambrosiasw.com/~fprefect/matrix/) [Note: Link moved; URL changed 4/2/02 by ald], the Search Engine Watch web site (http://searchenginewatch.com/), and C|Net's Search Engines: Where to Find Anything on the Net by Andrew J. Leonard (http://www.cnet.com/Content/Reviews/Compare/Search/index.html) [Note: Broken link removed 8/5/98 by ald]. These web sites include tables comparing the capabilities, size and popularity of the major search engines. For comprehensive lists of search engines, visit the University of North Carolina's Institute of Academic Technology web site (http://www.iat.unc.edu/guides/irg-08.html) [Note: Broken link removed 3/4/01 by ald] as well as Yahoo!'s list of search engines (http://dir.yahoo.com/computers_and_internet/internet/world_wide_web/searching_the_web/) [Note: Link moved; URL changed 7/24/00 by ald].

II. Methodology

A list of the reviewed search engines is included in Table 1. The table provides the name, URL, and database size of each engine, along with user notes. It also includes an evaluation of each search engine's precision based upon three sample searches. The three sample queries used in this study were: 1) ENSO (El Niño Southern Oscillation); 2) New Madrid fault zone; and 3) copper production in Brazil. The search suite used in this paper is small in comparison with other search engine precision studies (Leighton & Srivastava, 1997; Tomaiuolo & Packer, 1996; Chu & Rosenthal, 1996; Ding & Marchionini, 1996) but is specific in its focus on earth science-related subjects. The sample queries were chosen in order to look at search engine results for keyword, phrase and multiple-concept queries. Simple searches were submitted to the default mode of each tool evaluated, in order to use the search strategies of the tool rather than those of the searcher. Search engines were chosen for this study from the lists of search engines at several sites, including Search Engine Watch (http://searchenginewatch.com/), Webster and Paul's 1996 article "Beyond Surfing: Tools and Techniques for Searching the Web" (http://magi.com/~mmelick/it96jan.htm), and the lists of Internet Searching Tools on the Northwestern University Library's (http://www.library.nwu.edu/resources/internet/search/*) and University of North Carolina Institute of Academic Technology's web sites. [Note: Broken link removed 4/2/02 by ald]

Precision was used to measure the usefulness of the search engines, and is based on the ratio of relevant records to total records within the first 10 to 15 records retrieved. This ratio is broken down into three ratings: high, average, and low. Search engines that returned relevant, working links to information related to the sample queries within the first 10 to 15 records were given a high rating. Search engines that returned marginal links (such as information on copper in other countries) were given an average precision rating. Search engines that returned an overwhelming number of inactive and completely unrelated links in the first 10 to 15 records (such as ENSO as a company name rather than a meteorological phenomenon, or links containing information about Madrid, Spain) were given a low precision rating. The precision rating in the tables is an interpretation of the results rather than an actual statistical measure of the ratio. This study is not a statistical evaluation of the precision of search engine results but rather an interpretive exploration of tools and their usefulness in the earth sciences.
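
The study's ratings were interpretive, not computed. For readers who want to tally the underlying ratio, here is a minimal sketch of precision over the first k records, assuming each record has already been judged relevant or not. The numeric cutoffs that map a ratio onto high, average and low are hypothetical; the study did not use numeric thresholds.

```python
def precision_at_k(judgments: list[bool], k: int = 10) -> float:
    """Share of the first k retrieved records judged relevant.

    If the engine returned fewer than k records, the denominator is the
    number actually returned (one of several possible conventions).
    """
    top = judgments[:k]
    if not top:
        return 0.0
    return sum(top) / len(top)


def precision_rating(p: float, high_cutoff: float = 0.7, low_cutoff: float = 0.3) -> str:
    """Map a precision value onto a high/average/low scale (cutoffs are hypothetical)."""
    if p >= high_cutoff:
        return "high"
    if p >= low_cutoff:
        return "average"
    return "low"


# Hypothetical relevance judgments for the first 10 records of one query.
judged = [True, True, False, True, False, True, False, True, False, True]
p = precision_at_k(judged, k=10)   # 6 of 10 relevant -> 0.6
label = precision_rating(p)        # -> "average"
print(p, label)
```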

The results were considered relevant if the information they provided was unique, provided factual data and could be used in a reference transaction. The quality of the returned links was reviewed for perceived accuracy, authority, coverage, timeliness and uniqueness (Rettig, 1996; Tate & Alexander, 1996). For instance, pages that provided verifiable facts directly related to the query received high ratings. Pages that provided marginal, unverifiable or duplicate information, or that required the user to do additional searching, were given average ratings. Search engines that returned completely unrelated or inactive links were given low ratings. While other studies have gone to great lengths to account for the bias of the searcher when determining the relevance of the returned links (Leighton and Srivastava, 1996), this is an interpretive study in a focused subject area, so no effort was made to compensate for potential searcher bias in evaluating the results.

The size of the search engines was also classed into three categories. Big search engines contain over 25 million URLs or web pages, medium-sized search engines contain between one million and 25 million URLs or web pages, and small search engines contain fewer than one million URLs or web pages. Some size figures are approximate.
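
The three size classes amount to a simple threshold rule. A sketch follows; the function name is hypothetical, and placing a database of exactly 25 million pages in the medium class is an assumption, since the text leaves that boundary open.

```python
def size_class(indexed_pages: int) -> str:
    """Size category as used in Table 1.

    Over 25 million pages is big, 1 to 25 million is medium, and fewer than
    1 million is small. Exactly 25 million counting as medium is an assumption.
    """
    if indexed_pages > 25_000_000:
        return "big"
    if indexed_pages >= 1_000_000:
        return "medium"
    return "small"


# Hypothetical database sizes
for n in (30_000_000, 5_000_000, 400_000):
    print(n, size_class(n))
```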

III. Results

Locating Broad Topics: Catalog or Directory Results

When looking for broad topics, such as meteorology or paleontology, the best places to start are the catalog or directory search engines that arrange links by subject, such as Yahoo!, Galaxy or the Internet Sleuth. Each of these tools provides multi-tiered subject trees. For instance, on Galaxy's homepage the searcher can choose from over 11 major categories and over 148 sub-categories. Links to earth science and environmental information can be found in several places, such as the "Environment" sub-category under the major categories "Community" and "Law", "Geography" under the "Social Sciences" category, and "Speleology" under the "Leisure and Recreation" category. The largest number of links for the earth sciences is found under "Geosciences" in the "Science" category. The geosciences page includes five earth science categories: "Geochemistry", "Geology", "Marine Geology", "Geophysics" and "Meteorology and Climatology". Following the link for "Meteorology and Climatology", the searcher finds links clustered in the following categories: Academic Organizations, Articles (full-text), Cartography, Collections, Directories, Government Organizations and Organizations. At any level of the Galaxy engine, the searcher may perform keyword searches of the database or link to related categories.
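
A multi-tiered subject tree like Galaxy's is a nested hierarchy, and the scattering of a topic such as "Environment" across several branches is easy to see when the tree is searched programmatically. The sketch below encodes only the categories named above (it is not Galaxy's full tree or data format) and uses a hypothetical helper, find_paths, to list every branch where a category name appears.

```python
# Partial Galaxy subject tree, using only the categories named in the text.
# Each key is a category name; each value is a dict of its sub-categories.
GALAXY = {
    "Science": {
        "Geosciences": {
            "Geochemistry": {},
            "Geology": {},
            "Marine Geology": {},
            "Geophysics": {},
            "Meteorology and Climatology": {
                "Academic Organizations": {},
                "Articles (full-text)": {},
                "Cartography": {},
                "Collections": {},
                "Directories": {},
                "Government Organizations": {},
                "Organizations": {},
            },
        },
    },
    "Social Sciences": {"Geography": {}},
    "Leisure and Recreation": {"Speleology": {}},
    "Community": {"Environment": {}},
    "Law": {"Environment": {}},
}


def find_paths(tree: dict, target: str, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Depth-first search for every path from the top level down to a category named target."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        if name.lower() == target.lower():
            paths.append(path)
        paths.extend(find_paths(children, target, path))
    return paths


print(find_paths(GALAXY, "Environment"))
# [('Community', 'Environment'), ('Law', 'Environment')]
```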

Browsing multiple directory or catalog engines can prove frustrating, since few of the engines use a controlled vocabulary. The Librarian's Index and the WWW Virtual Library both categorize data via Library of Congress subject classes (though not by LC class number). Yahoo! developers are proud of their intuitive subject classification scheme, and Look Smart proclaims a "16,000-plus subject index." Catalog search engines limit their usability for serious research, however, by not providing or using name and subject thesauri. This is especially true for science categories, which can be difficult to find when browsing or keyword searching. Earth science and other science categories are often hidden under headings such as "education", "reference" or "learning." If a useful heading is not uncovered after browsing through one or two layers, query the database with keywords. The two most detailed earth science-related subject trees are found at Yahoo! and the WWW Virtual Library.

Using the search tools on the catalog search engines can result in greater precision than simply browsing subject categories. In addition, these search tools provide greater flexibility for searching the contents of the database, which is very handy in the absence of controlled vocabularies or subject thesauri for the catalog databases. The Go2 search engine had the most precise search tool. The Argus Clearinghouse and InfoMine tools also had highly precise results for the sample queries. However, searchers using these tools need to be aware that their databases are small and that precision depends on whether the subject of the query is included in the database. For instance, the "ENSO" query had much more precise results in the Argus and InfoMine engines than did the queries on the "New Madrid fault zone" and "copper production in Brazil."

Locating Specific Information: Keyword or Crawler Results

Focused or very specific information needs may be better served with keyword or multiple-threaded search engines. With these tools, searchers can increase precision by including as many unique search terms as possible. The keyword or crawler-type search engines with the highest precision ratings and the most relevant, working links were Excite and Infoseek.

The default Infoseek search for "copper production in Brazil" had low precision, while the search on the "New Madrid fault zone" had average precision. (Precision did increase with case-sensitive searching; e.g., "New Madrid fault zone" had more precise results than "new madrid fault zone.") The most precise Infoseek search was the keyword search on "ENSO." Infoseek's refinement options improved results in all searches. For example, the "pipe" command, which looks for related records within a larger set of records, led to more relevant material with "copper | Brazil" than the standard search did. Following the Infoseek "related topic" links did not locate many additional relevant links.
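
The pipe command can be modeled as a two-stage search: run the broad query, then look for the narrower term only within that result set. The sketch below is a toy model under that assumption, not Infoseek's implementation; the documents and identifiers are hypothetical. It also shows why refinement does not remove every false hit: a page about Brazil nuts as a dietary copper source still passes the filter.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(docs: dict[str, str], term: str) -> list[str]:
    """Ids of documents containing term, in dictionary order (a stand-in for a ranked list)."""
    t = term.lower()
    return [doc_id for doc_id, text in docs.items() if t in tokens(text)]


def pipe_search(docs: dict[str, str], broad: str, narrow: str) -> list[str]:
    """Model of 'broad | narrow': run the broad query, then keep only the results
    that also contain the narrow term, preserving the broad query's order."""
    n = narrow.lower()
    return [doc_id for doc_id in search(docs, broad) if n in tokens(docs[doc_id])]


# Hypothetical documents
docs = {
    "cda-stats": "Copper market data and statistics for Brazil and Chile",
    "nutrition": "Copper as a dietary mineral: brazil nuts are rich in copper",
    "chile-mine": "Copper mining in Chile",
    "samba": "Music of Brazil",
}
print(pipe_search(docs, "copper", "Brazil"))  # ['cda-stats', 'nutrition']
```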

Excite did better with the phrase and multiple-concept searches than it did with the keyword query. The sample query on "copper production in Brazil" found the Copper Development Association page entitled "Copper: Market and Data Statistics," as well as press releases and annual reports from companies with copper mines in Brazil. Excite returned several duplicate links in all the sample queries; for instance, in the "ENSO" search the NOAA-CIRES ENSO page appeared under both the http://www.cdc.noaa.gov/enso/index.html URL and the http://www.cdc.noaa.gov/enso/ URL. The "More like this" option in Excite did not retrieve any additional relevant links in any of the sample queries.
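
Duplicates like the NOAA-CIRES example can be caught by normalizing URLs before comparing them. This sketch assumes, as is common but not guaranteed, that a directory URL ending in "/" serves the same file as its "index.html"; it lowercases the scheme and host, folds default index pages into the directory, and keeps the first occurrence of each page. The names normalize, dedupe and DEFAULT_PAGES are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the server maps a directory URL ending in "/" to one of these files.
DEFAULT_PAGES = ("index.html", "index.htm")


def normalize(url: str) -> str:
    """Canonical form used only for duplicate detection."""
    parts = urlsplit(url)
    path = parts.path or "/"
    for page in DEFAULT_PAGES:
        if path.endswith("/" + page):
            path = path[: -len(page)]  # keep the trailing slash
            break
    # Scheme and host are case-insensitive; the fragment never reaches the server.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe(urls: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized URL, preserving result order."""
    seen: set[str] = set()
    unique = []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


hits = [
    "http://www.cdc.noaa.gov/enso/index.html",
    "http://www.cdc.noaa.gov/enso/",
]
print(dedupe(hits))  # ['http://www.cdc.noaa.gov/enso/index.html']
```

Port numbers, "www" prefixes and mirror sites on other hosts are not handled here; each would need its own rule.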

Excite and Infoseek results for the "copper production in Brazil" query were similar, primarily pointing to press releases, annual reports and technical reports for mining and production companies. The most consistent false hits for this query were on copper as a dietary mineral supplement. (Surprisingly "brazil nuts" are a good source for copper.) The most common false hits in the "ENSO" query were to companies named ENSO, while the most common false hits in the "New Madrid fault zone" query were links to sites about unrelated seismic zones and faults.

Northern Light is another useful keyword search engine. Northern Light sorts results into folders by domain and subject. Some of the folders created with the sample search on Brazilian copper production included "Commercial sites," "Mining industry," "Metals Industry," "Coal" and "Toxicology," among others. Folders for the search on "ENSO" included "Personal pages," "Climatology," and "www.coaps.fsu.edu," among others. Northern Light also searched several online "Special Collection" databases, which located journal articles. These articles could be purchased from Northern Light for document delivery fees ranging from two to six dollars, depending on the length and source of the article. This hybrid of Internet and literature databases is a trend to watch on the web.
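
Folders by domain amount to grouping result URLs by host name. A minimal sketch follows, using Python's standard urllib.parse; the function name and the result URLs are hypothetical, and subject folders such as "Climatology" would additionally require classifying each page, which is not shown.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def folders_by_host(urls: list[str]) -> dict[str, list[str]]:
    """Group result URLs into folders keyed by host name, keeping result order within each folder."""
    folders: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname or "(no host)"
        folders[host].append(url)
    return dict(folders)


results = [
    "http://www.coaps.fsu.edu/enso/a.html",   # hypothetical paths
    "http://www.cdc.noaa.gov/enso/",
    "http://www.coaps.fsu.edu/enso/b.html",
]
for host, urls in folders_by_host(results).items():
    print(f"{host} ({len(urls)})")
```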

Planet Search, though it earned an average precision rating, had one of the best displays of search results. The Planet Search results include a bar graph depicting the relevance of each search term in the query. The bar graphs show not only the relevance of each term for each record located, but also the overall results for the entire search. Each record also contains a "Find similar" option, the record's date, and the number of words in the record. Planet Search also allows the searcher to create custom directories for search results and bookmarks. Planet Search had many repeat hits within its results, such as mirror sites for the Southern California Alphabetic Fault Index in the "New Madrid fault zone" results and three links to the ENSO Newsletter Homepage in the "ENSO" results.
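
A per-term relevance display like Planet Search's can be approximated with text bars. The sketch below assumes relevance is simply each term's frequency in a record, scaled so the most frequent term fills the bar; Planet Search's actual relevance measure is not described in the source, and the function name and record text are hypothetical.

```python
import re


def term_bars(text: str, terms: list[str], width: int = 20) -> list[str]:
    """One text bar per query term, scaled so the most frequent term fills `width` characters."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = {term: words.count(term.lower()) for term in terms}
    top = max(counts.values(), default=0)
    lines = []
    for term in terms:
        n = counts[term]
        length = round(width * n / top) if top else 0
        lines.append(f"{term:<12}{'#' * length} ({n})")
    return lines


# Hypothetical record text
record = "Copper output in Brazil rose; copper exports and copper prices fell"
lines = term_bars(record, ["copper", "production", "Brazil"], width=12)
for line in lines:
    print(line)
```

A bar of zero length makes it obvious at a glance that a record never mentions one of the query terms.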

Lycos did above average with phrase and multiple-concept searches, but its results for the keyword query were low. WebCrawler and Magellan (a catalog/directory-type search engine) had identical results for all three queries. In addition, the WebCrawler and Magellan results were among the least precise of any engine used, regardless of type. For instance, the first site listed in the "New Madrid fault zone" query was for a map of Madrid, Spain, while the fifth link returned was for ESPN SportsZone: Soccer. HotBot had average precision with the default "all the words" search, but precision increased slightly when "the exact phrase" mode was used with phrase and multiple-concept queries. What-U-Seek had low precision for phrase and multiple-concept searches but highly precise results for the keyword search on "ENSO." Alta Vista results were average on default searches, but precision increased slightly with the use of the "refine" option. With Alta Vista, the results for all queries contained duplicate links.

Covering a lot of Ground Quickly: Using Meta-Search Engines

Multiple-threaded search engines are becoming more popular and convenient as the size of the web increases. There are, however, several weak links in the use of multiple-threaded search engines. Most notably, multiple-threaded engines rely upon, and have no control over, the comprehensiveness and timeliness of the databases that they search. Also, the multiple-threaded engines send out queries to multiple databases, each of which is constructed and searched slightly differently. These tools usually contain a disclaimer tucked away somewhere in the "about" section indicating that the results from complex search strategies using Boolean, proximity, or other operators cannot be guaranteed. Despite some of these issues, multiple-threaded search engines have begun to emerge as definitive resources for searching the web.

Most multiple-threaded search engines had average results, as shown in Table 1. No single best multiple-threaded search engine emerged from the sample queries. Rather, some engines did better with keyword searches while others returned more useful results with phrase or multiple-concept queries. For instance, Mamma, ProFusion and MetaCrawler did better with the phrase query for the "New Madrid fault zone" and the multiple-concept query on "copper production in Brazil." Inference Find and Ask Jeeves had more precise results for the keyword search, "ENSO."

The interface for many of the multiple-threaded search engines allows the user to refine or direct the search at the top level. For example, MetaCrawler and Savvy Search allow the user to look for "all" or "any" of their search terms, or for the terms "as a phrase." ProFusion offers a default mode, a Boolean mode, or a phrase mode, while Mamma allows the user to search for their terms "as a phrase" or to limit the search to "document titles" only.
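
The difference between "any," "all" and "as a phrase" matching helps explain why phrase modes raised precision for the "New Madrid fault zone" query. This sketch implements the three modes over simple word tokens; the function names and page texts are hypothetical, and real engines also applied ranking, stemming and other rules not modeled here.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(text: str, query: str, mode: str = "all") -> bool:
    """Does a page satisfy the query under 'any', 'all' or 'phrase' matching?"""
    doc, terms = tokens(text), tokens(query)
    if mode == "any":
        return any(t in doc for t in terms)
    if mode == "all":
        return all(t in doc for t in terms)
    if mode == "phrase":
        # The query words must appear consecutively and in order.
        n = len(terms)
        return any(doc[i:i + n] == terms for i in range(len(doc) - n + 1))
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical page texts
pages = {
    "fault": "The New Madrid fault zone produced the 1811-1812 earthquakes",
    "hotel": "New hotels in Madrid's old town; San Andreas fault zone maps",
    "spain": "Street map of Madrid, Spain",
}
query = "New Madrid fault zone"
for mode in ("any", "all", "phrase"):
    print(mode, [name for name, text in pages.items() if matches(text, query, mode)])
```

The "hotel" page contains all four words and so passes "all" matching, but only the page about the fault zone itself survives phrase matching.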

A few of the engines like MetaFind and Inference Find cluster the results of the searches by keyword. Other engines, such as Ask Jeeves and Savvy Search, group the results by the tool which returned the link. Most commonly, results are displayed by relevance ranking based on a ratio of where and how often search terms appear.
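
Merging per-engine lists into a single relevance-ordered list can be done in several ways. The sketch below contrasts grouping by tool with one merged list built by reciprocal rank fusion, a rank-merging technique swapped in here for illustration; it is not what MetaFind, Inference Find, Ask Jeeves or Savvy Search used. The constant k = 60 is a commonly used default, and the function names, engine names and URLs are hypothetical.

```python
from collections import defaultdict


def group_by_engine(results: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Display style 1: results listed engine by engine, in each engine's order."""
    return [(engine, url) for engine, ranked in results.items() for url in ranked]


def fuse(results: dict[str, list[str]], k: int = 60) -> list[str]:
    """Display style 2: one merged list scored by reciprocal rank fusion.

    A URL earns 1 / (k + rank) from every engine that returned it, so pages
    ranked highly by several engines rise to the top and duplicates collapse.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked in results.values():
        for rank, url in enumerate(ranked, start=1):
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical per-engine result lists
results = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u4"],
}
print(group_by_engine(results))
print(fuse(results))  # ['u2', 'u1', 'u4', 'u3']
```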

The Internet Sleuth, a catalog or directory-type search engine, can also be used as a multiple-threaded search engine. The Internet Sleuth homepage provides access to 21 subject categories, which can easily be expanded to show sub-categories. The science category has nine sub-categories, including one for "Earth Sciences." The "Earth Sciences" sub-category provides search engines for more than eleven different earth science resources, such as Volcano World and the SPE Technical Papers Index. While "Earth Sciences" in the Internet Sleuth does not yield an exhaustive list, the links provide access to some high-quality full-text resources. This access to subject-based search engines is unique. In addition to the subject-based search engines, the Internet Sleuth homepage also provides the opportunity to search the entire web simultaneously through up to six major search engines (Alta Vista, Excite, Infoseek, Lycos, WebCrawler and Yahoo!). Searchers can also view multiple reviewed, news, business & finance, software and Usenet engines.

Ask Jeeves, which uses natural language queries, yielded above average precision with the keyword query on "ENSO" but below average precision with the phrase and multiple-concept queries. Search queries are fed through an expert system that not only suggests alternate strategies to the original search, but also sends the query out to Excite, HotBot, WebCrawler, Alta Vista and Infoseek. The sample query on "ENSO" resulted in the six additional queries in Figure 1. The alternative search strategies returned were quite relevant to the original query and gave the user the opportunity to focus the search on a particular aspect of the search term. Ask Jeeves also returned ten resources from each of the five search engines that it queried. The Ask Jeeves results from the engines queried were consistent with the results from the individual search engines (see Table 1).

Figure 1. Ask Jeeves Expert System Suggested Alternative Queries
What is the latest news coverage on El Nino?
What is an El Nino?
Where can I find information on the 1997-98 El Nino?
What is the latest news coverage on California storms?
Where can I learn about the meteorology topic El Nino?
Where can I find general scientific information on El Nino?

Highway61 had above average precision for phrase and multiple-concept searches. Highway61 sends queries to six search engines: Yahoo!, Alta Vista, Lycos, WebCrawler, Infoseek and Excite. The number of results displayed is determined by the searcher, who chooses how long the search engine can look for results as well as the number of results to display. Results for the sample query on "copper production in Brazil" included several unique company reports, and Highway61 also found the most web sites from the .br (Brazil) domain.
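
Highway61's time limit suggests a simple pattern: query every engine in parallel and keep whatever answers arrive before the deadline. The sketch below assumes each engine is a Python function that takes a query and returns a list of URLs; the two engines are hypothetical stand-ins (one deliberately slow), and nothing here reflects Highway61's actual code. It uses the standard concurrent.futures module; the cancel_futures argument requires Python 3.9 or later.

```python
import time
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor, wait


def metasearch(engines: dict[str, Callable[[str], list[str]]], query: str,
               time_limit: float, max_results: int) -> tuple[list[str], list[str]]:
    """Send one query to every engine in parallel, keep only the answers that
    arrive within time_limit seconds, and truncate to max_results.

    Returns (results, late_engines). Engines that raised an error are skipped.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(engines)))
    futures = {pool.submit(fn, query): name for name, fn in engines.items()}
    done, _ = wait(futures, timeout=time_limit)
    pool.shutdown(wait=False, cancel_futures=True)  # do not block on slow engines
    results, late = [], []
    for future, name in futures.items():  # engine order, not completion order
        if future not in done:
            late.append(name)
        elif future.exception() is None:
            results.extend(future.result())
    return results[:max_results], late


# Hypothetical stand-in engines; real ones would make network requests.
def quick_engine(query: str) -> list[str]:
    return [f"http://example.org/{query}/1", f"http://example.org/{query}/2"]


def slow_engine(query: str) -> list[str]:
    time.sleep(1.0)
    return [f"http://example.net/{query}"]


found, late = metasearch({"quick": quick_engine, "slow": slow_engine}, "copper",
                         time_limit=0.2, max_results=10)
print(found, late)
```

A longer time limit trades waiting time for coverage, which matches the choice Highway61 leaves to the searcher.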

When using keyword and multiple-threaded engines, notice which sections of the pages the engine is searching and develop a precise search statement. The volume of information available on the web necessitates the use of "advanced" or "refine" options for more accurate search results. In addition, searchers should keep in mind the advice offered in an Infoseek tip: "Longer queries work better." Use a series of specific and unique terms for more precise search results. This advice holds true for locating earth science information on the web, as well as for any subject-specific search.

Conclusion

The results from this study concur with other precision studies in that no one search engine emerges as the most precise for locating information on the World Wide Web. Even with the focus on earth science queries, no one tool emerged preeminent. In their 1996 study, Tomaiuolo and Packer identified Alta Vista as the tool that returned the greatest number of relevant hits based on the first ten hits retrieved. An evaluation of the precision ratios for the earth science-related queries (1) found that Infoseek yielded the most precise results (2) and that Lycos and Alta Vista were significantly less precise. Leighton and Srivastava, whose study employed the most sophisticated methodology, also found that there was no conclusively more precise search engine. However, "a definite pattern emerges. Alta Vista, Excite and Infoseek are always the services with the highest estimated median scores." (1997, http://www.winona.msus.edu/library/webind2/wi2pt3.htm#RESULTS, p. 2 of 7). [Note: Link moved; URL changed 8/5/98 by ald] Based on these other published studies and the results in this earth science-related study, searchers trying to locate ANY kind of information on the World Wide Web are advised to use a variety of tools, depending on the type of query they are submitting and the type of information that is needed.

As Leighton and Srivastava stated, "True precision, the ratio of relevant elements returned to the total number of elements returned, is too arduous to calculate, because it would mean examining all the links returned by a service, which may number in the thousands or millions." (1997, http://www.winona.msus.edu/library/webind2/wi2pt2.htm#EVALCRIT, p. 3 of 8). [Note: Link moved; URL changed 8/5/98 by ald] Recognizing the limitations of this study, it is hoped that the results can still serve as a guide when using Internet search engines to locate earth science information on the World Wide Web.

Table 1. Search Engines Reviewed
Name and URL Size Precision Notes
Catalog or Directory-type search engines
All in One
http://www.albany.net/allinone/
small average* Common interface to many smaller search engines which user must search one at a time. Not much science.
*Precision varies by tool.
Argus Clearinghouse
http://www.clearinghouse.net/
small high* Reviewed sites. Science links found under the "Environment" heading and the "Math & Sciences" heading which contains an "Earth Sciences" sub-category.
*Only if subject is included in the Clearinghouse.
C|Net's Search.com
http://www.search.com/
big average Site search is powered by Infoseek. Users can choose from 11 search engines when searching the "entire web." There is a "Science" sub-category under the main "Learning" category. Use of "related links" can increase precision.
EINet Galaxy
http://www.einet.net/galaxy.html
small average "Geoscience" sub-category is found under the main category "Science". Found zero hits for phrase and multiple concept queries.
Go2 (formerly the World Wide Web Worm)
http://www.overture.com/ [Note: Go2 is now Overture; URL changed 4/2/02 by ald]
small high 500 categories listed in random order. Provides last crawled date with descriptions. Users can "rate" the located sites.
G.O.D. (Global Online Directory)
http://www.god.co.uk/
small low "Science" sub-category located under the main category "Community and Education".
HandiLinks
http://www.ahandyguide.com:80/
small low There are no science "Hot Areas", but using alphabetic jump bars locates links for subjects like "geology", "meteorology", etc.
Hot Lava
http://hotlava.erupt.com/ [Note: Broken link removed 3/4/01 by ald]
small low* "Earth Sciences" sub-category found under "Health and Sciences" main category. Very small database. Similar to Yahoo!
*Sends keyword queries simultaneously to six search engines, with average precision results.
InfoMine
http://infomine.ucr.edu/ [Note: Link moved; URL changed 7/24/00 by ald]
small high Provides subject, title or keyword access. Descriptions provide links to related sites. "Earth Science" category located in the "Physical Sciences, Engineering, Computing and Math" main category. Searches can be limited to individual categories.
Internet Sleuth
http://www.isleuth.com/ [Note: Broken link removed 3/4/01 by ald]
small average* "Earth Science" category found in the "Science" category which provides links to specialized search engines.
*Keyword queries of the Internet Sleuth database with sample queries resulted in zero hits. Specialized search engine precision results varied by tool.
Librarians' Index to the Internet
http://lii.org/ [Note: Link moved; URL changed 7/24/00 by ald]
small low Uses Library of Congress subject classes. No overall earth science category but there are sub-categories for "Earthquakes", and "Environment". The browsable subject list contains "Geology" as a subject heading, but the category only contains three links.
Look Smart
http://www.looksmart.com/
small average For "Earth and Environment" category look under "Reference and Education" then the "Science and Nature" categories.
Magellan
http://web.webcrawler.com/d/search/p/webcrawler/ [Note: Magellan is now WebCrawler; URL changed 4/2/02 by ald]
small average Subject categories access reviewed sites. Can also do keyword searches of the "entire web." "Science" category contains a "Planet Earth" sub-category for earth science-related links.
Power Search
http://www.power-search.com/ [Note: Broken link removed 8/5/98 by ald]
big** average* Distributed links to over 100 specialized and general search engines. The "Power Search" option inserts the search strategy into the search box for each tool, but searches still need to be completed individually for each tool.
*Precision varies by tool.
**Tools included are in the big range, but the site itself only links to 100 search tools.
SciCentral
http://www.sciquest.com/cgi-bin/ncommerce3/ExecMacro/sci_index.d2w/report [Note: Link moved; URL changed 7/24/00 by ald]
small low Relatively new. Maintained by professionals in the fields covered. "Earth and Space Science" category contains nine sub-categories.
WWW Virtual Library
http://www.vlib.org/ [Note: Link moved; URL changed 7/24/00 by ald]
small average Distributed servers. Geosciences housed at University of Calgary, Meteorology housed at Penn State, etc.
Yahoo!
http://www.yahoo.com/
medium low* Comprehensive. "Earth Sciences" sub-category is located in the main "Sciences" category.
*Sample queries resulted in zero hits in Yahoo! for phrase and multiple concept queries, and found only two (of 49) relevant links for the keyword query. Queries forwarded to Alta Vista yielded average precision.
Name and URL Size Precision Notes
Keyword or Crawler-type Search Engines
AliWeb
http://aliweb.emnet.co.uk/ [Note: Link moved; URL changed 7/24/00 by ald]
small low Archie-like, with dynamic indices. Current focus is on academic and technical sites. The search interface provides many search refinement options.
Alta Vista
http://www.altavista.com/ [Note: Link moved; URL changed 10/14/98 by ald]
big average The "refine" option clusters results by theme; the user can then choose or exclude themes to increase precision. Alta Vista subject channels are based on the Look Smart database.
Excite
http://www.excite.com/
big high "More like this" links useful for locating related sites. No science categories or sub-categories were found in the Excite channels. "Power Search" option increased precision.
HotBot
http://hotbot.lycos.com/ [Note: Link moved; URL changed 7/24/00 by ald]
big average "Hip Pocket Guide" categories include an "Earth and Environment" sub-category, found under the main "Reference and Education" category and then "Science and Nature" (similar to Look Smart). Searches can be limited by date, geographic location, and domain.
Infoseek
http://www.go.com/ [Note: Link moved; URL changed 12/21/98, 7/24/00, 3/4/01 by ald]
big high The "Earth Science" sub-category is located under the main "Careers and Education" category: follow "Fields of Study" to "Science." Results can be refined with new terms. The pipe search looks for narrower terms within a larger concept.
Lycos
http://www.lycos.com/
big average To find "Earth Sciences" in the Lycos subject categories, look under the "Education" sub-category of the main "Knowledge" category. Searches can be limited to titles, URLs, or specified sites.
Northern Light
http://www.northernlight.com/
big high* Provides access to full-text articles in Special Collections. Result descriptions include the creation date.
*Use of custom search folders increased precision.
Planet Search
http://www.planetsearch.com/
big average* Many customization options. Graphic display of search term relevance for each link.
*"Find Similar" option increased precision.
Web Crawler
http://www.webcrawler.com/ [Note: Link moved; URL changed 7/24/00 by ald]
medium average No science-related channels. Supports natural language queries. Results can be displayed as links only or with brief summaries.
What-U-Seek
http://whatuseek.com/
medium low* Fast. "Science and Technology" category contains 50 sub-categories.
*Higher precision for keyword searches than for phrase or multiple concept searches.
Name and URL Size Precision Notes
Multiple-threaded or Meta-crawler type search engines (the Notes column gives the number of search engines each tool searches)
Ask Jeeves
http://www.askjeeves.com/
big average Searches 5 general Internet search engines. Uses natural language queries. An expert system helps guide searchers to related information. Results from the concurrently searched engines are similar to the "refined" results retrieved when searching the individual engines.
CUSI - Configurable Unified Search Index
http://cusi.emnet.co.uk/ [Note: Link moved; URL changed 7/24/00 by ald]
medium average* Search by type of search engine (category, keyword, Usenet, etc.) through a common interface. Tools are searched one at a time, but users can choose from over 18 different search engines.
*Results vary by tool.
DOGPILE
http://www.dogpile.com/
big low Searches 14 Internet search engines as well as 5 Usenet, 2 FTP and 3 news search engines. Similar to MetaFind. Searches are automatically configured with commands such as "+new +madrid +fault +zone." Results are clustered by the tool that returned each link. Duplicates are not removed.
Highway 61
http://www.highway61.com/
big average* Searches 6 Internet search engines. Provides contemplative quotes while waiting for results.
*Phrase and multiple-concept queries yielded higher precision.
Inference Find
http://www.inference.com/infind/ [Note: Broken link removed 3/4/01 by ald]
big average Searches 6 Internet search engines. Clusters results by domain and removes duplicates.
Mamma
http://www.mamma.com/
big average* Searches 6 Internet search engines as well as 5 financial and 5 news search engines. Clusters results by the search engine that returned each link.
*Phrase and multiple-concept queries yielded higher precision than keyword queries.
MetaCrawler
http://www.metacrawler.com/
big average* Searches 6 Internet search engines. The "Metaspy" link allows users to see what other users are searching for and how.
*Phrase and multiple-concept queries yielded higher precision.
MetaFind
http://search.metafind.com/ [Note: Broken link removed 7/24/00 by ald]
big average Searches 6 Internet search engines. Results can be clustered by keyword, by domain, or alphabetically; sorting by domain was often the most useful. Similar to Dogpile. No "stop" words are used: all words were searched, including "in" in the sample queries.
Profusion
http://profusion.ittc.ukans.edu/
big low Searches 9 Internet search engines. Can choose to limit search to the "3 best" or "3 fastest" search engines available. Offers three search modes: default, phrase or Boolean. Displays results by relevance ranking.
Savvy Search
http://www.cs.colostate.edu/~dreiling/ [Note: Broken link removed 3/4/01 by ald]
big average Searches up to 19 Internet search engines in over 20 languages. Users can integrate results and limit by type of material and domain. Did not remove duplicates.
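The multiple-threaded tools above differ mainly in how they rewrite the query and how they merge what the underlying engines return. The following Python sketch illustrates those steps under stated assumptions; it is not a description of how any of these services actually worked internally. `plus_query` mimics the "+term" configuration Dogpile applied, `merge_results` drops duplicate URLs in the spirit of Inference Find (Dogpile and Savvy Search did not remove duplicates), `cluster_by_domain` groups links by host as Inference Find and MetaFind allow, and `precision` uses the standard relevant-over-retrieved ratio, which may not match the rating scheme used in this study. All function names and example URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def plus_query(query: str) -> str:
    """Mark every term as required, Dogpile-style (hypothetical helper)."""
    return " ".join("+" + term for term in query.lower().split())


def _url_key(url: str) -> tuple:
    """Normalize a URL for duplicate detection (an assumed rule):
    host names are case-insensitive and a trailing slash is ignored."""
    parts = urlparse(url)
    return (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)


def merge_results(result_lists):
    """Concatenate ranked lists from several engines, keeping the first
    occurrence of each URL and dropping later duplicates."""
    seen = set()
    merged = []
    for urls in result_lists:
        for url in urls:
            key = _url_key(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


def cluster_by_domain(urls):
    """Group links by host name, preserving their order within each group."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[urlparse(url).netloc.lower()].append(url)
    return dict(clusters)


def precision(relevant: int, retrieved: int) -> float:
    """Standard precision: relevant hits divided by total hits retrieved."""
    return relevant / retrieved if retrieved else 0.0


if __name__ == "__main__":
    print(plus_query("New Madrid fault zone"))
    # Hypothetical result lists from two engines; the first URL of each is a duplicate.
    engine_a = ["http://quake.example.edu/newmadrid/", "http://www.example.gov/faults.html"]
    engine_b = ["http://QUAKE.example.edu/newmadrid", "http://www.example.org/nmsz.html"]
    merged = merge_results([engine_a, engine_b])
    for domain, links in cluster_by_domain(merged).items():
        print(domain, links)
    # Yahoo!'s keyword query: 2 relevant links out of 49 retrieved.
    print(round(precision(2, 49), 3))
```

Run as a script, the example prints "+new +madrid +fault +zone" (the query form quoted for Dogpile above), three domain clusters with the duplicate New Madrid page removed, and 0.041 for Yahoo!'s keyword query. Merging by simple concatenation keeps each engine's own ranking intact; real meta-search tools may have interleaved or re-ranked results instead.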

End Notes

(1) All earth science related questions in the study were keyword or phrase queries. Queries included: avalanches, clean water act, earthquakes, global warming, lightning, natural disasters, ozone depletion, recycling plastic, tectonic plates, tornadoes, volcanoes and watershed planning.

(2) The overall ratings in the Tomaiuolo and Packer study were Alta Vista 9.3, Infoseek 8.3 and Lycos 8.1. An analysis of the earth science-related queries found Infoseek to be 9.5, Lycos 8.7 and Alta Vista 8.3.

References

Chu, Heting and Rosenthal, Marilyn. 1996. Search engines for the World Wide Web: A comparative study and evaluation methodology. [Online]. http://www.asis.org/annual-96/ElectronicProceedings/chu.html [2 April 1998].

Ding, Wei and Marchionini, Gary. 1996. A comparative study of web search service performance. In: American Society for Information Science 1996 Annual Conference Proceedings, 33; Global complexity: Information, chaos and control; Baltimore, Maryland, October 21-24, 1996. (Edited by Steve Hardin), pp. 136-142. Information Today, Medford, NJ.

Lebedev, Alexander. 17 May 1997. Best search engines for finding scientific information in the web. [Online]. http://www.chem.msu.su/eng/comparison.html [27 November 1997]. [Note: Broken link removed 4/2/02 by ald]

Leighton, H. Vernon and Srivastava, J. 16 June 1997. Precision among World Wide Web search services (Search engines): Alta Vista, Excite, Hotbot, Infoseek, Lycos. [Online]. http://www.winona.msus.edu/library/webind2/webind2.htm [Note: Link moved; URL changed 8/5/98 by ald].

Rettig, James. 1996. Beyond cool: Analog models for reviewing digital resources. [Online]. http://www.onlineinc.com/onlinemag/SeptOL/rettig9.html [30 April 1998].

Singh, Amarendra and Lidsky, David. 1996. "All-out search." PC Magazine 15(21): 213-249.

Tate, Marsha and Alexander, K. 1996. "Teaching critical evaluation skills for World Wide Web resources." Computers in Libraries 16(10): 49-55.

Tomaiuolo, Nicholas G. and Packer, Joan G. 1996. Quantitative analysis of five WWW "search engines." [Online]. [Note: Broken link to http://neal.ctstateu.edu:2001/htdocs/websearch.html removed 12/21/98 by ald] [1 December 1997].

Webster, Kathleen and Paul, Kathryn. 1996. Beyond surfing: Tools and techniques for searching the web. [Online]. http://magi.com/~mmelick/it96jan.htm [26 November 1997]. [Note: Unable to connect 4/2/02]
