OCLC Internet Cataloging Project Colloquium
Position Paper

Access to Networked Documents

Catalogs? Search Engines? Both?

by Arlene G. Taylor and Patrice Clemson
University of Pittsburgh


In a study completed in December 1995, we tried to identify the metadata in use for retrieving electronically stored information on the networks. We came to the overwhelming conclusion that, in order to do an effective search, one must understand the search engine being used, its database size and contents, and its underlying index strategy. Actually, this has always been true, even in the use of print indexes and catalogs. For example, if a user approaches a traditional catalog with a citation to a journal article and attempts to find the author and title of the article, the result will be "no hit." This happens because the user does not understand that the contents of a database called a "catalog" do not include titles of individual articles from journals, but only the titles of the journals themselves. In the same way, each search engine has its own limits on contents and its own set of elements that it will search. In addition, each has its own way of displaying results. The problem is that it is very difficult to determine just what is included. Several search engine beginning pages proudly display "how many," but not "of what." When one does the same search on several search engines, one quickly learns that

Internet search engines have both strengths and weaknesses. Among the strengths are:

Among the weaknesses are:

Understanding these indexes and their respective search features, in combination with our own knowledge of information-seeking, can improve the quality of our hits. A search for information on data administration is a good example. An initial search on the two terms "data" and "administration" will return any document containing one or both of them. Even on search engines that weight hits according to adjacency of terms, this combination can produce too many irrelevant results. A second search on the same terms, enclosed in quotes (if the search engine provides this sort of qualifying device), will produce a list that is smaller and more precise.
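The difference between the two searches can be sketched in a few lines of Python. This is only an illustration of the general idea, not a description of how any particular search engine works: the function names and the four sample documents are invented for the example, and real engines add ranking, stemming, and much larger indexes.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_any_term(query, document):
    """True if the document contains at least one of the query terms."""
    doc_terms = set(tokenize(document))
    return any(term in doc_terms for term in tokenize(query))

def matches_phrase(query, document):
    """True if the query terms occur adjacent to each other, in order."""
    q = tokenize(query)
    d = tokenize(document)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

# Hypothetical documents, invented for illustration only.
documents = [
    "Principles of data administration for large organizations",
    "Census data for the county",
    "Office of the administration: staff directory",
    "Administration of survey data",
]

# Unquoted search: one or both terms anywhere in the document.
any_hits = [d for d in documents if matches_any_term("data administration", d)]

# Quoted search: the two terms must appear together as a phrase.
phrase_hits = [d for d in documents if matches_phrase("data administration", d)]

print(len(any_hits), "hits for the unquoted search")
print(len(phrase_hits), "hit(s) for the quoted search")
```

In this sketch the unquoted search matches all four sample documents, while the quoted search matches only the first one, which is the narrowing effect described above.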

Over the last few centuries since the last information explosion crisis (i.e., the invention of printing), we in the library and bibliography field have developed principles of bibliographic control that have proved quite successful in helping us organize information so that it can be found again. One of these is that we should be able to design a citation that presents an integrated and intelligible description of a document (see fig. 7). This citation should clearly indicate the document's relationship to its own earlier (or other) manifestations as well as its relationship to other documents in the database. In other words, the purpose is to provide enough information to enable the searcher to choose what is going to be relevant and useful to that searcher (see fig. 8). A citation does this by providing:

This is the principle underlying many of our current indexing services (e.g., the H. W. Wilson indexes), as well as library catalogs. AACR2 is a strong standard for description, although it is focused primarily on physical documents; however, one can follow the spirit of AACR2 without necessarily following every rule to the "letter."

We have focused on the issue of cataloging the whole Internet from the outside in. In fact, catalogers have the opportunity to affect the quality of Internet publication from the inside out. We can

Our challenge is to contribute our expertise in information resources management.

We would like to suggest that this is the underlying principle being put to use in creation of what Priscilla Caplan has called "syntax-independent metadata for document-like objects." She identifies "syntax-independent" as meaning that "the data may be represented in USMARC, HTML, SGML, or 'Keyword=value' pairs" (p. 21). We should recognize that it is the principle developed especially over the last 150 years that we want to defend and to insist be used by organizers of the Internet, not an undying affection for AACR2 per se. Standards of description assure built-in predictability and eliminate most duplication by analyzing the linkages between entities. There are documents of lasting value among the contents of the Internet's amorphous database. These are the documents to which the principle should be applied. Organizers should select the vehicle of choice (MARC, TEI, HTML, etc.) and do it!
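To make the idea of syntax independence concrete, here is a minimal sketch in Python in which one description is held once and written out in two of the syntaxes Caplan mentions, HTML and keyword=value pairs. The field names (title, creator, source, date) and the function names are assumptions chosen for this illustration, not an element set defined by Caplan or by any standard; the values come from the reference below.

```python
from html import escape

# One description, held independently of any output syntax.
# Field names are hypothetical labels chosen for this sketch.
record = {
    "title": ("You Call It Corn, We Call It Syntax-Independent "
              "Metadata for Document-Like Objects"),
    "creator": "Caplan, Priscilla",
    "source": "The Public-Access Computer Systems Review 6, no. 4",
    "date": "1995",
}

def to_html_meta(rec):
    """Render the record as HTML <meta> elements, one per field."""
    return "\n".join(
        f'<meta name="{escape(k, quote=True)}" content="{escape(v, quote=True)}">'
        for k, v in rec.items()
    )

def to_keyword_value(rec):
    """Render the record as keyword=value pairs, one per line."""
    return "\n".join(f"{k}={v}" for k, v in rec.items())

def from_keyword_value(text):
    """Parse keyword=value lines back into a record."""
    return dict(line.split("=", 1) for line in text.splitlines() if line)

print(to_html_meta(record))
print(to_keyword_value(record))
```

The description itself does not change between the two renderings; only the carrier does. Reading the keyword=value form back with from_keyword_value returns the same record, which is the sense in which the choice of vehicle is secondary to the principle.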


Caplan, Priscilla. "You Call It Corn, We Call It Syntax-Independent Metadata for Document-Like Objects." The Public-Access Computer Systems Review 6, no. 4 (1995): 19-23; also found at:



Fig. 1:

Number 11 - 1st relevant hit

Documents 11-20 of about 2000 matching some of the query terms, best matches first.

Mental health links

http:01/30/96/www.geopages.com/Hollywood/1872/jumpsad.html -size 4K - 19 Oct 95

Fig. 2:

SAD SEARCH - Numbers 1-2

Documents 1-10 of about 2000 matching some of the query terms, best matches first.

No Title

8 - Problemas com UTM/SAD-69 Submetido por: Eduardo Burla FURNAS FURNAS CENTRAIS ELETRICAS Data de submissão: Quarta-feira, 5 de Abril de 1995, 15:19 Máquina remota: maestro.dpi.inpe.br ( Produtos afetados: spring Descrição do problema ou...

http:01/30/96/www.inpe.br/spring/bugs/bug.8 - size 454 bytes- 5 Apr 95

No Title

SAD DAY IN TAMPA BAY. MICK IN VALRICO (XFRJ11B@PRODIGY.COM) Saturday, 18-Nov-95 12:15:11 CST. Well we're 18k short, hope everyone will be at the flea market Sunday cuz that is the future- pathetic showing of support and pride for your city-residents and...

http:01/30/96/www.hamline.edu/~sfitchet/bucs/buctalk/256.html - size 526 bytes - 13 Dec 95

Fig. 3:

Search for arlene taylor - 20th hit

Arlene G. Taylor Home Page

Arlene G. Taylor Home Page. Associate Professor Department of Library Science School of Library and Information Science 642 SLIS Building 135 Bellefield Avenue Pittsburgh, PA 15260. TEL: (412) 624-9452 FAX: (412) 648-7001. Curriculum Vitae. Course Syllabi....

http:01/30/96/www2.pitt.edu/~agtaylor/ - size 681 bytes - 1 Dec 95

(Hits 1-19 were sections of this page.)

Fig. 4:

Search for arlene taylor - 27th hit

All My Children Update for Thursday September 7, 1995

Update for AMC Thursday September 7, 1995. By Ashley Lambert-Maberly. Welcome to the Thursday update! SPERM WAILS. In which Taylor wonders whether Dreck's magic sperm did the trick ... in which Arlene wants Alec to give her more of his special stuff ...in...

http:01/30/96/purplenet.com/soaps/amc/stories/sep95/sep07r.html - size 7K - 13 Sep 95

Fig. 5:

Search for arlene taylor - 30th hit

No title

Source: This information is provided by Rebecca Green, College of Library and Information Services, last revised January, 1994. For further information contact Dr. Green at (301) 405-2050 or electronic mail at rgreen@umd5.umd.edu LBSC 671 Rebecca Green...

http:01/30/96/www.inform.umd.edu:8080/Educational_Resources/CollegesAnd Schools/CLIS /Courses-Spring94/LBSC671-0101/readings

Fig. 6:

Search for arlene taylor - compact listing

Documents 21-30 of about 10000 matching some of the query terms, best matches first.

No Title                  [21Apr95]  Date: Sat, 15 Apr 1995 16:54:5
Arlene Coddington         [22Nov95]  Arlene Coddington. REALTOR® As
Home Page: Arlene Badman  [18Jul95]  Jane C. Taylor, M.F.A. Jane re
management_team.html      [27Oct95]  SIS2000 Project Management Tea
Student Affairs Projects  [11Dec95]  Projects - Major Organization.
List of Professional Deve [24May95]  The Professional Development M
All My Children Update fo [13Sep95]  Update for AMC Thursday Septem
Titles OPQ                [08Dec95]  D o c u m e n t s b y T i t l
NUKE Moviescape Reference [04Dec95]  Rebecca De Mornay. Filmography
No Title                  [12Dec95]  Source: This information is pr

Fig. 7:


To design a citation that:

Fig. 8:


To provide enough information to enable a searcher to choose what is going to be relevant and useful to that searcher

