OCLC Internet Cataloging Project Colloquium
Position Paper

Modifying Cataloging Practice and OCLC Infrastructure for Effective Organization of Internet Resources

by
Ingrid Hsieh-Yee
Assistant Professor
School of Library and Information Science
Catholic University of America
Washington, D.C.
hsiehyee@cua.edu


Introduction

The dynamic nature of Internet resources poses a major challenge for information organizers. A Web page may be moved within a Web site, to a different Web site, or disappear entirely; the contents of the page may be updated slightly or substantially over a short period of time; different groups of people may be responsible for a Web page as people come and go; the corporate sponsorship may change; and new access modes may be added after the initial appearance of a Web page. The lack of a standard for presenting information on the World Wide Web further complicates information organizers' tasks. For instance, the statement of responsibility may be vague or embedded, the title can be nondescriptive, and publishing information often needs to be inferred. These challenges prompt us to reconsider how Internet resources can be best represented and organized.

Why Libraries and Catalogers Should Do It

The need for information organization on the Internet is attested by the gallant efforts of various researchers to bring it under control. Most of the existing search engines were developed by computer scientists, and the initial excitement generated by such tools has led some users to conclude that the problem of information organization on the Internet is solved. But once the novelty wore off, users soon realized the imprecision of these tools, and designers such as Martijn Koster[1] have enumerated their limitations and cautioned against total reliance on them. Another major attempt has been to identify core elements that will constitute the metadata of an Internet resource, describing it and making it accessible. "Semantic header,"[2] "Dublin Core,"[3] and "Uniform Resource Characteristics"[4] are some of the better-known examples, which result from the deliberations of computer scientists, librarians, publishers, vendors, archivists, and engineers. It is somewhat puzzling that few catalogers were involved in these efforts, because catalogers have long created metadata for information-bearing entities (this activity is called "cataloging" in the library profession). The parallel between these new metadata specifications and cataloging records is hard to miss. As the efforts to map the TEI header into and out of MARC format suggest[5], cataloging standards and practices have much to contribute to the organization of Internet resources.

The principles of information organization[6] include

  1. The determination of what resources exist and selection of resources relevant to user needs
  2. The description of selected resources
  3. The provision of access points, including authority control of access points
  4. The analysis of the content of selected resources
  5. The provision of information for locating these resources

Libraries are better suited than search engines or Internet search services to select resources because they have experience in acquiring materials in various formats for their local users. Librarians' expertise in resource selection and their relatively well-defined local constituencies will ensure their success in evaluating and selecting Internet resources. Catalogers, in particular, should be involved in organizing Internet resources because they have applied these principles to the cataloging of materials in various formats and should be able to apply them to the cataloging of Internet resources with equal efficiency.

How to Catalog Internet Resources

So how should this be done? From cataloging 160 Internet resources within a two-month period, we realized that full-level cataloging was very time-consuming, and that some data elements specified by the second level description of AACR2R may be of limited use to searchers. It became clear that to adapt to the dynamic nature of Internet resources, a different level of description should be derived from current standards. To strike a balance between speed of record creation and quality of record, and between speed of record creation and speedy access to Internet resources, an augmented minimal level cataloging standard is proposed. This modified standard (referred to from now on as the "M" level cataloging) aims to fulfill the roles of the catalog as a finding tool, an evaluating tool, a collocating tool, and a locating tool by including only data elements essential for the identification and subject collocation of Internet resources. The M level cataloging follows the AACR2R punctuation pattern and contains the same eight areas specified for the second level description. However, several elements have been simplified. The problems and solutions (P&S) below explain how some data can be recorded.

Description

Area 1: Title and statement of responsibility

Prescribed sources of information: The latest title frame or screen.

The transcription of this title proper is exactly the same as prescribed by 1.1B1 of AACR2R. The source of the title should be included in a note to help users understand the record.

P&S: If the statement of responsibility is not prominently listed on the chief source, it can be omitted from Area 1. Avoid using bracketed information for this element. Instead, use a general note to include author information uncovered elsewhere in the document.

The so-called "Web masters" who provide markup for a document should be treated as editors. List them in Area 1 if prominently listed on the chief source. The AACR2R rule of three applies to authors as well as editors.
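
The rule of three mentioned above can be sketched in code. This is a minimal illustration, assuming the usual AACR2R treatment in which more than three names sharing one function are reduced to the first name followed by "... [et al.]". The function name and the sample names are hypothetical and are not part of the proposal.

```python
def statement_of_responsibility(names, role=None):
    """Format names for Area 1 under an assumed reading of the rule of three.

    Up to three names are given in full; with more than three, only the
    first is kept and "... [et al.]" is added.
    """
    cleaned = [name.strip() for name in names if name.strip()]
    if not cleaned:
        return ""  # nothing prominent on the chief source: omit the element
    if len(cleaned) > 3:
        body = f"{cleaned[0]} ... [et al.]"
    else:
        body = ", ".join(cleaned)
    # An optional role phrase such as "edited by" precedes the names.
    return f"{role} {body}" if role else body


three_editors = statement_of_responsibility(["A. Able", "B. Baker", "C. Clark"], "edited by")
four_editors = statement_of_responsibility(["A. Able", "B. Baker", "C. Clark", "D. Drew"], "edited by")
```

The same function serves authors, editors, and Web masters, since the paper applies the rule to all of them alike.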

General material designation: A new term, Internet file, is used to distinguish Internet resources from computer files stored on carriers for direct access. Users can use this new GMD to qualify their searches to Internet resources.

Area 2: Edition

Prescribed sources of information: The screens before and after the body of the document (similar in concept to "preliminaries" and "colophon" of monographs).

P&S: Internet resources often do not include formal edition statements. This area can be omitted if no information is available. If they are listed, edition statements tend to appear on the first few screens or the last screen of an Internet resource. Follow 1.2B1 to record a formal edition statement.

As several Intercat participants pointed out, if each update statement is treated as an edition statement, a new bibliographic record will need to be created each time a document is updated.[7] The confusion resulting from such a practice will defeat the purpose of cataloging. Therefore, in contrast to rules in AACR2R, Ch. 9, and the rule in the Cataloging Internet Resources: A Manual and Practical Guide[8], updating information should not be treated as edition information. Such information can be presented as a general note (see discussion below).

Area 3: File characteristics

The terms specified in AACR2R for this area do not provide unique information for identifying or searching Internet resources. This area is therefore omitted from the M level cataloging.

Area 4: Publication, distribution, etc.

Prescribed sources of information: The screens before and after the body of the document (similar in concept to "preliminaries" and "colophon" of monographs).

P&S: Publishing information often needs to be inferred from the host sites, and sometimes the information listed on the title screen is different from the host site. For instance, an author affiliated with the Catholic University of America may have a Web page at the University of North Carolina's Web site. A simple solution is to take the host site as the publisher and optionally list the author's affiliation in a general note. The rationale for this treatment is that the host provides a forum for the author to present his or her ideas. If publishing information does not appear in the prescribed sources, it should be bracketed.
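
As a rough illustration of taking the host site as publisher, the sketch below extracts the host name from a URL and looks it up in a small table maintained by the cataloger. The table, the sample URL, and the function name are assumptions made for this example. The place and name of a host institution still have to be established by a person.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table: domain -> (place, name of host institution).
KNOWN_HOSTS = {
    "unc.edu": ("Chapel Hill, N.C.", "University of North Carolina"),
}


def publisher_statement(url, known_hosts, in_prescribed_source=False):
    """Return "place : host" for Area 4, bracketed when the data was inferred."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, (place, name) in known_hosts.items():
        # Match the domain itself or any machine under it (e.g. www.unc.edu).
        if host == domain or host.endswith("." + domain):
            statement = f"{place} : {name}"
            return statement if in_prescribed_source else f"[{statement}]"
    return None  # unknown host: leave the decision to the cataloger


inferred = publisher_statement("http://www.unc.edu/~someone/page.html", KNOWN_HOSTS)
```

The author's own affiliation, when it differs from the host, would go in a general note rather than in this statement.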

Date of publication: Since many Internet resources are frequently updated, the date of update is unstable and should not be used as the date of publication[9]. Following the treatment of looseleaf publications, it is recommended that the beginning date of a Web document, if known, should be recorded with a hyphen to indicate that it is still being published. If the beginning date is not in the prescribed sources, it should be bracketed.
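
The date treatment just described can be expressed as a small function. This is a sketch under stated assumptions: the function name is hypothetical, and bracketing only the year (rather than the whole publication statement) is one possible reading of the rule.

```python
def publication_date(begin_year=None, in_prescribed_source=True):
    """Format an open publication date for a Web document still being published."""
    if begin_year is None:
        return ""  # beginning date unknown: record nothing
    year = str(begin_year)
    if not in_prescribed_source:
        year = f"[{year}]"  # supplied from outside the prescribed sources
    return f"{year}-"  # trailing hyphen marks an open date


open_date = publication_date(1995)
supplied_date = publication_date(1995, in_prescribed_source=False)
```

Dates of later updates do not enter this element. They belong in the notes area.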

Area 5: Physical description

This area is omitted for remote resources.

Area 6: Series

Prescribed sources: The entire document. Record only the formal series title and numbering information.

Area 7: Notes

Prescribed sources: The entire document

Area 7 provides valuable information that supplements information recorded in the first 6 areas of the record.

P&S: The notes should be listed in their order of importance as follows:

7.11 856 field: Electronic location and access

Since users are most concerned about accessing a document, this note should be the first note.

7.12 Multiple 856 fields: Hotlinks should be provided for various modes of access.

7.13 Maintenance of URLs: URLs sometimes change or move; they can be maintained through OCLC's Persistent Uniform Resource Locators (PURLs)[10].
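
The idea behind a persistent locator can be shown with a toy lookup table. This is only an in-memory stand-in, not OCLC's actual service, and the class name and example URLs are hypothetical. A real resolver would answer requests over the network, but the bookkeeping is the same: the record stores the persistent address, and only the table entry changes when the document moves.

```python
class PurlTable:
    """Toy stand-in for a PURL resolver: persistent address -> current URL."""

    def __init__(self):
        self._targets = {}

    def register(self, purl, target_url):
        """Assign a persistent address to a document's current location."""
        self._targets[purl] = target_url

    def retarget(self, purl, new_target_url):
        """Point an existing persistent address at the document's new location."""
        if purl not in self._targets:
            raise KeyError(f"unknown PURL: {purl}")
        self._targets[purl] = new_target_url

    def resolve(self, purl):
        """Return the current location for a persistent address."""
        return self._targets[purl]


purls = PurlTable()
purl = "http://purl.example.org/demo/paper"  # this is what the 856 field would hold
purls.register(purl, "http://old-host.example.edu/paper.html")
purls.retarget(purl, "http://new-host.example.edu/docs/paper.html")
current = purls.resolve(purl)
```

Because the bibliographic record never stores the changing URL, no record needs to be edited when a document moves.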

7.21 Source of title: A required note.

7.22 Title variation notes: such as HTML title or source title.

7.31 Author information: If the author statement is not included in Area 1, an author note should be provided if the information can be located elsewhere in the document. Editors and Web masters can be included in the note, which should also include the source of the information. The rule of three applies to authors, editors, and Web masters alike.

7.32 Persons or bodies not transcribed in statements of responsibility but considered important for identification of the document.

7.41 Currency of information[11] : In a manner similar to the treatment of serials and looseleaf publications, add a note to indicate the time the record is created. For instance,

This note will help users understand when the bibliographic record is created and account for any discrepancies between the record and an Internet document the user just retrieved.

7.42 Subsequent update statements: If a cataloger chooses to update a record, the latest update information should be included to help users understand the basis of the change. For instance,

Catalogers who choose not to update a record do not need to include this note.

7.51 System requirements 538: This note can be simplified to include only special hardware or software needed for access, for instance, a PostScript printer or a graphical interface.

7.52 Mode of access 538: This information duplicates access information in 856 and can be omitted.

7.6 Content/summary note: Several catalogers have listed the entries of a document to indicate its content, but a carefully constructed summary note is often more informative than a lengthy list of entries and should be preferred.

Area 8: Standard number and terms of availability

Not in use now. Record URN here when it becomes available.

In summary, a bibliographic record for an Internet resource can be as complete as a second-level cataloging record if all the data elements can be easily identified; or it can be as succinct as the record below:

	Title proper  [Internet file] / author statement.  --  [place of host institution :  
          Name of host institution, beginning date of the document-  ]
	Location and access note.  (856 field)
	Source of title note.  (500 note)
	Note on editors or bodies important for identification purposes.  (500 note)
	Currency of information.  (500 note)
	Subsequent update statement.  (500 note)		
	Note on system requirements.  (538 field)
	Summary.  (520 field)
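
The succinct record above can be assembled mechanically once the cataloger has supplied the data. The sketch below is illustrative only: the function name, the combined "245/260" label for the descriptive line, the note wording, and all sample values are assumptions, and the publication statement is bracketed on the assumption that it was inferred, as in the template.

```python
def m_level_record(title, place, host, begin_year, url, title_source,
                   author=None, system_requirements=None, summary=None):
    """Build a list of (tag, text) pairs in the note order proposed above."""
    area1 = f"{title} [Internet file]"
    if author:
        area1 += f" / {author}"
    # Open date with a hyphen; the whole statement is bracketed as inferred data.
    area4 = f"[{place} : {host}, {begin_year}- ]"
    fields = [
        ("245/260", f"{area1}. -- {area4}"),
        ("856", url),                              # access comes first among notes
        ("500", f"Title from {title_source}."),    # required source-of-title note
    ]
    if system_requirements:
        fields.append(("538", f"System requirements: {system_requirements}."))
    if summary:
        fields.append(("520", summary))
    return fields


record = m_level_record(
    title="Sample title",
    place="Washington, D.C.",
    host="Example University",
    begin_year=1995,
    url="http://www.example.edu/sample.html",
    title_source="title screen",
    author="Jane Doe",
    summary="A placeholder summary of the resource.",
)
for tag, text in record:
    print(f"{tag:8}{text}")
```

Optional notes (editors, currency, updates) would be appended in the same way, in their stated order, between the source-of-title note and the system requirements note.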

Changes to MARC 500 field for notes: To simplify the maintenance of these notes, indicator values should be designated to specify the nature of each 500 note. If that is difficult, then other fields should be used to record the currency and update information.
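
To show why typed notes would ease maintenance, the sketch below assumes a purely hypothetical assignment of indicator values to note types. No such values are defined for the 500 field in MARC, and the values, names, and placeholder note texts here are invented for illustration only.

```python
# Hypothetical indicator values; not part of any MARC specification.
HYPOTHETICAL_500_INDICATORS = {
    "1": "source of title",
    "2": "author information",
    "3": "currency of information",
    "4": "subsequent update",
}


def notes_of_kind(fields, kind):
    """Select 500 notes of one type by indicator instead of parsing their text."""
    wanted = {ind for ind, label in HYPOTHETICAL_500_INDICATORS.items() if label == kind}
    return [text for tag, ind, text in fields if tag == "500" and ind in wanted]


# Each field is (tag, indicator, text); the texts are placeholders.
sample_fields = [
    ("500", "1", "source-of-title note text"),
    ("500", "3", "currency note text"),
    ("520", " ", "summary text"),
]
currency_notes = notes_of_kind(sample_fields, "currency of information")
```

With such typing, a maintenance program could locate and refresh currency or update notes without guessing from their wording.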

Access Points

Access points should be provided for authors, editors, and bodies (no more than 3 of each type) related to the production of the document and authority work should be performed on all access points.

Subject Analysis

Because many Internet resources tend to be broad in scope, the traditional summational approach seems to work well with this type of resource. Catalogers will need to coordinate the controlled vocabulary in the subject heading fields with the natural language in the summary note to provide users with multiple ways of accessing Internet resources by subject. All subject access points should also be subjected to authority control. Traditional subject heading lists such as LCSH, MeSH, and Sears, and classification schemes such as DDC and LCC should continue to be used to ensure integration of Internet resources with existing collections and to ensure proper collocation by subject.
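
The value of pairing controlled headings with a natural-language summary can be illustrated with a simple search over both. This is a minimal sketch: the record layout, the function name, and the sample heading and summary are assumptions for the example, and a real catalog would use indexed searching rather than substring matching.

```python
def subject_matches(record, query):
    """Return True if the query appears in a subject heading or in the summary."""
    needle = query.casefold()
    in_headings = any(needle in heading.casefold()
                      for heading in record["subject_headings"])
    in_summary = needle in record.get("summary", "").casefold()
    return in_headings or in_summary


sample_record = {
    "subject_headings": ["Cataloging of computer network resources"],
    "summary": "Discusses how librarians can describe Web pages and other Internet documents.",
}
found_by_heading = subject_matches(sample_record, "cataloging")
found_by_summary = subject_matches(sample_record, "web pages")
```

A searcher who does not know the controlled term can still reach the record through the summary, while the heading keeps the record collocated with related materials.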

Using OCLC's Infrastructure

Although it is appealing to provide one information system that organizes everything on the Internet, such a system is neither feasible nor effective because

  1. There are too many resources on the Internet.
  2. The quality of some resources is questionable.
  3. Many resources are ephemeral in nature and may be of limited value to users.

A more cost-effective model would be to have a system that contains resources whose quality has been evaluated. Using the current OCLC infrastructure, libraries can select resources relevant to their local users and contribute original records to OCLC if no records for their selected resources exist. In this process, libraries can take advantage of the LC Name Authority file, Subject authority file, and OCLC record creation module. This practice is essentially the same as the creation of an original record on OCLC, except that the descriptive part of cataloging has been substantially simplified.

The cooperative effort of libraries will enable librarians to cover a large number of quality Internet resources. The bibliographic records will have access points and subject information that assist users in searching and selecting Internet resources, and the hotlinks, maintained via PURLs, will easily retrieve selected resources for users. Such an information system will be known for its quality records and retrieval effectiveness, and will therefore be preferred by users. Since the system is potentially profitable, libraries and OCLC may form a partnership in managing the new system. At least three categories of users can be identified and they could be charged according to their relationship with OCLC. First, libraries that contribute original records to the new system would be given proper credit and their access to this system would be substantially discounted. In this way these libraries would have the option of not providing hotlinks from local OPACs. Second, libraries that have not contributed records to OCLC but want to copy records to their local OPACs or search records in this system would be charged a fee. Third, to end users who prefer direct access to OCLC over the Internet, OCLC would charge a fixed fee for a block of searches to encourage use. The system could be placed on the Internet to compete with other search engines and search services, and the revenue could be used to support the system and libraries that contribute records to it.

Conclusion

Internet resources need to be evaluated, selected, described, and analyzed by subject to facilitate access to them. Catalogers have long been adding value to information-bearing entities, and their principles of information organization can be effectively applied to organizing Internet resources. The records for a catalog of these resources should include essential elements for identifying and searching these resources. Because of the vast number of these resources, such a catalog will best be created through a cooperative effort. For that purpose, an augmented minimal level cataloging standard is proposed and the role of OCLC in this endeavor is described. Librarians, with their cataloging expertise, and OCLC, with its information infrastructure, can form a partnership to improve searching, access, and use of Internet resources.

References

  1. Martijn Koster, "Robots in the Web: Threat Or Treat," [http://info.webcrawler.com/mak/projects/robots/threat-or-treat.html]
  2. Bipin C. Desai, "WWW Workshop: Navigation Issues," [http://www.cs.concordia.ca/bcd/navigate.html]
  3. For a description of the Dublin Core, see Priscilla Caplan, "You Call It Corn, We Call It Syntax-Independent Metadata for Document-Like Objects," Public-Access Computer Systems Review 6, no. 4 (1995): 19-23; and a summary of the OCLC/NCSA Metadata Workshop at [http://www.cnri.reston.va.us/home/dlib/July95/07weibel.html]
  4. Ron Daniel, "Proposed URC External Representation," [http://www.acl.lanl.gov/URI/ExtRep/urc0.html]
  5. See, for instance, "TEI Guidelines for Electronic Text Encoding and Interchange," [http://etext.virginia.edu/TEI.html]
  6. Similar principles were discussed by Arlene G. Taylor in "The Information Universe: Will We Have Chaos or Control?" American Libraries 25 (1994): 629-32.
  7. The issue of web site and edition statement was discussed on the Intercat Listserv in October 1995 by Neil R. Hughes, Angelina G. Joseph, Glenn Patton, and Ellen McGrath. The messages are stored at the listserv archive at http://ftplaw.wuacc.edu/listproc/intercat/9510/maillist.html
  8. Cataloging Internet Resources: A Manual and Practical Guide, ed. Nancy B. Olson. Dublin: OCLC, 1995.
  9. Adele Hallam, Cataloging Rules for the Description of Looseleaf Publications: with Special Emphasis on Legal Materials. 2nd ed. Washington, D.C.: Office for Descriptive Cataloging, Library of Congress, 1989.
  10. The concept of PURL was introduced by Erik Jul and discussed on the Intercat Listserv in January of 1996 by Erik Jul, Vianne Sha, Priscilla Caplan, and Stuart Weibel. The messages are stored at the listserv archive at http://ftplaw.wuacc.edu/listproc/intercat/9601/maillist.html
  11. Hallam, Cataloging Rules for the Description of Looseleaf Publications.
  12. I would like to thank the students in my course on the organization of Internet resources for helping me develop some of the ideas in this paper.