OCLC Internet Cataloging Project Colloquium
Position Paper
The dynamic nature of Internet resources poses a major challenge
for information organizers. A Web page may be moved within a
Web site, to a different Web site, or disappear entirely; the contents
of the page may be updated slightly or substantially over a short period
of time; different groups of people may be responsible for a Web page
as people come and go; the corporate sponsorship may change; and
new access modes may be added after the initial appearance of a Web
page. The lack of a standard for presenting information on the World
Wide Web further complicates information organizers' tasks. For
instance, the statement of responsibility may be vague or embedded,
the title can be nondescriptive, and publishing information often needs
to be inferred. These challenges prompt us to reconsider how Internet
resources can be best represented and organized.
Why Libraries and Catalogers Should Do It
The need for information organization on the Internet is attested by the gallant efforts of various researchers to bring it under control. Most of the existing search engines were developed by computer scientists, and the initial excitement generated by such tools led some users to conclude that the problem of information organization on the Internet had been solved. Once the novelty wore off, however, users soon recognized the imprecision of these tools, and designers such as Martijn Koster[1] have enumerated their limitations and cautioned against total reliance on them. Another major effort has been to identify the core elements that constitute the metadata of an Internet resource, describing it and making it accessible. "Semantic header,"[2] "Dublin Core,"[3] and "Uniform Resource Characteristics"[4] are some of the better-known examples, which resulted from the deliberations of computer scientists, librarians, publishers, vendors, archivists, and engineers. It is somewhat puzzling that few catalogers were involved in these efforts, because catalogers have long created metadata for information-bearing entities (an activity known as "cataloging" in the library profession). The parallel between these new metadata specifications and cataloging records is hard to miss. As the efforts to map the TEI header into and out of the MARC format suggest[5], cataloging standards and practices have much to contribute to the organization of Internet resources.
The principles of information organization[6] include
Libraries are better suited than search engines or Internet search
services to select resources because they have experience in acquiring
materials in various formats for their local users. Librarians' expertise in
resource selection and their relatively well-defined local constituencies will
ensure their success in evaluating and selecting Internet resources.
Catalogers, in particular, should be involved in organizing Internet
resources because they have applied these principles to the cataloging
of materials in various formats and should be able to apply these
principles to the cataloging of Internet resources with equal efficiency.
How to Catalog Internet Resources
So how should this be done? From cataloging 160 Internet resources within
a two-month period, we realized that full-level cataloging was very
time-consuming, and that some data elements specified by the second level
description of AACR2R may be of limited use to searchers. It became clear
that to adapt to the dynamic nature of Internet resources, a different
level of description should be derived from current standards. To strike
a balance between speed of record creation and quality of record, and between
speed of record creation and speedy access to Internet resources, an augmented
minimal level cataloging standard is proposed. This modified standard
(referred to from now on as the "M" level cataloging) aims to fulfill the
roles of the catalog as a finding tool, an evaluating tool, a collocating tool,
and a locating tool by including only data elements essential for the
identification and subject collocation of Internet resources. The M level
cataloging follows the AACR2R punctuation pattern and contains the same
eight areas specified for the second level description. However, several
elements have been simplified. The problems and solutions (P&S)
below explain how some data can be recorded.
Description
Prescribed sources of information: The latest title frame or screen.
The transcription of this title proper is exactly the same as prescribed by 1.1B1 of AACR2R. The source of the title should be included in a note to help users understand the record.
P&S: If the statement of responsibility is not prominently listed on the chief source, it can be omitted from Area 1. Avoid using bracketed information for this element. Instead, use a general note to include author information uncovered elsewhere in the document.
The so-called "Web masters" who provide markup for a document should be treated as editors. List them in Area 1 if prominently listed on the chief source. The AACR2R rule of three applies to authors as well as editors.
General material designation: A new term, Internet file, is used to distinguish Internet resources from computer files stored on carriers for direct access. Users can use this new GMD to qualify their searches to Internet resources.
Prescribed sources of information: The screens before and after the body of the document (similar in concept to "preliminaries" and "colophon" of monographs.)
P&S: Internet resources often do not include formal edition statements. This area can be omitted if no information is available. If they are listed, edition statements tend to appear on the first few screens or the last screen of an Internet resource. Follow 1.2B1 to record a formal edition statement.
As several Intercat participants pointed out, if each update statement is treated as an edition statement, a new bibliographic record will need to be created each time a document is updated.[7] The confusion resulting from such a practice would defeat the purpose of cataloging. Therefore, in contrast to the rules in AACR2R, Ch. 9, and the rule in Cataloging Internet Resources: A Manual and Practical Guide[8], updating information should not be treated as edition information. Such information can be presented as a general note (see discussion below).
The terms specified in AACR2R for this area (Area 3, file characteristics) do not provide unique information for identifying or searching Internet resources. This area is therefore omitted from the M level cataloging.
Prescribed sources of information: The screens before and after the body of the document (similar in concept to "preliminaries" and "colophon" of monographs.)
P&S: Publishing information often needs to be inferred from the host site, and sometimes the information listed on the title screen differs from the host site. For instance, an author affiliated with the Catholic University of America may have a Web page at the University of North Carolina's Web site. A simple solution is to take the host site as the publisher and optionally list the author's affiliation in a general note. The rationale for this treatment is that the host provides a forum for the author to present his or her ideas. If publishing information does not appear in the prescribed sources, it should be bracketed.
Date of publication: Since many Internet resources are frequently updated, the date of update is unstable and should not be used as the date of publication[9]. Following the treatment of looseleaf publications, it is recommended that the beginning date of a Web document, if known, should be recorded with a hyphen to indicate that it is still being published. If the beginning date is not in the prescribed sources, it should be bracketed.
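The treatment of publisher and date described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name publication_area, its parameters, and the sample place and year are hypothetical and do not come from any cataloging system. Enclosing the whole area in one pair of brackets when every element is inferred follows the succinct record template given later in this paper. The paper does not say where the brackets fall when only some elements are inferred, so the per-element brackets shown here are an assumption.

```python
def publication_area(place, host, begin_year=None, inferred=()):
    """Build the publication area with the host site as publisher.

    `inferred` names the elements ("place", "publisher", "date") that were
    NOT found in the prescribed sources and therefore need brackets.
    The beginning date is recorded as an open date ("1995-") because the
    document is treated as still being published.
    """
    elements = {"place": place, "publisher": host}
    if begin_year is not None:  # the date is recorded only if known
        elements["date"] = f"{begin_year}-"

    # Assumption: if every element present was inferred, use one pair of
    # brackets around the whole area; otherwise bracket element by element.
    all_inferred = set(elements) <= set(inferred)

    def show(name):
        value = elements[name]
        if name in inferred and not all_inferred:
            return f"[{value}]"
        return value

    area = f"{show('place')} : {show('publisher')}"
    if "date" in elements:
        area += f", {show('date')}"
    return f"[{area}]" if all_inferred else area


# Hypothetical values, for illustration only.
print(publication_area("Chapel Hill, N.C.", "University of North Carolina",
                       1995, inferred={"place", "publisher", "date"}))
# prints: [Chapel Hill, N.C. : University of North Carolina, 1995-]
```

The date of the latest update is deliberately not a parameter, since the paper recommends that it not be used as the date of publication.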
This area (Area 5, physical description) is omitted for remote resources.
Prescribed sources: The entire document. Record only the formal series title and numbering information.
Prescribed sources: The entire document
Area 7 provides valuable information that supplements information recorded in the first 6 areas of the record.
P&S: The notes should be listed in their order of importance as follows
7.11 856 field: Electronic location and access
Since users are most concerned about accessing a document, this note should be the first note.
7.12 Multiple 856 fields: Hotlinks should be provided for various modes of access.
7.13 Maintenance of URLs: Because URLs sometimes change or move, they can be maintained through OCLC's Persistent Uniform Resource Locators (PURLs)[10].
7.21 Source of title: A required note.
7.22 Title variation notes: such as HTML title or source title.
7.31 Author information: If the author statement is not included in Area 1, an author note should be provided if the information can be located elsewhere in the document. Editors and Web masters can be included in the note, which should also include the source of the information. The rule of three applies to authors, editors, and Web masters alike.
7.32 Persons or bodies not transcribed in statements of responsibility but considered important for identification of the document.
7.41 Currency of information[11] : In a manner similar to the treatment of serials and looseleaf publications, add a note to indicate the time the record is created. For instance,
This note will help users understand when the bibliographic record was created and account for any discrepancies between the record and the Internet document the user has just retrieved.
7.42 Subsequent update statements: If a cataloger chooses to update a record, the latest update information should be included to help users understand the basis of the change. For instance,
Catalogers who choose not to update a record do not need to include this note.
7.51 System requirements 538: This note can be simplified to include only special hardware or software needed for access, for instance a PostScript printer or a graphical interface.
7.52 Mode of access 538: This information duplicates access information in 856 and can be omitted.
7.6 Content/summary note: Several catalogers have listed the entries of a document to indicate its content, but a carefully constructed summary note is often more informative than a lengthy list of entries and should be preferred.
In summary, a bibliographic record for an Internet resource can be as complete as a second-level cataloging record if all the data elements can be easily identified; or it can be as succinct as the record below:
Title proper [Internet file] / author statement. -- [place of host institution : Name of host institution, beginning date of the document- ]
Location and access note. (856 field)
Source of title note. (500 note)
Note on editors or bodies important for identification purposes. (500 note)
Currency of information. (500 note)
Subsequent update statement. (500 note)
Note on system requirements. (538 field)
Summary. (520 field)
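As a rough illustration of the succinct record above, the following Python sketch assembles an M level record and lists its notes in the proposed order. The class name MLevelRecord, the note keys, and the plain-text display are hypothetical choices made for this example; they are not a MARC encoding and not part of any OCLC system. Only the field tags (856, 500, 538, 520), the GMD "Internet file," and the order of the notes are taken from this paper.

```python
from dataclasses import dataclass, field

# Note order proposed in this paper: (hypothetical key, field tag from the paper).
NOTE_ORDER = [
    ("location_and_access", "856"),
    ("source_of_title", "500"),
    ("editors_or_bodies", "500"),
    ("currency", "500"),
    ("update", "500"),
    ("system_requirements", "538"),
    ("summary", "520"),
]


@dataclass
class MLevelRecord:
    title: str
    publication: str            # already formatted, brackets included
    author_statement: str = ""  # omitted when not prominent on the chief source
    notes: dict = field(default_factory=dict)

    def __post_init__(self):
        # The paper treats the source-of-title note as required.
        if "source_of_title" not in self.notes:
            raise ValueError("a source of title note is required")

    def area_1(self):
        text = f"{self.title} [Internet file]"
        if self.author_statement:
            text += f" / {self.author_statement}"
        return text

    def ordered_notes(self):
        # Keep only the notes supplied, in the order given by NOTE_ORDER.
        return [(tag, self.notes[key]) for key, tag in NOTE_ORDER
                if key in self.notes]

    def display(self):
        lines = [f"{self.area_1()}. -- {self.publication}"]
        lines += [f"{tag}  {text}" for tag, text in self.ordered_notes()]
        return "\n".join(lines)
```

A cataloger's tool built along these lines could accept the notes in any order and still display the location and access note first, which is the note users are most concerned about.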
Changes to MARC 500 field for notes: To simplify the maintenance
of these notes, indicator values should be designated to specify the
nature of each 500 note. If that is difficult, then other fields should
be used to record the currency and update information.
Access Points
Access points should be provided for authors, editors, and bodies
(no more than 3 of each type) related to the production of the
document, and authority work should be performed on all access points.
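The limit of three access points per type can be sketched as follows. The function name select_access_points and the (role, name) input format are hypothetical. Keeping the first three names of each type in the order given is an assumption, since the paper states only the limit and not how to choose when more than three names are present. Authority work is not modeled here.

```python
from collections import defaultdict


def select_access_points(names, limit=3):
    """Keep at most `limit` names for each role.

    `names` is an iterable of (role, name) pairs in order of prominence.
    Assumption: when a role has more than `limit` names, the earliest
    ones are kept and the rest are dropped. Duplicate names are skipped.
    """
    kept = defaultdict(list)
    for role, name in names:
        if name not in kept[role] and len(kept[role]) < limit:
            kept[role].append(name)
    return dict(kept)
```

Each name that survives this step would still need to be checked against the authority file before it is used as a heading.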
Subject Analysis
Because many Internet resources tend to be broad in scope, the traditional
summational approach seems to work well with this type of resource.
Catalogers will need to coordinate the controlled vocabulary in the
subject heading fields with the natural language in the summary
note to provide users with multiple ways of accessing Internet resources
by subject. All subject access points should also be subjected to
authority control. Traditional subject heading lists such as LCSH,
MeSH, and Sears, and classification schemes such as DDC and LCC
should continue to be used to ensure integration of Internet
resources with existing collections and to ensure proper collocation
by subject.
Using OCLC's Infrastructure
Although it is appealing to provide one information system that organizes everything on the Internet, such a system is neither feasible nor effective because
A more cost-effective model would be a system that contains only resources whose quality has been evaluated. Using the current OCLC infrastructure, libraries can select resources relevant to their local users and contribute original records to OCLC if no records for their selected resources exist. In this process, libraries can take advantage of the LC Name Authority file, Subject authority file, and OCLC record creation module. This practice is essentially the same as the creation of original records on OCLC, except that the descriptive part of cataloging has been substantially simplified.
The cooperative effort of libraries will enable librarians to cover
a large number of quality Internet resources. The bibliographic
records will have access points and subject information that
assist users in searching and selecting Internet resources, and
the hotlinks, maintained via PURLs, will easily retrieve selected
resources for users. Such an information system will be known
for its quality records and retrieval effectiveness, and will
therefore be preferred by users. Since the system is potentially
profitable, libraries and OCLC may form a partnership in
managing the new system. At least three categories of users
can be identified and they could be charged according to their
relationship with OCLC. First, libraries that contribute original
records to the new system would be given proper credit and their
access to this system would be substantially discounted. In this way
these libraries would have the option of not providing hotlinks
from local OPACs. Second, libraries that have not contributed
records to OCLC but want to copy records to their local OPACs
or search records in this system would be charged a fee.
Third, to end users who prefer direct access to OCLC over the
Internet, OCLC would charge a fixed fee for a block of searches to
encourage use. The system could be placed on the Internet to
compete with other search engines and search services, and the
revenue could be used to support the system and libraries that
contribute records to it.
Conclusion
Internet resources need to be evaluated, selected, described,
and analyzed by subject to facilitate access to them. Catalogers
have long been adding value to information-bearing entities, and
their principles of information organization can be effectively
applied to organizing Internet resources. The records for a catalog
of these resources should include essential elements for identifying
and searching these resources. Because of the vast number of these
resources, such a catalog will best be created through a cooperative
effort. For that purpose, an augmented minimal level cataloging
standard is proposed and the role of OCLC in this endeavor
is described. Librarians, with their cataloging expertise, and OCLC,
with its information infrastructure, can form a partnership to
improve searching, access, and use of Internet resources.
References