OCLC Internet Cataloging Project Colloquium
Vianne T. Sha
Timothy B. Patrick
Thomas R. Kochtanek
University of Missouri-Columbia
Vianne T. Sha, M.L.S., currently Catalog Librarian/Librarian II at the University of Missouri-Columbia School of Law Library, is the library's local coordinator for the OCLC Internet Cataloging project. She has coordinated, moderated, and presented numerous programs for library and law associations, and has written resource guides and articles for the library profession on Internet resources and on cataloging Internet resources. She is also actively involved in professional library committees, library-related e-mail discussion lists, and local cooperative projects for digitizing and cataloging Internet resources.
Timothy B. Patrick, Ph.D., is Assistant Professor in the School of Library and Informational Science and the Medical Informatics Group at the University of Missouri-Columbia. He is currently conducting research on complex information retrieval and controlled vocabularies under the Missouri Integrated Advanced Information Management Systems grant. His recent work includes a High Performance Computing and Communications grant and the construction of an integrated workstation for immunology research and education.
Thomas R. Kochtanek, Ph.D., is Chair of the Department of Information Science, School of Library and Informational Science, at the University of Missouri-Columbia. His research and publications encompass information retrieval, integrated online library systems, and digital libraries. He is a member of the American Society for Information Science, the Association for Computing Machinery, and the American Library Association.
This paper recommends that librarians provide leadership and guidance for local efforts by groups within an institution or community that want to provide World Wide Web (WWW) access to a set of information materials. Librarians should help these groups identify materials of significant research and educational value, digitize those materials, catalog or index them, and provide WWW access to them through Online Public Access Catalogs (OPACs) and the union catalogs of national bibliographic utilities. Furthermore, librarians should assist local agencies in maintaining the selected resources, their catalog records, and their associated hyperlinks. Only then can the distribution of those materials to local and remote users be guaranteed to be comprehensive, enduring, integrated, and of consistently high quality.
The traditional library has had overwhelming success in meeting the information needs of the public by its approaches to selecting, acquiring, organizing, distributing, and preserving information resources. The World Wide Web (WWW) and the underlying National Information Infrastructure (NII) provide dramatic new opportunities to serve those information needs. The potential of the WWW/NII as the basis for a wide-area, distributed storehouse of information offers an irresistible challenge to the traditional library. Development of the WWW/NII as an information resource must not, however, be predicated on a rejection of the contributions and lessons of the traditional library.
To make full use of information resources, one must be able to identify the particular information desired and determine how to access it, regardless of the format in which it is stored. Library Online Public Access Catalogs (OPACs) provide public access to information resources in a variety of formats. Catalog records in MAchine-Readable Cataloging (MARC) format offer a flexible yet highly organized structure for storing and retrieving the bibliographic data of information resources (Sha 1995). The WWW and the underlying NII have made one-stop information shopping possible. From a single point of access, a user can search multiple library catalogs and reach information resources in various physical formats held locally and around the world.
Despite the overwhelming promise of the WWW/NII, two fundamental problems must be addressed to deliver information resources effectively to end-users. These two problems are the maintenance problem and the access problem. Librarians can help resolve these problems by continuing to pursue, in a digital environment, the traditional goals of selecting, acquiring, organizing, distributing, and preserving information resources. We think the best way to approach these tasks is to establish local models for digitizing, organizing, and providing access to information resources.
Two Basic Problems
Maintenance has a two-fold meaning in this paper: the maintenance of current records of the physical locations of information resources, and the maintenance of the authentic quality of information resources. Maintaining current records of the physical locations of information resources is here a problem of maintaining current Uniform Resource Locators (URLs). Maintaining the authentic quality of information resources is here the problem of controlling information sources for currency, endurance, and reliability.
In addition to the problem of maintenance, we must also address a fundamental problem of access. Public users encounter problems in accessing information on the WWW/NII due to limited access to equipment, security problems, restricted access to resources, and temporary system shutdowns. Access must be addressed in order for the WWW/NII to be fully developed as a systematic, orderly public source of information.
Maintenance: Physical Locations
Agencies that try to organize and provide WWW access to information resources will find these remote and unstable resources very difficult to maintain. Location information, represented by Uniform Resource Locators, is always subject to change, and some resources disappear after existing on the WWW for a while. This uncertainty undermines the research and scholarly value of these resources.
1. Web Page Maintenance
Various methods have been used to tackle this problem. One is to use Web page maintenance tools, such as WebWatch, MOMspider, Netscape SmartMarks, Netbuddy, Red Alert, and Webxref, to check the status of URLs periodically. These tools are especially helpful to Web page authors, who can use them to find and correct broken links in their own pages. The URL maintenance problem, however, should not be left to these tools alone. They periodically check only the URLs in pages that have been registered with or assigned to them, so they cannot correct URL maintenance problems in a global sense.
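The periodic check such tools perform can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a model of how any of the tools named above actually work. The function names `http_status` and `check_links` are hypothetical. The sketch also assumes that a HEAD request is an adequate test of whether a URL is alive. The status lookup is passed in as a parameter, so the checker can run against a fake table without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 if the URL is unusable or unreachable."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # the server answered, but with a 4xx/5xx code
        return err.code
    except (urllib.error.URLError, OSError, ValueError):  # DNS failure, refusal, malformed URL
        return 0


def check_links(urls: Iterable[str],
                fetch_status: Callable[[str], int] = http_status) -> Dict[str, str]:
    """Classify each URL on a registered list as 'ok' or 'broken'."""
    report: Dict[str, str] = {}
    for url in urls:
        status = fetch_status(url)
        report[url] = "ok" if 200 <= status < 400 else "broken"
    return report
```

Note that the checker sees only the URLs it is handed. This is exactly the limitation described above: a URL that appears on no registered list is never checked.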
2. URN Resolution
Another recommended method is to standardize Uniform Resource Name (URN) resolution protocols. URNs are one kind of WWW/NII resource identifier. "URNs fit within a larger Internet information architecture, which in turn is composed of, additionally, Uniform Resource Characteristics (URCs), and Uniform Resource Locators (URLs). URNs are used for identification, URCs for including meta-information, and URLs for locating or finding resources" (Sollins 1994, 1).
A number of schemes have recently been proposed for assigning and resolving URNs and for associating meta-information with URNs. Among them are the x-dns-2 URN scheme (Hoffman 1995), the Handle system (Arms 1995), the OCLC URN services (Shafer 1995), the Path URN specification (LaLiberte 1995), and the Virtual Shelf (Patrick 1995). The first four were proposed in response to the Internet Engineering Task Force (IETF) Request For Comments (RFC) 1737 (Sollins 1994).
The x-dns-2 URN scheme uses the domain name system to resolve URNs and thus allows "URN publishers to create URN resolvers without central registration, and without changing or adding any domain names" (Hoffman 1995). Its syntax consists of a SchemeID (such as URN) and an ElementID. The ElementID includes an AuthorityID, which is the domain name of the resolving host, and a RequestID, which identifies the request sent to the resolving host.
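The division of an x-dns-2 URN into SchemeID, AuthorityID, and RequestID can be illustrated with a small parser. This sketch rests on one loud assumption: the three parts are taken to be separated by colons. That delimiter is chosen here for illustration and is not copied from the x-dns-2 draft. The names `XDns2Urn` and `parse_urn` are hypothetical.

```python
from typing import NamedTuple


class XDns2Urn(NamedTuple):
    scheme_id: str      # e.g. "URN"
    authority_id: str   # domain name of the resolving host
    request_id: str     # passed unchanged to that host


def parse_urn(urn: str) -> XDns2Urn:
    """Split a URN into SchemeID, AuthorityID, and RequestID.

    ASSUMPTION: the parts are separated by colons. This delimiter is
    illustrative only, not taken from the x-dns-2 draft.
    """
    scheme_id, sep, element_id = urn.partition(":")
    if not sep or not scheme_id:
        raise ValueError("missing SchemeID")
    authority_id, sep, request_id = element_id.partition(":")
    if not sep or not authority_id or not request_id:
        raise ValueError("ElementID needs an AuthorityID and a RequestID")
    # Domain names are case-insensitive, so normalize the AuthorityID.
    return XDns2Urn(scheme_id, authority_id.lower(), request_id)
```

The AuthorityID is an ordinary domain name. A client could therefore locate the resolving host with a standard DNS lookup, for example `socket.gethostbyname(parsed.authority_id)`, and no central registry is involved.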
The Handle system "provides identifiers for digital objects and other resources in distributed computer systems. These identifiers are known as handles" (Arms 1995). The system is composed of naming authorities, handle generators, a global handle server, local handle servers, caching handle servers, client software libraries, proxy servers, and administrative tools.
OCLC's proposed URN services scheme "emphasizes the role of a URN as a location-independent pointer to an electronic object that has potentially many resolution possibilities, depending on the services requested" (Shafer 1995). It proposes several services that all use the same URN syntax, some variation on naming authority/opaque string. A URN can be resolved to exactly one URL, to a list of URLs extracted from a URC, to a complete URC, or to a standard citation format.
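The idea of one URN supporting several resolution services can be sketched as a single registry with four lookup methods. This is a hypothetical illustration, not OCLC's design. The class and method names, the fields of the `Urc` record, and the citation format are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Urc:
    """Hypothetical URC: meta-information plus every known location of a resource."""
    title: str
    author: str
    year: int
    urls: List[str] = field(default_factory=list)


class UrnService:
    """One URN ("naming authority/opaque string"), several resolution services."""

    def __init__(self) -> None:
        self._registry: Dict[str, Urc] = {}

    def register(self, urn: str, urc: Urc) -> None:
        self._registry[urn] = urc

    def _lookup(self, urn: str) -> Urc:
        if urn not in self._registry:
            raise LookupError(f"unknown URN: {urn}")
        return self._registry[urn]

    def to_url(self, urn: str) -> str:
        """Service 1: exactly one URL (here, simply the first one listed)."""
        urls = self._lookup(urn).urls
        if not urls:
            raise LookupError(f"no URL recorded for {urn}")
        return urls[0]

    def to_urls(self, urn: str) -> List[str]:
        """Service 2: the list of URLs extracted from the URC."""
        return list(self._lookup(urn).urls)

    def to_urc(self, urn: str) -> Urc:
        """Service 3: the complete URC."""
        return self._lookup(urn)

    def to_citation(self, urn: str) -> str:
        """Service 4: a citation string (format chosen here for illustration)."""
        urc = self._lookup(urn)
        return f"{urc.author}. {urc.year}. {urc.title}."
```

The point of the design is that the URN itself never changes. Only the service requested determines whether the caller receives a location, a list of locations, the full description, or a citation.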
The Path URN specification uses existing Domain Name System (DNS) technology "to resolve a path into sets of equivalent URLs, and then one URL is resolved into the named resource" (LaLiberte 1995). The Path is a sequence of components and an optional opaque string. A naming authority assigns a unique name for each resource. When a request is sent in the form of a Path, it will be resolved into an ordered list of URL sets from most-specific to least-specific. Then, the list of URL sets will be resolved to the named resource.
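The two-stage Path resolution (path to ordered URL sets, then URL to resource) can be sketched as follows. This sketch makes two assumptions. First, a plain dictionary of path prefixes stands in for the DNS records that the real scheme would consult. Second, all URLs in one set are treated as interchangeable copies of the named resource. The function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical stand-in for the DNS records the Path scheme would consult:
# each known path prefix maps to a set of equivalent URLs.
PathTable = Dict[Tuple[str, ...], Set[str]]


def url_sets_for(path: Sequence[str], table: PathTable) -> List[Set[str]]:
    """Return the URL sets of every known prefix of path, most specific first."""
    sets: List[Set[str]] = []
    for end in range(len(path), 0, -1):
        prefix = tuple(path[:end])
        if prefix in table:
            sets.append(table[prefix])
    return sets


def resolve_path(path: Sequence[str], table: PathTable,
                 fetch: Callable[[str], Optional[bytes]]) -> Optional[bytes]:
    """Try each URL set in order; return the first resource that can be fetched."""
    for url_set in url_sets_for(path, table):
        for url in sorted(url_set):  # URLs in one set are equivalent; sort for determinism
            resource = fetch(url)
            if resource is not None:
                return resource
    return None
```

Ordering the sets from most specific to least specific lets a specialized server answer first, while a broader one still serves as a fallback.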
"A virtual shelf is a general purpose server that is dedicated to a particular information subject class. The identifier of a virtual shelf server identifies its subject class. A virtual shelf may be used to access a specific information source or to browse for information sources by subject. Location independent call numbers are assigned to information sources. These call numbers are mapped to the location independent identifiers of general purpose servers. The location independent server identifiers are mapped to network locations. The general purpose servers at those locations are virtual shelves" (Patrick 1995).
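The two mappings in this description can be modeled directly: call number to shelf (server) identifier, and shelf identifier to network location. This is a minimal sketch, not the published Virtual Shelf implementation. The class name, methods, call numbers, shelf identifiers, and hosts are all hypothetical.

```python
from typing import Dict, List


class VirtualShelfDirectory:
    """Two location-independent mappings, as in the virtual shelf description.

    call number -> shelf (server) identifier -> current network location.
    All identifiers and hosts used with this class are hypothetical.
    """

    def __init__(self) -> None:
        self.shelf_of: Dict[str, str] = {}      # call number -> shelf identifier
        self.location_of: Dict[str, str] = {}   # shelf identifier -> network location

    def shelve(self, call_number: str, shelf_id: str) -> None:
        self.shelf_of[call_number] = shelf_id

    def place_shelf(self, shelf_id: str, location: str) -> None:
        # Moving a shelf changes one entry; no call number needs to be edited.
        self.location_of[shelf_id] = location

    def locate(self, call_number: str) -> str:
        """Access a specific information source."""
        return self.location_of[self.shelf_of[call_number]]

    def browse(self, shelf_id: str) -> List[str]:
        """Browse a subject class: every call number assigned to one shelf."""
        return sorted(c for c, s in self.shelf_of.items() if s == shelf_id)
```

Because both mappings are location independent, moving a shelf server to a new host is a single update. Every call number on that shelf follows the move automatically.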
These schemes share a common approach: they construct URNs around location independence, name assignment by a naming authority, and resolution servers. Each resource is assigned an authority name (or number). When a request in URN form is sent, the resolution server checks the URN with the naming authority, and if there is a match, the URLs are returned to the request sender. When the location of the resource changes, the request sender sees only the resource's current URLs.
The concept of URN authority name assignment is very similar to the concept of establishing an authorized form of name or subject in library OPACs. However, a URN does not contain different forms of the same name as the authority record in a library OPAC does, but rather contains different locations (URLs) and meta-information for the authority name. Maintaining the URNs will be a natural extension of the library catalog maintenance functions.
The drawback of URN resolution is that it may slow the retrieval of WWW/NII resources, since each request must go through the resolution process before the URLs of the resources can be returned to users. Moreover, none of the existing URN resolution proposals is entirely satisfactory. They raise, but leave unanswered, several questions. Who should provide the resolution service: the naming authorities, or a third-party commercial service? Should there be a centralized resolution server or many local resolution servers?
OCLC has recently implemented a new tool in the InterCat Cataloging project to keep track of the URLs in catalog records. The primary characteristic of this tool, the Persistent Uniform Resource Locator (PURL), is that each PURL is associated with exactly one URL. OCLC has assigned the authority and responsibility for maintaining PURLs to the cataloging libraries. This can be considered a stepping-stone toward implementing quality control for WWW/NII resources through library cataloging.
One more major step is needed to resolve the problem of redundant URLs, which arise when the same resource is stored in several physical locations. In the long run, the best way to accomplish this step is a standard URN resolution server that can resolve a single URN to all of its associated URLs and, when a URN is matched, return different URLs depending on the requester's geographical location.
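The PURL arrangement of one PURL, one URL, and one responsible cataloging library can be sketched as a small lookup table. This sketch assumes that the resolver answers a request with an HTTP 302 redirect to the current URL. The class names, paths, and library names are hypothetical. The catalog record stores only the PURL, so when a resource moves, only the table entry changes.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class PurlRecord:
    target_url: str   # the single URL this PURL currently points to
    maintainer: str   # cataloging library responsible for keeping it current


class PurlTable:
    def __init__(self) -> None:
        self._records: Dict[str, PurlRecord] = {}

    def create(self, purl: str, url: str, maintainer: str) -> None:
        if purl in self._records:
            raise ValueError(f"{purl} is already assigned")
        self._records[purl] = PurlRecord(url, maintainer)

    def update(self, purl: str, new_url: str, requested_by: str) -> None:
        record = self._records[purl]
        if requested_by != record.maintainer:
            raise PermissionError(f"only {record.maintainer} may update {purl}")
        record.target_url = new_url

    def resolve(self, purl: str) -> Tuple[int, Dict[str, str]]:
        """Return an HTTP status and headers: a redirect to the one current URL."""
        record = self._records.get(purl)
        if record is None:
            return 404, {}
        return 302, {"Location": record.target_url}
```

Restricting `update` to the recorded maintainer mirrors the assignment of responsibility to the cataloging library.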
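Here is a sketch of a resolver that returns every associated URL for a URN and orders them by the requester's location. It assumes each stored copy is tagged with a coarse region label, and "nearest" is reduced to "same region". Real geographic selection would need more information than this. The function name and data are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical resolver data: URN -> (region, URL) for every stored copy.
Mirrors = Dict[str, List[Tuple[str, str]]]


def resolve_nearest(urn: str, mirrors: Mirrors, requester_region: str) -> List[str]:
    """Return all URLs for a URN, those in the requester's region first."""
    copies = mirrors.get(urn)
    if not copies:
        raise LookupError(f"unknown URN: {urn}")
    # sorted() is stable: False (same region) sorts before True, other order kept.
    ranked = sorted(copies, key=lambda copy: copy[0] != requester_region)
    return [url for _, url in ranked]
```

Returning the full ranked list, rather than only the first URL, leaves the client a fallback if the nearest copy is unavailable.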
Maintenance: Quality Control
Web tools and URN resolution protocols may help resolve and maintain current physical location information of the WWW/NII resources. However, the issue of quality control of information distributed on the WWW/NII remains unsolved. Note that controlling the quality of information does not imply censorship. Information resources should be stable and systematically organized to ensure the quality of information distributed. High-quality information means that the resources have gone through the process of quality control. They have been identified, organized, and stored in a systematic and consistent way, and access to them has been provided in a reliable, user-friendly retrieval system. All of these functions have been performed on various formats of information resources by libraries for a long time. Information resources available in digital environments will just be an additional format.
Organizing information resources with quality control addresses three important issues: currency, endurance, and reliability.
To identify a current resource for public access, it is important to ensure that the URL points to the most current version of the resource. URN resolution ensures that current URLs are returned to request senders. But it does not guarantee the resources the URLs link to are the most up-to-date versions of the resources.
The endurance of information resources is an essential component of their value. Any published information should be assumed to have been read, printed, and cited by other users. Therefore, all versions of electronic resources should be kept in archives.
Quality control is also needed to ensure the reliability of retrieval results. Users should not have to type in many different forms of a name or subject to retrieve all titles by the same author or on the same topic. That is why library catalogs need authority control on access points, and for the same reason, access points for WWW/NII resources should go through the authority control process.
Implementing quality control on WWW/NII resources globally is difficult: there are too many sources of information, too much information, a lack of standards, and no national agency to enforce standards. However, the quality control process can begin with local resources by establishing local models.
Libraries have long served as information mediators among different groups through the existing national and international resource-sharing structure. They are therefore the best candidates to cooperate with local efforts in establishing local models.
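The archiving principle described here can be expressed as an append-only store. Every deposit is kept, the current version is simply the latest one, and a citation can name an exact earlier version. This is a hypothetical sketch. The class name and the 1-based version numbering are assumptions made for the example.

```python
from typing import Dict, List


class VersionArchive:
    """Keep every deposited version of a resource; never overwrite or delete."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def deposit(self, resource_id: str, content: str) -> int:
        """Store a new version and return its 1-based version number."""
        versions = self._versions.setdefault(resource_id, [])
        versions.append(content)
        return len(versions)

    def current(self, resource_id: str) -> str:
        return self._versions[resource_id][-1]

    def get(self, resource_id: str, version: int) -> str:
        """Retrieve the exact version a reader may have cited."""
        if version < 1:
            raise IndexError("version numbers start at 1")
        return self._versions[resource_id][version - 1]
```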
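Authority control can be sketched as a table that maps every variant form of a heading to one authorized form. A query in any variant form then retrieves every record filed under the authorized heading. In this sketch the normalization rules (case-folding, punctuation removal) and the function names are assumptions. The Twain/Clemens headings are used only as a familiar illustration.

```python
import re
from typing import Dict, List


def normalize(heading: str) -> str:
    """Case-fold, turn punctuation into spaces, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", heading.casefold())
    return " ".join(cleaned.split())


class AuthorityFile:
    """Map every variant form of a heading to one authorized form."""

    def __init__(self) -> None:
        self._authorized: Dict[str, str] = {}

    def add(self, authorized: str, *variants: str) -> None:
        for form in (authorized, *variants):
            self._authorized[normalize(form)] = authorized

    def authorized_form(self, query: str) -> str:
        # Unknown forms fall through unchanged.
        return self._authorized.get(normalize(query), query)


def titles_by_author(records: List[Dict[str, str]], authority: AuthorityFile,
                     query: str) -> List[str]:
    """Records store only the authorized heading, so one lookup finds them all."""
    heading = authority.authorized_form(query)
    return [r["title"] for r in records if r["author"] == heading]
```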
1. Guarantee Public and Educational Access
Not everyone in our society, or in many developing countries, can afford the equipment required to access WWW/NII information resources. One survey showed that only 8.4% of African Americans and 18.3% of whites have computers at home (Putt 1995, 6). Other surveys have shown that most Internet users are college-educated adults with higher-than-average incomes (Brightman 1995, 15). The computer have-nots need encouragement and assistance to use these resources. Libraries must guarantee public and educational access for individuals who cannot afford their own computer equipment and Internet accounts, and must assist patrons who need help locating information.
2. Information Security
Many digitized collections accessible through the WWW/NII have ownership problems that restrict access. Libraries have to work with local agencies and other groups to resolve copyright and licensing problems and to ensure access, for educational purposes, for local patrons and remote users.
3. Well-defined Environment for Each Patron
Public access needs to take place in a well-defined environment. A good access control system can minimize downtime, end sessions after a set period without activity, provide user-friendly instructions, keep users from getting lost on the WWW/NII, and prevent a user who gets stuck from wasting other users' time and opportunity to access the server. All of this can be done in the library OPAC environment.
4. Extend Library Resource Sharing to WWW/NII Resources
Resource sharing among libraries is a success story. The recently implemented OhioLINK model (Kohl 1994) is one example of resource sharing among regional libraries. Its shared electronic information system is designed to "maximize the use of materials in the state and provide a platform to access and deliver new electronically based information resources" (Sanville 1995, 22). With cooperative state funding, it provides resource sharing for union catalog holdings, interlibrary loan (ILL), commercial online databases, and other technologies. WWW/NII resources will be another area for development and inclusion in this model. Libraries that cooperate through this kind of model can guarantee public access both to the resources they own (such as locally digitized materials) and to those they make available (such as subscribed commercial databases).
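The inactivity timeout mentioned above can be sketched as a session table that records the time of each user's last action and expires idle sessions. This is a hypothetical illustration. The class name and the 5-minute limit used in the test are assumptions. The sketch also assumes that an expired workstation would return to the OPAC start screen, which is not modeled here. The clock is passed in as a parameter so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Dict, List


class PublicSessionManager:
    """Track activity per workstation session and expire idle ones."""

    def __init__(self, idle_limit_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_limit = idle_limit_seconds
        self.clock = clock
        self._last_activity: Dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        """Record activity (a search, a click) for a session."""
        self._last_activity[session_id] = self.clock()

    def expire_idle(self) -> List[str]:
        """End every session idle longer than the limit; return their ids."""
        now = self.clock()
        expired = [sid for sid, last in self._last_activity.items()
                   if now - last > self.idle_limit]
        for sid in expired:
            del self._last_activity[sid]  # assumed: workstation resets to the OPAC start screen
        return expired

    def active_sessions(self) -> List[str]:
        return sorted(self._last_activity)
```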
Establishing Local Models
The development of Web-interfaced OPACs has given libraries a chance to study the feasibility of establishing local models for creating, maintaining, and providing access to local WWW/NII resources in a controllable environment. Developing such a model requires cooperation among libraries and local agencies. Local models may differ, but each should address the basic issues: identifying the collections to be digitized and organized; educating the caretakers of those collections about maintenance and access problems on the WWW/NII, including why library OPACs are the best choice for providing access to the collections; and providing expert guidance in digitizing, cataloging, maintaining, and ensuring access to the collections.
The traditional library structure is not dying. It fits well into the WWW/NII once librarians realize how the two structures can benefit each other. Librarians can contribute to resolving the existing maintenance and access problems on the WWW/NII by continuing to practice their traditional information roles. Librarians should be actively involved in establishing local cooperative models that can guarantee the distribution of comprehensive, enduring, integrated, and high-quality information to both local and remote public users.
Arms, William, David Ely. 1995. The Handle System. IETF URI Working Group Internet-Draft (Expires 23 Dec. 95) ftp://ds.internic.net/internet-drafts/draft-ietf-uri-urn-handles-00.txt
Brightman, Joan. 1995. Mystery Guests. American Demographics 17(8):14-16.
Hoffman, Paul, Ron Daniel. 1995. x-dns-2 URN Scheme. IETF URI Working Group Internet-Draft (Expires 21 Oct. 95) ftp://ds.internic.net/internet-drafts/draft-ietf-uri-urn-x-dns-2-00.txt
Kohl, David. 1994. OhioLINK: a Vision for the 21st Century. Library Hi Tech 48:29-34.
LaLiberte, Daniel, Michael Shapiro. 1995. The Path URN Specification. IETF URI Working Group Internet-Draft (Expires 17 Jan. 96) ftp://ds.internic.net/internet-drafts/draft-ietf-uri-urn-path-01.txt
Madsen, Mark. 1995. A Critique of Existing URN Proposals. IETF URI Working Group Internet-Draft (Expires 19 Jan. 96) ftp://ds.internic.net/internet-drafts/draft-ietf-uri-urn-madsen-critique-00.txt
Patrick, Timothy B., Gordon K. Springer, J. A. Mitchell, Mary E. Sievert. 1995. Virtual Shelves in the Digital Library: a Framework for Accessing Networked Information Sources. Journal of the American Medical Informatics Association 2(6):383-390.
Putt, Kirk A., Peter Valdes-Dapena. 1995. Movin' into the fast lane on the information superhighway. The Crisis 102(3):6-8.
Sanville, Thomas J. 1995. Moving Toward the Statewide Virtual Library. Catholic Library World 66(1):22.
Sha, Vianne T. 1995. Cataloguing Internet Resources: the Library Approach. The Electronic Library: the International Journal for the Applications of Technology in Information Environments 13(5):467-476.
Shafer, Keith E., Eric J. Miller, Vincent M. Tkac, Stuart L. Weibel. 1995. URN Services. IETF URI Working Group Internet-Draft (Expires 1 Jan. 96) ftp://ds.internic.net/internet-drafts/draft-ietf-uri-urn-resolution-01.txt
Sollins, Karen, Larry Masinter. 1994. Functional Requirements for Uniform Resource Names. IETF Network Working Group Request for Comments 1737 ftp://ds.internic.net/rfc/rfc1737.txt