OCLC Internet Cataloging Project Colloquium
Position Paper
by
Kevin L. Butterfield
Assistant Librarian,
Original Cataloging Unit,
Hatcher Graduate Library
University of Michigan,
Ann Arbor, MI 48109-1205
Tel: (313) 764-9361
E-mail: kbutterf@umich.edu
The term metadata has been bandied about a great deal
in the professional literature of late. With the advent of the
Dublin Core, the Text Encoding Initiative (TEI) Guidelines, the
Government Information Locator Service (GILS), and other such
constructs, the concept of data about data and what to do with it
has again become a discussion point in librarianship. As the
methods available for describing information grow beyond MARC, it
becomes increasingly apparent that librarians, and more
specifically catalogers, have a role to play as mediators and creators of an
increasingly diverse landscape of descriptive methods. As the
choices for providing access increase, the experience and
traditions that the cataloging profession can bring to the
creation, standardization, and manipulation of metadata systems
become obvious. This position paper outlines how that experience
and tradition were brought to bear on digital library projects at
the University of Michigan and advocates a strong and
collaborative role for catalogers in the mediation and creation of
metadata systems.
The Cataloger As Mediator
By mediating the use of metadata, catalogers exert a strong influence toward standardization on these developing systems of description and access. Cataloging exists as an often invisible process of order making. As constituencies develop systems for ordering in the digital world, what was once a largely invisible process becomes glaringly apparent.
A working example of this process at the University of
Michigan has been the collaborative work done for the Humanities
Text Initiative, a partnership between the University Library, the
School of Information and Library Studies and the University of
Michigan Press. Cataloging staff were brought on board to develop
policies and standardize procedures for the cataloging of texts and
the production of TEI headers for the project. The goals were the
development of a continuing dialogue between each of the partners
and the creation of a system of headers and MARC records that
provided predictable and easy access to the Initiative's electronic
texts. The effort is an ongoing collaboration between the
humanities text, preservation, and cataloging communities within
the library. As the effort progressed toward larger-scale
production, the catalogers developed skills in UNIX programming,
workflow and systems analysis, and Standard Generalized Markup
Language (SGML), while the other collaborators developed a greater
appreciation for the value of standardized forms of description and
access. As workflow patterns became finalized, another
change was noticed as well. Increasingly, catalogers were working
with the metadata first, in the form of TEI headers, and deriving
MARC-based records for the local catalog from it. This was made
possible by the efforts of catalogers to standardize the forms of
description within the TEI tagging. By employing AACR2 as a
standard and mediating the development of tag content, catalogers
played an integral role in the development of a more descriptive
and standardized header that allowed for greater predictability and
fidelity in searching and retrieval.
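The header-first workflow described above can be illustrated with a minimal sketch. It assumes a heavily simplified TEI header and an illustrative mapping to three MARC fields (100, 245, 260). The sample header, the CROSSWALK table, and the function names are hypothetical and are not the Humanities Text Initiative's actual procedures or tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified TEI header with made-up values.
TEI_HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>An Example Text</title>
      <author>Doe, Jane</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Example Press</publisher>
      <pubPlace>Ann Arbor, Mich.</pubPlace>
      <date>1996</date>
    </publicationStmt>
  </fileDesc>
</teiHeader>
"""

# Illustrative crosswalk (an assumption, not a published mapping):
# (MARC tag, subfield code, element path inside the header).
CROSSWALK = [
    ("100", "a", "fileDesc/titleStmt/author"),
    ("245", "a", "fileDesc/titleStmt/title"),
    ("260", "a", "fileDesc/publicationStmt/pubPlace"),
    ("260", "b", "fileDesc/publicationStmt/publisher"),
    ("260", "c", "fileDesc/publicationStmt/date"),
]


def header_to_marc(header_xml):
    """Return a dict mapping MARC tags to lists of (subfield, value) pairs."""
    root = ET.fromstring(header_xml.strip())
    record = {}
    for tag, code, path in CROSSWALK:
        element = root.find(path)
        text = (element.text or "").strip() if element is not None else ""
        if not text:
            continue  # skip anything the header does not supply
        record.setdefault(tag, []).append((code, text))
    return record


def format_marc(record):
    """Render the derived record as one line per MARC tag."""
    lines = []
    for tag in sorted(record):
        subfields = " ".join(f"${code} {value}" for code, value in record[tag])
        lines.append(f"{tag} {subfields}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_marc(header_to_marc(TEI_HEADER)))
```

The point of the sketch is that the derivation is only as predictable as the header content: if the author and title elements already follow AACR2 forms, the derived fields need little further editing.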
The Cataloger As Creator
While the previous project illustrates the cataloger as mediator, another more expansive project provides an example of the cataloger as an active participant in the creation of a metadata system. In 1994, the University of Michigan was awarded a four-year, four-million-dollar cooperative agreement to conduct coordinated research and development to create, operate, use, and evaluate a test bed of a large-scale, continually evolving multimedia digital library. The NSF/ARPA/NASA Digital Library Project is based on a set of highly specialized agents. The descriptive component of that library, the conspectus, consists of a metadata set for the description of objects, collections, and agents within the digital library. In early 1995, members of the University Library's Original Cataloging Unit were asked to join the Conspectus Working Group to provide expertise in the creation of systems for information organization and in the maintenance of large-scale information systems, and to advise on the ability of metainformational constructs to scale from the research to the production level.
Work has progressed over the last year and will continue into the future to create and refine the conspectus. Catalogers actively participated in the process of defining an attribute set. The questions asked during this process included: For what and for whom are we attempting to describe digital resources, and at what level is it appropriate to describe them? An additional area of exploration involved investigating the best process for registering or cataloging content for the University of Michigan Digital Library (UMDL). How much human intervention is necessary, and what parts can be handled by intelligent agents? Is it possible to design heuristics to inform an agent where and when to stop cataloging, and what information is required from the content provider to make this a workable process? A third component of the process involved the creation of an ontology to describe for the system what it would encounter during searching and for the development of a conspectus search language.
The work to date in answering these questions has resulted
in a core attribute set for the description of collections, a
prototype registry page to facilitate the process of describing
collections, and the identification of aspects of the registry
process which can be done automatically by agents. The
conspectus seeks to go beyond describing the bibliographic
characteristics of an object or collection to encompass the
rights, relationships, and services associated with
the collection, object, or agent being described. Each of these
developments came about from a creative, collaborative process
between catalogers from the university library and faculty and
students from the university's schools of engineering and library
and information science. In the ongoing work to create this
metadata component, catalogers have employed talents largely taken
for granted in the areas of description, retrieval, and access
and have developed new skills in ontological description.
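As a rough illustration of a description that reaches beyond bibliographic characteristics, the sketch below models a single conspectus-style entry with separate slots for rights, relationships, and services. Every name in it (ConspectusEntry, REQUIRED_FROM_PROVIDER, missing_attributes, and the sample values) is a hypothetical assumption made for this example; the actual UMDL attribute set is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class ConspectusEntry:
    """Hypothetical description of a collection, object, or agent."""
    identifier: str
    kind: str  # "collection", "object", or "agent"
    bibliographic: dict = field(default_factory=dict)   # e.g. title, provider
    rights: list = field(default_factory=list)          # access or use statements
    relationships: list = field(default_factory=list)   # (relation, target identifier)
    services: list = field(default_factory=list)        # services offered with the item


# Assumed minimum that a content provider must supply by hand;
# anything else might be filled in later by an agent.
REQUIRED_FROM_PROVIDER = ("title", "provider")


def missing_attributes(entry):
    """List required bibliographic attributes the provider has not supplied."""
    return [name for name in REQUIRED_FROM_PROVIDER
            if not entry.bibliographic.get(name)]


if __name__ == "__main__":
    entry = ConspectusEntry(
        identifier="coll-001",
        kind="collection",
        bibliographic={"title": "Example Journal Collection"},
        rights=["campus use only"],
        relationships=[("part-of", "coll-000")],
        services=["full-text search"],
    )
    print(missing_attributes(entry))
```

A check of this kind is one simple way a registry page could separate what must come from the content provider from what an agent might supply automatically.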
(Re) Emergence of Metadata
As text publishing models increasingly incorporate
electronic access and delivery, it becomes
clear that the creation of metadata is now among the
editorial decisions involved in the creation of the texts. This is
a shift from the old model of simply publishing the text and
leaving the creation of metadata description in the hands of
outside agencies, such as libraries or, more specifically,
catalogers. This shift of work from the bibliographic control level
to the publishing level is a natural evolution of text production
in the electronic information age. Whereas texts previously were
gathered into the library and placed under bibliographic control
via catalogs, MARC-based or otherwise, now more and more we are
providing to clients actual texts whose revision exists beyond the
control of the physical library. In addition, through initiatives
such as OCLC's Dublin Core and the TEI, among others, we are
accessing texts that contain self-describing metainformation. We
are now pointing to data in motion, objects that we do not
physically hold, whose description resides as a part of the object,
rather than separately in a library catalog. Metadata itself is not a new
thing. Catalogers have been employing it as a descriptive method for decades
as MARC records in OPACs or as cards in catalogs.
What is new about metadata today is the emerging multitude of methods
that employ it and the arena in which it is being used. TEI, GILS and Dublin
Core metadata each arose from a different community or as a collaboration
of communities in order to attempt to describe a very slippery
publication medium. It is not unlike the chaotic times when printing was first
invented. We are grappling with an emerging and mutable publication medium for
which we have few definitive answers because we have not discovered all of
the questions yet.
Return to Cataloging As a Process
The opportunities for catalogers in these developments begin with a return to viewing cataloging as a process. The experiences that catalogers have taken from the digital library projects at the University of Michigan illustrate this. From both of the projects discussed earlier in this paper, catalogers took away broadened perspectives on new methods of information organization, fresh looks at old assumptions involving information retrieval and organization, and a return to a basic understanding of the relationship between access, description, and retrieval. The theme underlying each of these lessons is cataloging as a process. Too often we have viewed cataloging as having its culmination in a record, while the largely invisible intellectual process of creating the standards, vocabularies, and systems of description and classification inherent in that record is taken for granted.
Because of this invisibility, many of those creating
libraries and systems for the new digital order assume that they
are approaching these issues for the first time. By approaching
these developments from a collaborative perspective, librarians not
only bring to the table their experience in creating
metainformational systems but also gain new insight and experience in
the processes employed by other disciplines. As these digital
endeavors become more and more global, the cataloger's experience
in languages, diacritics, and standard making becomes more and more
valuable. When catalogers become involved in these creative
projects, once invisible processes take on clearer form and
purpose. When this happens, both the individual cataloger and the
cataloging profession benefit greatly.
Conclusion
As the extent of electronic publishing grows, metadata
structures increasingly assume a central focus in describing text
and items. Developers of digital libraries often do not realize
that librarians have been there before. A role for libraries in
providing description and access for these resources will lie with
the creation, maintenance and scalability of such metadata
descriptions. Bringing these new metadata discussions into
libraries and, more specifically, into the roles of catalogers makes it
clear that the future of universal access will not lie with one universal
system of description. The purpose is not to replace existing systems
or ignore a century of tradition in description, but to continue to build
upon that knowledge in the creation of new metainformational systems, to
impart insight into the creation, maintenance and scalability
of metadata through collaborative efforts, and to bring expert knowledge of
the cataloging process and the traditions of the cataloging community to the
discussions regarding metadata and standards for the digital future.
Acknowledgments
Many people in the Monographs Division here at the Hatcher
Graduate Library have been very helpful in the creation and
presentation of this paper. I would like to thank Jo Ann Lewis,
David Richtmyer, Judith Ahronheim, and Lynn Marko for their time and
patience. I would also like to thank Erik Jul and OCLC for
sponsoring the Intercat project and colloquium.
Bibliography
Caplan, Priscilla. 1995. "You Call It Corn, We Call It Syntax-Independent Metadata for Document-Like Objects." The Public-Access Computer Systems Review 6, no. 4: 19-23.
Crum, Laurie. 1995. "University of Michigan Digital Library Project." Communications of the ACM 38 (April 1995): 63-64.
Levy, David. 1995. "Cataloging in the Digital Order." Proceedings of Digital Libraries '95: The Second Annual Conference on the Theory and Practice of Digital Libraries, June 11-13, 1995, Austin, Texas, USA.
Weibel, Stuart L. 1995. "Metadata: The Foundations of Resource
Description." D-Lib Magazine, July 1995.
http://www.dlib.org/dlib/July95/07contents.html