Dublin Core Metadata in the RLG Information Landscape

Willy Cromwell-Kessler
Bibliographic Analyst
Research Libraries Group
Mountain View, California
bl.kes@rlg.org

D-Lib Magazine, December 1997

ISSN 1082-9873

RLG Context

In a December 1996 D-Lib article[1], Ricky Erway described digital initiatives underway at the Research Libraries Group (RLG)[2]. These initiatives reflect the collaborative, educational focus of RLG, a not-for-profit membership organization of over 155 research libraries, archives, museums, and other scholarly institutions[3]. Erway outlined the ways RLG is extending its mandate to enhance access to research information into the digital arena.

Among the most pressing issues for digital information providers is ensuring access to stable repositories of standardized information. Erway described some of the RLG member projects that respond to this need by creating digital content and by articulating and promulgating digital standards or "best practices." She also discussed the Arches infrastructure[4], RLG's approach to meeting many of the challenges to information management posed by the Internet environment, such as persistent naming, document authentication, licensing, document display, and security.

RLG continues to work on a wide variety of digital initiatives[5], with the goal of providing access to digital content through a Web-based "umbrella interface" that will draw upon all the components of Arches functionality and allow nearly seamless integration of diverse RLG databases as well as a wealth of remote resources. An important component of managing information in such a distributed but integrated environment is the ability to navigate among the different resources that characterize our new information landscape.

To search across resources constructed according to varying descriptive models, it is essential to be able to mediate the differences between those models. To do this, we need to consider the uses of metadata (known, in its conventional library guise, as cataloging data). Traditionally, metadata maps discrete resources or groups of resources. In a new role navigating between groups, metadata holds the key to the integrated information landscape. This article focuses on some of the ways that RLG has worked with its members during the past year to help point metadata development in directions that will assist the research community to integrate access to diverse resources effectively via the Internet.

The Dublin Core

A metadata system that has generated intense interest is the Dublin Core element set[6]. This simple set of fifteen data elements is intended for use in the discovery of Web-based resources. It has the advantage of simplicity and can be applied by non-catalogers, although qualification of the basic elements permits more complex elaboration. The Dublin Core has been collaboratively developed through a series of international workshops sponsored by OCLC Online Computer Library Center and various other organizations, and enjoys wide-ranging support from international communities, including the computing industry and the library world.

A number of factors contribute to the growing importance of the Dublin Core. An infrastructure seems to be firmly in place that can ensure its continuing development and stable maintenance, and a number of test-bed implementations are underway or in the planning stages. Mechanisms have been developed that permit the embedding of the Dublin Core into HTML and also into XML, HTML's probable successor. Additionally, the Dublin Core is being incorporated into Z39.50 protocol applications. The development of a wide range of tools for the Dublin Core's incorporation into the Web environment seems assured if the network of support that it now enjoys continues to grow.
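
As a rough illustration of the HTML embedding mentioned above, the following Python sketch renders a few Dublin Core elements as META tags. The "DC." name prefix reflects the convention commonly used for Dublin Core in HTML; the function name and the sample values are hypothetical and are not drawn from any RLG system.

```python
from html import escape


def dublin_core_meta_tags(record):
    """Render a dict of Dublin Core elements as HTML META tags.

    `record` maps an element name (e.g. "title") to a string or to a
    list of strings when the element is repeated.
    """
    tags = []
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            tags.append(
                '<meta name="DC.{}" content="{}">'.format(
                    escape(element, quote=True), escape(value, quote=True)
                )
            )
    return "\n".join(tags)


# Hypothetical sample values, for illustration only.
sample = {
    "title": "Villette",
    "creator": "Bronte, Charlotte",
    "date": "1853",
}
print(dublin_core_meta_tags(sample))
```

Because the element names are plain strings, a non-cataloger can supply such a record without knowing anything about the markup that carries it.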

Expanding the Scope of the Dublin Core

For these reasons, it is important for information providers, such as RLG and its membership constituency of research institutions, to assess the potential role of the Dublin Core in the organization of the distributed, cross-domain searching environment we envision. The Dublin Core has been developed specifically as a tool for the discovery of resources created on the Web -- its "strategic application." However, many of the resources of interest to the research community lie outside this intentionally limited scope. For example, while one can access library catalogs or citation databases through the Web, they are not Web-based in the same way that a static HTML Web page is. Then there are Web-based representations of entities that lie outside the Web, such as digital reproductions of printed texts, or scanned images of museum objects. Expanding the scope of the Dublin Core to provide access to the full range of resources that can be discovered via the Web presents some complex issues.

In particular, there is an ambiguity attendant upon Dublin Core description when the resource that is described is a surrogate for another item. It is not always clear whether the associated metadata should describe the surrogate or the original item. Is the researcher who wishes to gain access to a digital version of the 1853 edition of Charlotte Bronte's Villette apt to search for it using the name of Charlotte Bronte, or the name of the person who created the digital version? Is s/he interested in the date that the digital version was created, or the date of the original edition? Of course this dilemma is familiar to library catalogers. Sherry L. Vellucci presented a paper at the recent International Conference on the Principles and Future Development of the Anglo-American Cataloging Rules[7], summarizing proposals supporting a radical revision of the USMARC catalog record. These proposals address issues similar to those surfacing in debates about the Dublin Core.

Although, as shown, text is susceptible to confusion, the full implications of surrogacy for the Dublin Core first emerged when images were taken into consideration. At the third Dublin Core gathering, the CNI/OCLC Image Metadata Workshop[8] held in September 1996, the definition of the "document-like object" (DLO) -- which had been posited as the appropriate subject of Dublin Core description -- was extended to incorporate images. In the process, workshop participants touched on problems inherent in using the Dublin Core to describe surrogates. Since this meeting occurred at a comparatively early stage of development, workshop participants deferred the resolution of the problem, in hopes that experience gained through implementation would lead to a workable solution.

The related problem of granularity also asserted itself with particular force at this time. Aggregates of images or texts might be described as collections, as single entities, or as both. Determining the appropriate level of description and distinguishing between levels of description to ensure intelligible assessment of search results can be problematic using the Dublin Core. The solution suggested at the Image Metadata Workshop was to more fully develop the RELATION element.

Proponents of Dublin Core strategic application recognize problems posed by issues of surrogacy and granularity. However, these issues have more immediate significance when the scope of the Dublin Core is extended to non-Web-based resources. This was made clear in a report[9] on the findings of a series of workshops sponsored in 1996 and early 1997 by the Arts and Humanities Data Service (AHDS) and the UK Office for Library and Information Networking (UKOLN). This report endorses the use of element qualifiers to avoid ambiguity and refine Dublin Core for domain-specific resources in the arts and humanities. However, although such qualification can be important and useful in domain-specific applications, its use is not without cost. Paul Miller notes in the AHDS/UKOLN report[10] that the greater the degree of qualification, the greater the cost in terms of loss of the interoperability that is so important in the distributed environment.

"Metadata Summit" Meeting in Mountain View

It is against this backdrop that RLG agreed to host and organize an invitational metadata "summit meeting" to discuss the role of the Dublin Core in the distributed, digital information environment. This event grew out of the work of a Task Force on Meta Access[11] charged to investigate issues of electronic access by the Association for Library Collections & Technical Services (ALCTS) of the American Library Association. This meeting[12], which took place on July 1, 1997 at RLG's offices in Mountain View, brought together a diverse group of information professionals from the United States and Canada representing academic libraries, national libraries, archives, and the museum and cultural heritage community. Almost all of the participants were affiliated in some way with at least one of various emergent metadata systems and, consequently, all brought unique and informed points of view to the meeting.

The purpose of the meeting was to provide a venue for the various domain-oriented research communities to explore their commonalities and differences in regard to the way they perceived metadata in the context of the distributed on-line searching environment. The specific focus was deployment of the Dublin Core for the discovery of non-Web-based resources. The day of concentrated discussion was intended to allow participants to contribute to the definition of an action agenda vis-a-vis the Dublin Core that would address the concerns of the research community as a whole.

By the end of the day, most of the participants were able to affirm that:

It is important in these early days of Dublin Core development to give priority to its intended strategic Web application and not divert energies into issues surrounding its extension to non-Web-based resources.
Nevertheless, because it is still early days, the communities RLG represents must actively participate in Dublin Core development if it is to realize its potential to address their needs.
Widespread efforts to implement the Dublin Core are desirable to inform its development.
The Dublin Core will not and should not displace richer, domain-based metadata schemes, but it may help to bring extensive traditional "off-line" resources to the attention of Web users.
Although elaborating the Dublin Core by using qualifiers may be valuable in specific contexts, a simpler, "minimalist" application of the Dublin Core is preferable for navigating across resources.
Because of its wide acceptance, the Dublin Core might appropriately serve to mediate between information resources by means of "crosswalks"--mappings to the Dublin Core from other metadata systems. Z39.50 applications were discussed as one mechanism that might facilitate such mediation.
Authoritative registries of such maps between the Dublin Core and other metadata systems are necessary to use it as described above.
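
The crosswalk idea in the points above can be pictured with a minimal Python sketch. The handful of USMARC-to-Dublin Core pairings below is illustrative rather than authoritative, and the function name, the field keys (for example "260c" for field 260 subfield c), and the sample record are assumptions made only for this example.

```python
# Illustrative (not authoritative) crosswalk from a few USMARC fields
# to Dublin Core elements. A real registry would be far more complete.
MARC_TO_DC = {
    "100": "CREATOR",
    "245": "TITLE",
    "260b": "PUBLISHER",
    "260c": "DATE",
    "650": "SUBJECT",
}


def crosswalk(source_record, mapping):
    """Map a source record's fields onto Dublin Core elements.

    Fields with no entry in `mapping` are dropped: some detail is lost
    whenever richer data is mediated through a simpler element set.
    """
    dc_record = {}
    for field, value in source_record.items():
        element = mapping.get(field)
        if element is not None:
            dc_record.setdefault(element, []).append(value)
    return dc_record


# Hypothetical sample record, for illustration only.
marc_like = {
    "100": "Conrad, Joseph",
    "245": "Lord Jim",
    "260c": "1900",
    "300": "1 v.",  # physical description: no mapping here, so it is dropped
}
print(crosswalk(marc_like, MARC_TO_DC))
```

Keeping the mapping as data rather than code is what makes a registry of crosswalks plausible: a searching system could load one table per metadata scheme and query every source through the same small set of element names.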

The RLG Metadata Working Group Proposal

An important outcome of the summit was the formation of a working group to address "generic" (as opposed to "strategic") application of the Dublin Core. ("Generic" here signifies the use of the Dublin Core as a tool for discovery of resources beyond the HTML Web pages that have been the principal object of the strategic application.) The result of this group's effort was Guidelines for Extending the Use of Dublin Core Elements to Create a Generic Application Integrating all Kinds of Information Resources[13]. These guidelines, coupled with a proposal for some minor changes in the definitions of the DATE and PUBLISHER elements, were submitted for general consideration at the fall 1997 Dublin Core Metadata Workshop.

The RLG guidelines offer a significant taxonomy of issues involved in developing a consistent, simple application of the Dublin Core that will assist in discovery of a full range of resources available via the Internet. At the top level, they delineate two basic ways to use the Dublin Core for discovery of non-Web-based resources. A "lingua franca" approach uses the Dublin Core elements to create Web-accessible common indexes to non-Web-based resources. The most important part of the guidelines lies in their discussion of a second approach, which is to create Web pages solely for the purpose of describing non-Web-based resources, as either individual entities or collections of distinct entities.

The guidelines posit four basic types of resources that may be described by Dublin Core metadata in such Web pages: an on-line original (e.g., D-Lib Magazine), an off-line original (e.g., a first edition of Conrad's Lord Jim), an on-line surrogate (e.g., a scanned version of Conrad's Lord Jim), and an off-line surrogate (e.g., a microform version of Conrad's Lord Jim). They recommend that the descriptions in the Dublin Core elements refer to the intellectual content of the original resource. Hence, if one is describing a digital version of Lord Jim, the creator should be Joseph Conrad, not the individual who digitized it. Optionally, one may also include the description for the surrogate. For such circumstances, the guidelines suggest devising a mechanism to distinguish between separate sets of Dublin Core elements, but this distinction is not seen as crucial.
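
One way to picture this recommendation is the Python sketch below, in which the primary description always refers to the original and the surrogate's description is an optional, separate set. The class, the type labels, and the placeholder values are hypothetical; the guidelines do not prescribe any particular data structure.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# The four resource types named in the guidelines (labels are hypothetical).
RESOURCE_TYPES = {
    "online-original",
    "offline-original",
    "online-surrogate",
    "offline-surrogate",
}


@dataclass
class Description:
    resource_type: str
    original: Dict[str, str]                    # elements describing the original
    surrogate: Optional[Dict[str, str]] = None  # optional, kept as a separate set

    def __post_init__(self):
        if self.resource_type not in RESOURCE_TYPES:
            raise ValueError(f"unknown resource type: {self.resource_type}")

    def discovery_elements(self):
        """Elements a searcher matches on: always those of the original."""
        return self.original


scanned_lord_jim = Description(
    resource_type="online-surrogate",
    original={"TITLE": "Lord Jim", "CREATOR": "Conrad, Joseph"},
    surrogate={"CREATOR": "(name of digitizer)", "DATE": "(date of scan)"},
)
print(scanned_lord_jim.discovery_elements()["CREATOR"])
```

A search on CREATOR therefore finds Conrad whether the page describes the first edition, the scan, or the microform.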

The guidelines also include an alternative approach: describe the surrogate and repeat elements describing the original as qualifiers to the SOURCE element. However, we view this approach as problematic since it risks the frequently noted disadvantage of overloading the SOURCE element and is less transparent.

Finally, the RLG Metadata Working Group's guidelines assume minor changes in the PUBLISHER and DATE elements that should make it possible to apply the RLG guidelines without conflicting with other applications.

The Helsinki Response

The working group prepared the guidelines and recommendations on a tight schedule so that they could be placed on the agenda of the 5th Dublin Core Metadata Workshop[14], which met in Helsinki, Finland, October 6-8, 1997. As noted in an early report[15] on this meeting, prepared by Paul Miller and Tony Gill, the issues relating to surrogacy that were outlined in both the RLG guidelines and the AHDS/UKOLN report were extensively discussed and produced what has come to be known as the "1:1" approach.

There was broad concurrence with much of the RLG proposal, but not with the working group's emphasis on the original resource. Rather, the 1:1 approach endorses the use of separate sets of metadata for each manifestation of an entity being described, all to be linked by the values in the Dublin Core RELATION element. For the RELATION element to perform this role effectively, it must be better defined. Consequently, David Bearman volunteered to form a working group to delineate the relation types that will be necessary for the 1:1 approach to be used.
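
A minimal Python sketch of the 1:1 idea follows: each manifestation carries its own element set, and RELATION values link the sets. The identifiers and the relation type names ("HasSurrogate", "IsSurrogateOf") are hypothetical placeholders, since defining the actual relation types is the task left to the working group.

```python
# Each manifestation gets its own element set; RELATION values link them.
# Identifiers and relation type names are hypothetical placeholders.
records = {
    "urn:example:lord-jim-print": {
        "TITLE": "Lord Jim",
        "CREATOR": "Conrad, Joseph",
        "RELATION": [("HasSurrogate", "urn:example:lord-jim-scan")],
    },
    "urn:example:lord-jim-scan": {
        "TITLE": "Lord Jim",
        "CREATOR": "(name of digitizer)",
        "RELATION": [("IsSurrogateOf", "urn:example:lord-jim-print")],
    },
}


def related(records, identifier, relation_type):
    """Return the records linked from `identifier` by `relation_type`."""
    links = records[identifier].get("RELATION", [])
    return [
        records[target]
        for kind, target in links
        if kind == relation_type and target in records
    ]


# Starting from the scan, follow the link back to the original's description.
originals = related(records, "urn:example:lord-jim-scan", "IsSurrogateOf")
print(originals[0]["CREATOR"])
```

Here no record has to choose between describing the surrogate and describing the original; the cost is that a searcher's system must follow links to assemble the full picture.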

The 1:1 approach is not feasible using HTML alone; its development depends upon the implementation of the Resource Description Framework (RDF)[16], which has the potential to link between sets of metadata describing a single entity. As an interim measure, pending RDF availability, Helsinki attendees affirmed the alternative noted in the RLG working group's guidelines: base the primary description on the surrogate and use the SOURCE element with qualifiers to describe the original manifestation.
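
The interim measure can be sketched in Python as a single flat record. The dotted "SOURCE.<element>" naming is a hypothetical qualifier syntax chosen for this example, not one prescribed at Helsinki, and the placeholder values are illustrative only.

```python
def flatten_with_source(surrogate, original):
    """Describe the surrogate and fold the original's elements under SOURCE.

    The dotted "SOURCE.<element>" naming is a hypothetical qualifier
    syntax used only for this sketch.
    """
    flat = dict(surrogate)
    for element, value in original.items():
        flat["SOURCE." + element] = value
    return flat


record = flatten_with_source(
    surrogate={"TITLE": "Lord Jim", "CREATOR": "(name of digitizer)"},
    original={"CREATOR": "Conrad, Joseph", "DATE": "1900"},
)
for name in sorted(record):
    print(name, "=", record[name])
```

In this form a search restricted to the unqualified CREATOR element would not match Conrad, which illustrates why loading the original's description into SOURCE is regarded as less transparent.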

Using this approach, the need for changes in the DATE and PUBLISHER elements is moot. However, discussions at the Helsinki meeting did affirm the need for further work on definitions. Other groups will address this.

Next Steps for RLG's Metadata Working Group

RLG is polling its working group participants to determine who is prepared to pursue several further tasks:

First and foremost, the group will reassess the concerns expressed in the guidelines in the light of the forthcoming results from the Dublin Core RELATION (1:1) Working Group. This re-evaluation will be facilitated by the fact that some of the RLG working group are members of the RELATION group.
The guidelines proposed that there should be one or more registries of crosswalks between the Dublin Core and other metadata systems. While such registries exist in de facto form -- sites such as those found at OCLC, Library of Congress (LC) and UK Office for Library and Information Networking (UKOLN) maintain crosswalks for specific element sets -- there is more work to be done. And, of course, there is the perhaps utopian idea of developing the "Crosswalk of all Crosswalks" -- which the Getty Information Institute has in fact begun compiling.
Beyond the results of the Dublin Core RELATION Working Group, there are other hierarchy issues. How does one determine if a searching "hit" is a single item, a disparate group of items, or a collection with an integrated identity?
As part of preparing for generic applications, the guidelines need finalizing and examples are called for.

Meanwhile, Back in Mountain View

As RLG pursues its digital initiatives, it is keeping the Dublin Core elements in focus, closely monitoring applications such as that proposed in the CIMI Metadata Test Bed[17], which will evaluate the use of the Dublin Core for museum data. A number of RLG projects involving metadata are underway; in these, RLG is carefully considering issues relevant to ensuring compatibility with developments in the wider information community.

RLG has three other member metadata working groups:

a group of preservation experts[18] identifying important information about the nature of a digital surrogate in the USMARC catalog record (paralleling the approach used for microfilm surrogates);
a second preservation group[19] specifying the information elements relevant to preservation needs that should accompany the digital file;
a group of archivists determining what other metadata should be included as documents are created in digital form -- for instance, embedding the retention plan in a government agency's email.

In addition to making metadata describing RLG's databases available to Web indexers, RLG's primary implementation is in the next generation of Eureka®, its user interface for all RLG's resources. In its next stage, Eureka will integrate access to Web- and non-Web-based library, archive, and museum information on central and remote servers, including description of surrogates and original artifacts -- as well as the full surrogates and artifacts themselves. Dublin Core metadata provides the commonality; RLG's participation in the Dublin Core development process promises to yield concrete results for RLG and its members.

References

[1] Ricky Erway, "Digital Initiatives of the Research Libraries Group," D-Lib Magazine (December 1996), http://www.dlib.org/dlib/december96/rlg/12erway.html.

[2] Research Libraries Group, http://www.rlg.org.

[3] The 157 Members of the Research Libraries Group, http://www.rlg.org/memlist.html.

[4] Arches -- Archival Server and Test Bed, http://www.rlg.org/strat/projarch.html.

[5] Digital Initiatives at RLG, http://www.rlg.org/digital/index.html.

[6] Dublin Core Metadata Element Set: Reference Description, http://purl.org/metadata/dublin_core_elements.

[7] Sherry L. Vellucci, "Bibliographic Relationships" (paper presented at the International Conference on the Principles and Future Development of AACR2, Toronto, Canada, October 23-25, 1997), http://www.nlc-bnc.ca/jsc/confpap.htm.

[8] Stuart Weibel and Eric Miller, "Image Description on the Internet," D-Lib Magazine (January 1997), http://www.dlib.org/dlib/january97/oclc/01weibel.html.

[9] Paul Miller and Daniel Greenstein, eds., Discovering Online Resources across the Humanities: A Practical Implementation of the Dublin Core, 1997, http://ahds.ac.uk/public/metadata/discovery.html.

[10] Paul Miller, "Unifying Resource Discovery Metadata for the Humanities: An Application Based Upon the Dublin Core." Chap. 3 in Discovering Online Resources across the Humanities, http://ahds.ac.uk/public/metadata/disc_17.html.

[11] ALCTS Taskforce on Meta Access Final Report, April 3, 1997, http://www.lib.virginia.edu/alcts/about/final.html.

[12] Willy Cromwell-Kessler and Ricky Erway, Metadata Summit: Meeting Report, 1997, http://www.rlg.org/meta9707.html.

[13] Guidelines for Extending the Use of Dublin Core Elements to Create a Generic Application Integrating all Kinds of Information Resources, Draft, October 1, 1997, http://www.rlg.org/metawg.html.

[14] The 5th Dublin Core Metadata Workshop, Helsinki, Finland, October 6-8, 1997, http://linnea.helsinki.fi/meta/DC5.html.

[15] Paul Miller and Tony Gill, "DC5: The Search for Santa," Ariadne (November 1997), http://www.ariadne.ac.uk/issue12/metadata/.

[16] Resource Description Framework (RDF) Model and Syntax, http://www.w3.org/TR/WD-rdf-syntax/.

[17] John Perkins, CIMI Metadata Testbed Project, Draft, October 1, 1997, http://www.cimi.org/documents/metprojprop.html.

[18] RLG Working Group On Preservation and Reformatting Information, http://www.rlg.org/preserv/pri.html.

[19] RLG Working Group On Preservation Issues of Metadata, http://www.rlg.org/preserv/metadata.html.

The anchor in note 16 was corrected at the request of the author. Editor, December 16, 1997, 9:15 am.

hdl:cnri.dlib/december97-CromwellKessler