
D-Lib Magazine
April 2004

Volume 10 Number 4

ISSN 1082-9873

Implementing Metadata in Digital Preservation Systems

The PREMIS Activity

 

Brian F. Lavoie
OCLC Online Computer Library Center
<lavoie@oclc.org>


Background

In March 2000, OCLC and RLG announced their shared commitment to encourage the development of infrastructure to support the long-term preservation of digital materials. This commitment was later made concrete with the creation of two working groups, jointly sponsored by OCLC and RLG and composed of expert participants from a variety of institutional and geographical backgrounds. The first working group was charged with identifying key attributes and responsibilities of trusted digital repositories serving cultural heritage institutions. The second working group was formed to identify and describe metadata necessary to support the digital preservation process. Both groups would address their respective topics through a consensus-seeking process aimed at drawing boundaries around key issues, reviewing existing work, and formulating recommendations and best practices to guide future digital preservation initiatives.

The attributes and responsibilities working group completed its work in May 2002 with the publication of a well-received report summarizing its findings. The preservation metadata working group reported its final results in June 2002; shortly thereafter, both an opportunity and a need emerged to extend the work of this group through some form of follow-on activity. This activity became PREMIS, an expert working group jointly sponsored by OCLC and RLG, focused on the topic of implementing preservation metadata within digital archiving systems.

This article discusses the objectives, current status, and future activities of PREMIS.

The First Preservation Metadata Working Group

To understand the rationale underlying the creation of the PREMIS working group, it is necessary to review briefly the activities of the first OCLC/RLG preservation metadata working group. At the time of the group's formation, there was scant consensus within the cultural heritage community in regard to preservation metadata. Many institutions recognized, at least from a conceptual standpoint, the importance of metadata as a component of the digital preservation process; a few had developed "in-house" preservation metadata schemes. But there was little or nothing that bound these schemes, along with accumulated expertise and best practice, into a community-wide articulation of what preservation metadata was, how it supported long-term preservation strategies, and what types of information it encompassed.

In light of this, the working group formulated an ambitious agenda: to define preservation metadata, describe its role in the digital preservation process, review and synthesize existing preservation metadata schemes, and most importantly, develop a broadly applicable, comprehensive metadata framework to support the long-term preservation of digital materials.

The working group addressed the first three issues in a white paper published in January 2001: Preservation Metadata for Digital Objects: A Review of the State of the Art. The fourth issue merited a separate treatment. In A Metadata Framework to Support the Preservation of Digital Objects, the group built on the foundation provided by the white paper to formulate a comprehensive description of the types of information needed to support the long-term preservation of digital materials. This description was set forth in a metadata framework which expanded and elaborated on the information model defined in the Open Archival Information System (OAIS) reference model, a de facto standard in the digital preservation community. A set of "prototype" metadata elements was also included, mapped to the various components of the framework.

The preservation metadata framework was an important contribution toward shaping an international consensus on the metadata requirements of archived digital objects and consolidating expertise on the use of metadata to support digital preservation. While not itself a metadata scheme, the framework provides a foundation for developing formal preservation metadata specifications and represents a common departure point for different schema implementations.

Reaction to the preservation metadata framework as it circulated through the community suggested that the framework had not exhausted the scope for productive consensus-building on preservation metadata. In particular, there was room for establishing further convergence on best practice for implementing metadata in digital preservation systems. In response, OCLC and RLG sponsored the creation of a second working group, the Preservation Metadata: Implementation Strategies working group (PREMIS), that would use the preservation metadata framework as a starting point for developing guidelines and recommendations for implementing metadata in support of the long-term retention of digital materials.

PREMIS

The new working group was fortunate in that two exceptionally qualified experts—Priscilla Caplan of the Florida Center for Library Automation and Rebecca Guenther of the Library of Congress—were willing to assume the responsibilities of chairing the group. PREMIS was equally fortunate in the caliber of expertise forming the group's membership, which extends to libraries, museums, archives, government, and the private sector, and includes participants from the US, the Netherlands, Australia, Great Britain, Germany, and New Zealand. The PREMIS membership was split into two groups: the working group itself, which would conduct the bulk of the PREMIS activities, and an Advisory Committee, which would be available for periodic consultation and would review and comment on draft documents.

In shaping the agenda for the PREMIS group's activities, two primary objectives became apparent. First, the group would seek to define a "core" set of preservation metadata elements, applicable to a broad range of digital preservation activities. A data dictionary would be developed to provide recommendations and guidelines for applying, populating, and managing the core elements.

The second primary objective would be to identify and evaluate alternative strategies for encoding, storing, managing, and exchanging preservation metadata—in particular, the core metadata elements—within a digital preservation system. PREMIS would also explore opportunities for devising pilot programs to test the group's recommendations in a variety of system environments.

The PREMIS working group organized itself into two subgroups, each focused on one of the two primary objectives. It was decided that Ms. Guenther would lead the subgroup addressing the core elements and data dictionary, while Ms. Caplan would lead the subgroup reviewing and evaluating implementation strategies. The subgroups would conduct their activities separately, but the presence of a number of PREMIS members on both subgroups, as well as periodic combined meetings, ensured ample scope for cross-fertilization and coordination between the two efforts.

PREMIS initiated its activities in June 2003, with an estimated time frame of one year.

Current Status

The end of 2003 marked the approximate half-way point of PREMIS, a period in which significant progress was made toward meeting the two primary objectives.

Between June and December 2003, the Core Elements subgroup conducted 17 conference calls. Discussion was framed by an element-by-element review of the prototype preservation metadata elements recommended in the first working group's framework. Participants compared the framework elements to elements used by their own institutions, and noted similarities and discrepancies. Topics for discussion included whether each element met the group's working definition of "core", and how it might be populated. New elements were also proposed and discussed. The group found that metadata elements for identifying and establishing relationships between archived digital materials warranted particular attention, and generated the widest diversity of opinion. The group continued its review of the framework elements at a face-to-face meeting held at the American Library Association midwinter conference in January 2004.

The decision was made at the outset that detailed technical metadata for digital formats, while part of preservation metadata, was outside the scope of the group's discussion. Individuals with expert knowledge of specific digital formats are better positioned to articulate these metadata requirements. Consequently, the group confined its discussion to high-level technical metadata, applicable across many digital formats, as well as other forms of preservation metadata, such as provenance, preservation actions, and rights management.

In the course of its discussions, the Core Elements group began developing several resources that, when completed, are likely to be of interest to the digital preservation community. A paper was written on alternative strategies for modeling relationships between archived digital objects. In addition, the group is compiling a glossary of digital preservation terms to support its discussions. Final versions of both of these resources will be made publicly available as the group completes its activities.

The Implementation Strategies subgroup began its work by developing a survey aimed at identifying the goals and characteristics of existing and planned digital preservation repositories, with a particular emphasis on how metadata is being used to support repository processes, functions, and policies. Survey questions address topics such as organizational mission, user community, repository services, funding, characteristics of archived content, rights management policies, and of course, metadata use and implementation. The survey was tested among PREMIS working group and Advisory Committee members prior to its release to the general community in November 2003. Response has been favorable, with dozens of completed surveys received from institutions representing a variety of communities and geographical locations. The survey ended in mid-February 2004.

Preliminary analysis of the survey responses identified a number of key issues affecting, either directly or indirectly, the use of metadata in digital preservation systems. These issues include the extent to which digital repositories are informed by the OAIS reference model; the needs and requirements of repository stakeholders, including depositors and user communities; methods for obtaining metadata for archived digital objects; types of metadata currently used by repositories, with an emphasis on identifying formal schemas or standards gaining support in the community; the nature and use of rights management metadata; access mechanisms for archived materials; and strategies for meeting long-term preservation objectives. These and other issues are being considered by the group in light of their implications for developing effective implementation strategies for preservation metadata.

Looking Ahead

The second half of the PREMIS group's collaboration promises to be as active as the first. For the Core Elements group, the focus will be on finalizing its recommended set of core preservation metadata elements, and creating a data dictionary to support their practical use. The group is testing and refining its proposed core elements on a sample set of digital resources. It has also begun drafting the data dictionary, and much of its remaining effort will be focused on completing this document. The group also intends to produce an XML schema to support the final set of core preservation metadata elements. Upon completion of its activities, the Core Elements group will make available various ancillary resources, such as the glossary of digital preservation terms, that were produced during the course of its work.

The Implementation Strategies group will focus on reviewing and synthesizing the results of the digital preservation survey. A paper summarizing the results will be published in the literature. With the survey findings in hand, the group will identify approaches to implementing preservation metadata currently employed in the community; this will permit the group to formulate and carry out more focused studies to evaluate the relative merits of alternative strategies, using the core set of preservation metadata elements as context for this work. The evaluation will be informed by background information supplied by the survey responses, such as descriptions of repository architectures and the attributes of archived digital materials. If time and resources permit, the group will also consider the scope for setting up collaborative pilot programs to test its recommendations.

Community outreach will be an essential component of PREMIS's remaining activities. PREMIS will continue its efforts to engage interested parties in the digital preservation community as work proceeds. For example, a liaison has been established with the newly formed Dublin Core Metadata Initiative Preservation Working Group. As the PREMIS activity nears completion, draft reports summarizing the work of the Core Elements and Implementation Strategies subgroups will be circulated to Advisory Committee members for review and comment. After revisions based on Advisory Committee feedback, the documents will be made available for a period of public review and comment. Following final revisions, the completed reports will be made publicly available on the PREMIS working group Web site.

Conclusion

As cultural heritage institutions, businesses, and government agencies invest more and more resources in building collections of digital materials, the corresponding need to take steps to ensure the long-term persistence of these materials will become more acute. The development of well-understood, sustainable digital preservation strategies will require standards and best practices for the use of metadata in support of long-term preservation objectives. Attaining broad consensus on the scope and function of preservation metadata, and shaping standards and guidelines for its use through collaborative efforts, will enhance the prospects of building a network of trusted, interoperable digital repositories. The PREMIS activity, building on the work of the first OCLC/RLG preservation metadata working group, is an effort to contribute toward this goal.

References

PREMIS Web site: <http://www.oclc.org/research/projects/pmwg/>.

OCLC/RLG Preservation Metadata Working Group Web site: <http://www.oclc.org/research/projects/pmwg/wg1.htm>.

Preservation Metadata for Digital Objects: A Review of the State of the Art: <http://www.oclc.org/research/projects/pmwg/presmeta_wp.pdf>.

A Metadata Framework to Support the Preservation of Digital Objects: <http://www.oclc.org/research/projects/pmwg/pm_framework.pdf>.

Core Elements subgroup Web page: <http://www.oclc.org/research/projects/pmwg/core_elements.htm>.

Implementation Strategies subgroup Web page: <http://www.oclc.org/research/projects/pmwg/implementation.htm>.

Dublin Core Metadata Initiative Preservation Working Group: <http://www.dublincore.org/groups/preservation/>.

Copyright © 2004 OCLC Online Computer Library Center

DOI: 10.1045/april2004-lavoie