
D-Lib Magazine
February 1998

ISSN 1082-9873

DC-5: The Helsinki Metadata Workshop

A Report on the Workshop and Subsequent Developments


Stuart Weibel
Senior Research Scientist, OCLC Office of Research
Dublin, Ohio USA
weibel@oclc.org

Juha Hakala
Library Network Specialist, Helsinki University Library
Helsinki, Finland
juha.hakala@helsinki.fi


Dublin Core Metadata Elements:

The metadata elements fall into three groups which roughly indicate the class or scope of information stored in them: (1) elements related mainly to the Content of the resource, (2) elements related mainly to the resource when viewed as Intellectual Property, and (3) elements related mainly to the Instantiation of the resource.

Content        Intellectual Property   Instantiation
Title          Creator                 Date
Subject        Publisher               Type
Description    Contributor             Format
Source         Rights                  Identifier
Language
Relation
Coverage


Introduction

The Dublin Core initiative is an international and interdisciplinary effort to define a core set of elements for resource discovery. The fifth in the series of Dublin Core Metadata Workshops convened in Helsinki, Finland in October of 1997. Some 70 attendees came from sixteen countries on four continents, representing many stakeholders in various resource description communities.

Effective interchange of resource discovery information requires that there be an underlying architecture that supports conventions for the semantics, structure, and syntax of generalized metadata. The Dublin Core initiative has resulted in consensus concerning a base set of elements for descriptive metadata. The syntactic foundation for Web-based metadata (the Resource Description Framework, [RDF]) is now being developed under the auspices of the World Wide Web Consortium [W3C]. The Dublin Core community is working closely with the RDF community to develop a common architecture to support generalized metadata.

History of the Dublin Core Initiative

The Helsinki Metadata Workshop is the fifth in the Dublin Core Workshop Series. These invitational workshops have convened a broad cross section of librarians, digital library researchers, computer networking specialists, content experts, and museum information specialists in order to promote consensus on a core set of elements to support more effective resource discovery on the Internet.

The first Dublin Core workshop was held in March of 1995, the primary deliverable being a core set of elements judged by the consensus of the group to be a reasonable foundation for the description of electronic resources [DC-1]. The initial workshop focussed on description semantics directed expressly to the problem of discovery of electronic documents.

The second workshop, in Warwick, England [DC-2], broadened the consensus to an international community and broached the syntax problem, making a start at clothing the semantics in a workable syntax for Web-based applications. HTML-embedded metadata has never been the sole target for Dublin Core description, but it was judged essential to support this mechanism in order to promote early deployment, and this strategy has resulted in a broad array of pilot projects in many disciplines [DC-PROJ].

A conceptual foundation for an architecture for metadata was laid at this meeting, the so-called Warwick Framework [WF]. This framework, along with the Meta Content Framework [MCF], formed the nucleus for the development of the Resource Description Framework. RDF promises to provide a flexible syntactic foundation for Web-based metadata including a variety of metadata association models that go beyond embedding descriptive metadata within resources.

The third workshop, attended mainly by image experts, broadened the scope of target resources to include images, acknowledging that though texts and visual resources are different in many ways, the categories for grouping them and describing them are not so very different [DC-3]. A few of the element definitions were revised at this time, and two additional elements were added to the original thirteen.

The fourth workshop convened in Canberra, Australia, in March of 1997. A spectrum of philosophies emerged at this meeting, which can best be described by the ends of a continuum: the minimalists and the structuralists [DC-4]. Minimalists value simplicity above all else, judging that acceptance and interoperability will decline precipitously as the complexity of the standard increases. The structuralists, on the other hand, acknowledge these risks but also feel that the ability to qualify metadata content is essential for any but the simplest of applications. The character of the necessary qualifiers was defined in Canberra, and constitutes one of the major outcomes of this workshop.

Outcomes of the Dublin Core Workshops 1 through 4

Implementation Projects

The most important outcome of the Dublin Core initiative to date is the growing corpus of implementation projects around the world. These projects are found in at least 10 countries and span many different disciplines, including cultural resources, libraries, government information stores, medical and scientific resources, and mathematical preprints, to name a few [DC-PROJ].

The geographic scope and diversity of these projects reinforce the expectation that it is indeed feasible to identify a core element set that will serve to unify many disparate resource description communities in the Internet Commons.

The Australian Government Locator Service has recently announced adoption of the Dublin Core as the recommended resource description content standard for electronic government documents [AGLS]. In addition, it is now official policy of the Danish National Library Authority that the Dublin Core shall be the standard for the use of metadata [DEN]. In both cases, the international scope of the Dublin Core initiative has been an important incentive for adoption.

Other metadata initiatives have recognized the Dublin Core effort as well. The Instructional Management System [IMS] project, an effort to provide a resource description standard for educational materials, has adopted the Dublin Core as a base set and added additional elements specific to their needs. A recent report of the Descriptive Metadata Task Group to the National Information Standards Organization-Digital Object Identifier (NISO DOI) workshop [NISO-DOI] has also specified the Dublin Core as a starting place for their metadata work.

A Convention for Embedding Metadata in HTML 2.0

Jump-starting these projects required a syntax that could be consistently deployed on the Web without changes to existing standards and software. A simple convention [HTML-META] for achieving this goal resulted from a workshop on Distributed Indexing and Searching sponsored by W3C in May of 1996, and attended by a number of Dublin Core participants. While hardly a satisfactory long-term solution, this convention provided the means for launching the critical early projects using Dublin Core metadata.

META Tag Attributes in HTML 4.0

Two of the Canberra Qualifiers from the fourth workshop (Language and Scheme) were formalized as META tag attributes in the HTML 4.0 specification (ratified by the W3C membership in December of 1997) [HTML4.0]. These attributes greatly improve the support for more elaborate HTML-embedded metadata, putting in place another important piece of the metadata infrastructure for the Web.
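
As a rough illustration of HTML-embedded metadata, the following Python sketch renders Dublin Core elements as META tags, optionally carrying the scheme and lang attributes mentioned above. The "DC." name prefix, the helper name meta_tag, and the scheme label "ISO8601" are assumptions made for this example rather than definitions taken from this report; consult [HTML-META] and [HTML4.0] for the actual conventions.

```python
from html import escape


def meta_tag(element, content, scheme=None, lang=None):
    """Render one Dublin Core element as an HTML META tag (illustrative only)."""
    # Assumption: element names are prefixed with "DC." in the name attribute.
    attrs = [("name", f"DC.{element}")]
    if scheme:
        attrs.append(("scheme", scheme))  # names the encoding scheme of the content
    if lang:
        attrs.append(("lang", lang))      # language of the content value
    attrs.append(("content", content))
    rendered = " ".join(
        f'{key}="{escape(value, quote=True)}"' for key, value in attrs
    )
    return f"<meta {rendered}>"


print(meta_tag("Title", "DC-5: The Helsinki Metadata Workshop", lang="en"))
print(meta_tag("Date", "1998-02", scheme="ISO8601"))  # scheme label is hypothetical
```

Omitting scheme and lang yields a plain name/content pair, which is the only form available to the simpler HTML 2.0 convention.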

ISO 8601 Date Profile

Date encoding is an important topic in most software systems (and a particularly troublesome one, on the eve of the changing millennium). The ISO 8601 date encoding standard is an important, if complex, specification for promoting consistent date encoding. As part of the Dublin Core initiative, a profile, or subset, of this standard has been developed to simplify parsing of dates for Dublin Core metadata [ISO-8601-PROFILE]. This profile has been adopted by the W3C as the recommended encoding format for all dates in HTML attributes.
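
To suggest how a small profile simplifies parsing, here is a minimal Python sketch of a validator. It assumes the profile admits six levels of granularity (YYYY, YYYY-MM, YYYY-MM-DD, and three date-time forms that require a time zone designator of Z or +hh:mm/-hh:mm); that reading, and the function name matches_profile, are this example's assumptions, so [ISO-8601-PROFILE] remains the authority.

```python
import re

# Assumed shape of the profile: each level of detail nests inside the previous one.
_PROFILE = re.compile(
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2}):(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?:\.\d+)?)?"
    r"(?P<tzd>Z|[+-]\d{2}:\d{2})"
    r")?)?)?"
)

# Inclusive numeric ranges for the fields checked below.
_RANGES = {
    "month": (1, 12),
    "day": (1, 31),
    "hour": (0, 23),
    "minute": (0, 59),
    "second": (0, 59),
}


def matches_profile(value):
    """Return True if value has the assumed profile shape and plausible field ranges."""
    match = _PROFILE.fullmatch(value)
    if match is None:
        return False
    for field, (low, high) in _RANGES.items():
        text = match.group(field)
        if text is not None and not low <= int(text) <= high:
            return False
    return True


for candidate in ("1997", "1997-10", "1997-10-06T19:20+02:00", "97-10-06"):
    print(candidate, matches_profile(candidate))
```

The sketch checks only the shape and coarse ranges; it does not verify days per month or the range of the time zone offset.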

RDF Developments

Deploying arbitrary packages of metadata in the Web environment requires a unifying architecture to accommodate the diversity of semantics and structure needed by various communities. The Resource Description Framework [RDF] is a W3C initiative to provide this architecture. RDF has been influenced substantially by the Warwick Framework, a major deliverable of the Warwick Metadata Workshop [DC-2]. The Dublin Core initiative and the RDF working groups share a number of members, and the co-evolution of these two important efforts has added to the progress of each.

Multilingual Dublin Core

The Dublin Core initiative has attracted widespread international interest. There are translations of Dublin Core documentation in German, French, Portuguese, Danish, Norwegian, Finnish, Swedish, Thai, and Japanese, and others are being planned. A working group on multilingual metadata is addressing internationalization issues of the Dublin Core in order to promote wider acceptance.

Outcomes of the Helsinki Metadata Workshop (and Beyond)

The Finnish Finish

The most immediate business of the Helsinki meeting [DC-5] was putting to rest the 15 unqualified elements of the Dublin Core. Most of the elements have found broad and uncontroversial acceptance, but several have raised spirited discussions of intent and utility. Discussions of these elements at the meeting and subsequently on the meta2 [META2] mailing list have converged, and the definitions have been sharpened. These definitions are reflected in the reference description of the elements [DC-ELEM] and in an internet draft soon to be released [ID1].

The major areas of contention in Helsinki concerned three elements:

Date

The Date element has been problematic from the outset of the Dublin Core initiative. There are many dates that might be significant in the life cycle of a resource, and the precedence of one over others is more a function of a particular use of the metadata than an extant property of the resource itself. After much debate, it was concluded that the original sense of the Date element would be left as is: a date associated with the creation or availability of the resource. Though many find this unsatisfactorily ambiguous, there is consolation in the elaboration of a series of Date types that will be recommended for use in qualified Dublin Core applications.

A working group draft proposal of Date refinements is linked to the Dublin Core homepage [DATE]. This report identifies a number of additional date types that have been judged to be important for various metadata applications. This report has yet to be validated by broad consensus or by extensive experience, but it serves as a starting point from which a coherent enumerated list of Date types might be built.

Coverage

The Coverage element has also been problematic from the beginning of Dublin Core discussions. The inclusion of the element was subject to hot debate beginning with the first workshop, and while its proponents held sway, there has never been clear direction concerning its deployment. In this case, the evolution of the Web itself provides us with direction. The rapid growth of the Web as a source of local information for cultural and commercial purposes is evidenced by studies that suggest that the targets of most Web-based information seeking are local resources. The growing consensus for the use of the Coverage element is to support search for spatially-referenced resources. This might encompass resources that range from geographically-referenced data to data sets of astronomical measurements, rendered as images.

The use of Coverage to identify a period of time is also supported, but it is expected that this will comprise a small part of its use in unqualified Dublin Core. Additionally, there are more elaborate schemes that will be put in place for the Coverage element in scheme-qualified applications in the future. One can envision a mail code scheme, a human genome scheme, or an astrophysical scheme, each tailored to different community needs.

Relation and the 1:1 Principle

The complexity of relationships among related works resists coherent explication, even in traditional bibliography. The electronic realm, populated as it is with many more possibilities for variants and renditions, exacerbates the problem.

It is particularly problematic to provide coherent metadata for resources that are derived from other resources, and which may have metadata of their own. This is a common requirement for libraries and museums. The topic was highlighted in a metadata meeting held at the Research Libraries Group in July [RLG-SUMMIT], and became a central point of discussion in Helsinki. The discussion resulted in consensus concerning what came to be known as the 1:1 principle -- each resource should have a discrete metadata description, and each metadata description should include elements relating to a single resource. It is desirable to be able to link these descriptions in a coherent and consistent manner.

The Relation element provides a logical means for such linkage. The use of Relation to link both metadata and resources in this way has implications for the use of other elements as well, and these side effects have not yet been fully explored or tested in real world deployments.

Specification of a relationship necessarily involves three components. The first is the identity of the base resource itself (since the base resource is identified in the Identifier element of the metadata, it need not appear in the Relation element itself). The second component is an identifier of the target resource. Finally, there must be a named relation that links the two.

Unqualified Dublin Core affords no clearly defined syntactic method for specifying a named relation and a target resource identifier. Free text can always be used to specify a relationship that is easily understood by a human:

"This document is a French translation of the Dublin Core Element Set Reference Description."

One could, as well, specify a named relation and an identifier (a URL, for example):

"IsBasedOn, http://purl.org/metadata/dublin_core_elements"

The first case is human readable and interpretable, but is of no use to a machine trying to parse an unambiguous relationship between resources. The second case is also expressed as free text, but requires implicit information (from what list is the named relation selected?). It also requires that a convention for the structure of the specification be widely understood (for example, what separators are legal?).
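
The difference between the two cases can be sketched in Python. The sketch below assumes a comma separator and a list of known relation names containing only the one example given above; both are illustrative conventions, not part of any Dublin Core specification, and the base identifier is a hypothetical URL.

```python
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    base: str                     # identifier of the resource being described
    relation_type: Optional[str]  # named relation, if one could be parsed
    target: Optional[str]         # identifier of the related resource, if parsed
    text: str                     # the original free-text value, always kept


def parse_relation(base_identifier, value, known_types=("IsBasedOn",)):
    """Split 'Name, identifier' values; fall back to free text otherwise."""
    head, separator, tail = value.partition(",")  # assumed separator convention
    name, target = head.strip(), tail.strip()
    if separator and name in known_types and target:
        return Relation(base_identifier, name, target, value)
    # Human-readable only: no machine-usable relation can be recovered.
    return Relation(base_identifier, None, None, value)


base = "http://example.org/dc-elements-fr"  # hypothetical base resource
print(parse_relation(base, "IsBasedOn, http://purl.org/metadata/dublin_core_elements"))
print(parse_relation(base, "This document is a French translation of the "
                           "Dublin Core Element Set Reference Description."))
```

The parser works only because the separator and the list of names were agreed in advance, which is precisely the implicit information that unqualified free text cannot carry.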

Qualified Dublin Core metadata requires a slightly richer syntax, but supports structured elements and the specification of schemes so that the descriptive power of the elements can be enhanced, whether the metadata is intended for humans or machine processing.

The Relation Working Group Draft Report [RELATION] identified a number of relation types that should encompass many common resource relationships. Applications using the Relation element should use these relation types unless there are compelling reasons to develop others. As scheme-qualified metadata proliferates, it can be expected that communities will add to this list or provide schemes of relationships tailored to their particular disciplines. (See [DC-4] for a discussion of scheme qualifiers.)

Sub-elements

As work on unqualified Dublin Core concludes, there is substantial momentum gathering behind the specification of qualifiers. This is partly in recognition of additional sub-elements that have already appeared in many deployment projects (and the desire to formalize them), and partly in recognition of the need for substructure to support scheme qualifiers that are expected to provide the means for improving the precision of the Dublin Core.

A Sub-element Working Group has formed around defining sub-elements expected (based on early implementation projects) to be common in Dublin Core applications, and its draft report is available [SUB].

This report, like the other working group drafts, is provisional, and can be expected to change substantively as work on the Dublin Core data model matures, but it provides a feel for the additional functionality thought by many to be essential for effective applications.

A Formal Data Model for the Dublin Core

The co-evolution of the Dublin Core element set and the RDF work proceeding under the auspices of the W3C has benefited both initiatives. The Dublin Core has provided a semantic focus for RDF work, and in turn, RDF work has clarified the importance of a coherent underlying data model for Dublin Core metadata. While not completed at this time, work on the underlying formal model of Dublin Core metadata promises to clarify how best to embed Dublin Core elements (especially the more complex, scheme qualified versions) in a rational syntax that will survive the too-rapid change that challenges all Web-based applications.

Z39.50

At the recent Z39.50 Implementers Group (ZIG) meeting it was agreed to insert the 15 Dublin Core elements into the list of Bib-1 Use attributes. This means that Version 2 and Version 3 Z39.50 clients will be able to specify searches on those Dublin Core elements. In addition, a proposal has been made for how Version 3 Z39.50 clients will be able to use the Dublin Core qualifiers and schemes in queries. This proposal is awaiting consensus in the Dublin Core community on qualifiers and schemes and consensus in the ZIG on the new attribute set architecture [Z39].

Standardization

The Dublin Core has achieved wide international recognition as the primary candidate for interdisciplinary resource description. The diversity of the deployment projects highlights a broad need for what the Dublin Core offers, but to move from these projects to broader adoption requires formal standardization, and the Dublin Core community has begun to follow this path.

The unqualified version of the Dublin Core is the first standardization target, having emerged from nearly three years of vigorous debate among many stakeholders. It is a discrete subset of the initiative that is both useful and stable, and there is sufficient deployment experience to justify confidence in its present form.

The first manifestation of formalizing the Dublin Core is a series of Internet Engineering Task Force (IETF) Internet Drafts, which are working documents with expiration times of six months. The topics of these drafts are as follows:

1. Dublin Core Metadata for Simple Resource Discovery [ID1]

An introduction to the Dublin Core and a definition of the 15-element Dublin Core element set without qualifiers.

2. Encoding Dublin Core Metadata in HTML

A formal description of the conventions for embedding unqualified Dublin Core metadata in an HTML file.

3. Qualified Dublin Core Metadata for Simple Resource Discovery

The principles of element qualification and the semantics of Dublin Core metadata when expressed with a recommended qualifier set.

4. Encoding Qualified Dublin Core Metadata in HTML

A formal description of the convention for embedding qualified Dublin Core metadata in an HTML file.

5. Dublin Core on the Web: RDF Compliance and DC Extensions

A formal description for encoding Dublin Core metadata with qualifiers in RDF (Resource Description Framework) compliant metadata, and how to extend the core element set.

As the contents of these documents are ratified, they will be formalized in IETF Request For Comments (RFC) documents, which have formal standing and longevity.

In parallel, early discussions concerning NISO standardization have commenced, and are expected to lead to NISO standardization of unqualified Dublin Core within the coming year. If this proceeds according to plan, then international standardization would be a probable future step.

There remain many complex dimensions to the development of qualified Dublin Core. There are active working groups and deployment projects experimenting with more sophisticated deployments than are possible in unqualified Dublin Core. As these efforts mature and bear fruit, they will be brought to the attention of the broader community and considered for standardization as well.

Summary of Current Status

The semantics for unqualified Dublin Core were finished in Helsinki. The Finnish Finish will form the basis of the first formal standardization of the Dublin Core, and should provide confidence for wider implementation.

Unqualified Dublin Core uses the fifteen elements (or a subset of them) without sub-elements, named schemes, or other qualifiers. The semantics of unqualified Dublin Core are stable, though certainly as deployment accelerates, standards of best practice will evolve.
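
One way to picture an unqualified record is as a mapping from element names to values in which only the fifteen element names may appear. The Python sketch below takes the names from the element table at the head of this report; the dictionary-of-lists representation (which treats every element as optional and repeatable), the function name unknown_elements, and the dotted name "Date.Created" are assumptions made purely for illustration.

```python
# The fifteen element names, as listed in the element table of this report.
DC_ELEMENTS = frozenset({
    "Title", "Subject", "Description", "Source", "Language", "Relation", "Coverage",
    "Creator", "Publisher", "Contributor", "Rights",
    "Date", "Type", "Format", "Identifier",
})


def unknown_elements(record):
    """Return, sorted, the names in record that are not among the fifteen elements."""
    return sorted(name for name in record if name not in DC_ELEMENTS)


record = {
    "Title": ["DC-5: The Helsinki Metadata Workshop"],
    "Creator": ["Stuart Weibel", "Juha Hakala"],
    "Date": ["1998-02"],
    "Date.Created": ["1997-10"],  # hypothetical qualified name, not unqualified DC
}

print(unknown_elements(record))
```

A record that yields an empty list uses only unqualified element names; anything reported back signals a sub-element, scheme, or local extension that belongs to qualified Dublin Core.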

Deploying Qualified Dublin Core will continue to be an experimental endeavor for some time to come. There are many unanswered questions concerning how best to cope with issues of extensions, how to invoke scheme and sub-element qualifiers, and where to compromise between interoperability on the one hand and greater descriptive power on the other. Solutions to these problems will emerge from further work on underlying models and from the practical experience of deployment projects with real-world problems to solve.

DC-5 occasioned substantial progress on sub-elements and schemes to support the richer semantics desired of qualified Dublin Core metadata. Working groups will continue to address these problems, informed by the many implementation projects that have already pioneered work in these uncharted realms.

Progress on RDF syntax was embraced in Helsinki as a promising development that will support a rich architecture for metadata of all types. Since Helsinki, further progress has been made on the RDF front: the latest draft specification was made available in early February [RDF-1]. As this important component of Web infrastructure matures, it will complete the spectrum of solutions available for deploying metadata on the Web. The embedding of metadata in HTML 2.0 and the richer metadata support of HTML 4.0 will continue to be viable and useful. The RDF model, however, will support not only embedded metadata, but also more sophisticated metadata models, which are potentially of far greater importance.

The evolution of RDF will continue to stimulate the development of an underlying data model for the Dublin Core. Formalizing the data model will contribute to the solutions of many of the outstanding problems currently being addressed by the Dublin Core community, including the implementation of the 1:1 principle, coherent expression of substructure (schemes and sub-elements), and the registration of schemes and sub-elements.

Helsinki helped resolve many problems and raised awareness of many others. As the technological infrastructure matures, and tools are developed to help create and manage metadata, it is important not to lose sight of the fact that the hardest challenges we will face are not technological. Key to the success of the initiative is promulgation of a coherent vision that will shape standards of best practice and integration of legacy systems with new environments.

Global networking has extended our reach, but less so, our vision. The Dublin Core initiative has laid a foundation for global, interdisciplinary resource discovery that promises to improve that vision. To the extent that it succeeds, credit is due to the many people who have struggled through frustration, disagreement, impasse, and simply learning one another's language to find a common ground. It is a job well begun, with much yet to be done.

Acknowledgements

The Helsinki Metadata Workshop was organized and co-sponsored by The National Library of Finland, OCLC, and the Coalition for Networked Information (CNI).

Travel support for many attendees was provided by The National Science Foundation, OCLC, and CNI, as well as by the many home institutions of attendees.

In a collegial effort such as this, it is dangerous to thank anyone by name, for this report reflects the hard-won intellectual effort of many people indeed. The authors would be remiss, however, in not acknowledging the particular efforts of four of our colleagues whose exceptional diligence improved this work greatly. Many thanks to Misha Wolf, David Bearman, Simon Cox, and John Kunze.

Further Reading

The Dublin Core Homepage
The definitive source of information on historical and current developments concerning the Dublin Core and related metadata initiatives.
<http://purl.org/metadata/dublin_core>.

References

[AGLS] Australian Government Locator Service Implementation Plan: A Report by the Australian Government Locator Service Working Group (AGLS WG). December, 1997.
<http://www.aa.gov.au/AA_WWW/AGLSfinal.html>.

[DATE] Helsinki Working Group Report on Dublin Core Date Sub-elements.
<http://purl.oclc.org/metadata/reports/date>.

[DC-1] Metadata: the Foundations for Resource Description. Stuart Weibel. D-Lib Magazine, July 1995.
<http://www.dlib.org/dlib/July95/07weibel.html>.

[DC-2] The Warwick Metadata Workshop: A Framework for the Deployment of Resource Description. Lorcan Dempsey and Stuart L. Weibel. D-Lib Magazine, July 1996.
<http://www.dlib.org/dlib/july96/07weibel.html>.

[DC-3] Image Description on the Internet: A Summary of the CNI/OCLC Image Metadata Workshop. Stuart Weibel and Eric Miller. D-Lib Magazine, January 1997.
<http://www.dlib.org/dlib/january97/oclc/01weibel.html>.

[DC-4] The 4th Dublin Core Metadata Workshop Report: DC-4. Stuart Weibel, Renato Iannella, and Warwick Cathro. D-Lib Magazine, June, 1997.
<http://www.dlib.org/dlib/june97/metadata/06weibel.html>.

[DC-5] The 5th Dublin Core Metadata Workshop.
<http://linnea.helsinki.fi/meta/DC5.html>.

[DC-ELEM] Reference Description of the Dublin Core Elements.
<http://purl.org/metadata/dublin_core/elements>.

[DC-PROJ] Projects Using Dublin Core.
<http://purl.oclc.org/metadata/dublin_core/projects.html>.

[DEN] Electronic Communication from Leif Andresen, Library Advisory Officer, Danish National Library Authority (lea@bs.dk), January, 1998.

[HTML4.0] HTML 4.0 Specification (December, 1997).
<http://www.w3.org/TR/REC-html40>.

[HTML-META] A Proposed Convention for Embedding Metadata in HTML.
A position paper from the May, 1996 W3C Workshop on Distributed Indexing and Searching.
<http://www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/S6Group2.html>.

[ID1] Dublin Core Metadata for Simple Resource Discovery.
<ftp://ftp.ietf.org/internet-drafts/draft-kunze-dc-02.txt>.

[IMS] Instructional Management System Project Homepage.
<http://www.imsproject.org/>.

[ISO-8601-PROFILE] Date and Time Formats. Misha Wolf and Charles Wicksteed. Submitted to W3C 15 September 1997.
<http://www.w3.org/TR/NOTE-datetime-970915>.

[MCF] The Meta Content Framework Using XML. R.V. Guha and Tim Bray. June 6, 1997.
<http://www.w3.org/TR/NOTE-MCF-XML>.

[META2] The META2 Mailing List, meta2@mrrl.lut.ac.uk
The primary mailing list forum for the discussion of Dublin Core issues.
To subscribe, send a message to majordomo@mrrl.lut.ac.uk with a body of:
subscribe meta2 your-name-here.

[NISO-DOI] Descriptive Metadata Task Group: Report to the NISO DOI Workshop, Washington, DC. February 6, 1998.

[RDF] W3C Resource Description Framework Home Page.
<http://www.w3.org/RDF>.

[RDF-1] Resource Description Framework (RDF) Model and Syntax.
<http://www.w3.org/RDF/Group/WD-rdf-syntax>.

[RELATION] Helsinki Relation Working Group Draft Report.
<http://purl.oclc.org/metadata/reports/relation>.

[RLG-SUMMIT] Metadata Summit: Meeting Report. Willy Cromwell-Kessler and Ricky Erway. July, 1997.
<http://www.rlg.org/meta9707.html>.

[SUB] Helsinki Sub-element Working Group Draft Report.
<http://purl.oclc.org/metadata/reports/subelement>.

[W3C] The World Wide Web Consortium Homepage.
<http://www.w3.org>.

[WF] The Warwick Framework: A Container Architecture for Aggregating Sets of Metadata. Carl Lagoze, Clifford A. Lynch and Ron Daniel, Jr. D-Lib Magazine, July 1996.
<http://www.dlib.org/dlib/july96/lagoze/07lagoze.html>.

[Z39] Dublin Core and Z39.50. Ralph LeVan, OCLC Office of Research and Special Projects. Draft version 1.2, February, 1998.
<http://cypress.dev.oclc.org:12345/~rrl/docs/dublincoreandz3950.html>.

Copyright © 1998 Stuart Weibel, Juha Hakala


hdl:cnri.dlib/february98-weibel