D-Lib Magazine

Stuart L. Weibel
In June of this year, I performed my final official duties as part of the Dublin Core Metadata Initiative management team. It is a happy irony to affix a seal on that service in this journal, as both D-Lib Magazine and the Dublin Core celebrate their tenth anniversaries. This essay is a personal reflection on some of the achievements and lessons of that decade.

The OCLC-NCSA Metadata Workshop took place in March of 1995, and as we tried to understand what it meant and who would care, D-Lib Magazine came into being and offered a natural venue for sharing our work [16]. I recall a certain skepticism when Bill Arms said "We want D-Lib to be the first place people look for the latest developments in digital library research." These were the early days in the evolution of electronic publishing, and the goal was ambitious. By any measure, a decade of high-quality electronic publishing is an auspicious accomplishment, and D-Lib (and its host, CNRI) deserve congratulations for having achieved their goal. I am grateful to have been a contributor.

That first DC workshop led to further workshops, a community, a variety of standards in several countries, an ISO standard, a conference series, and an international consortium. Looking back on this evolution is both satisfying and wistful. While I am pleased that the achievements are substantial, the unmet challenges also provide a rich till in which to cultivate insights on the development of digital infrastructure.

The Achievements

When we started down the metadata garden path, the term itself was new to most. The known Web comprised fewer than a million pages, people tried to bribe their way into sold-out Web conferences, and the term 'search engine' was as yet unfamiliar outside of research labs. The OCLC-NCSA Metadata Workshop brought practitioners and theoreticians together to identify approaches to improve discovery. In two and a half days, an eclectic Gang of 52 (we affectionately described ourselves as 'geeks, freaks, and people with sensible shoes') brought forward a core element set upon which many resource description efforts have since been based. The goal was simple, modular, extensible metadata: a starting place for more elaborate description schemes. From the thirteen original elements we grew to a core of fifteen, and later elaborated the means for refining those rough categories. In recent years much work has been done on the modular and extensible aspects, as application profiles have emerged to bring together terms from separate vocabularies [9].

A Consensus Community

The workshop series coalesced into a community of people from many countries and many domains, drawn by the appeal of a simple metadata standard. Openness was the Prime Directive, and early progress was often marked by the contentious debate of consensus building. But our belief that value would emerge from many voices informed our deliberations, and still does. Not without difficulty: in one early meeting, participants spent an hour of scarce plenary time talking about Type before realizing that the librarians and the computer scientists had been talking about completely different concepts. Crossing borders is often difficult.

This open, inclusive approach to problem solving helped the Dublin Core community to frame the metadata conversation for the past decade. The Dublin Core brand has been for some years the first link returned for the Google search term "metadata", and for a time it outranked all other results for the search "Dublin" (as of this writing, it is #6).
With only moderate irony, we might say "I feel lucky!"

Process

As the workshop series evolved into a set of standards and a community, the need for rules and governance evolved as well. DCMI developed a process for evaluating proposed changes and bringing them into conformance with the overall standard [5]. The DCMI Usage Board is composed of knowledgeable, experienced metadata experts from five countries who exercise editorial guidance over the evolution of DCMI terms and their conformance with the DCMI Abstract Model [13]. This model is itself among the most important achievements of the Initiative, representing as it does the convergence of theory and practice over a decade of vigorous debate and practical implementation. It emerged from early intuition and experience, informed by an evolving sense of grammatical structure [2,6] and further refined by a long co-evolution with the W3C's Resource Description Framework (RDF) and the Semantic Web.

At a higher level, DCMI has a Board of Trustees [1], who oversee operations and do strategic planning, and an Affiliate Program and governance structure that distributes the cost of the Initiative and assures that the needs of stakeholders are accommodated [3]. At the time of this writing, there are four national DCMI Affiliates and several more in discussion.

Internationalization

The global nature of the Web demands a commitment to internationalization. The difficulties of achieving system interoperability in multiple languages are immense, and still only partially solved (anyone used IRIs recently?). Nonetheless, DCMI has succeeded in attracting translations of its basic terms into 25 languages and offers a multilingual registry infrastructure of global reach [14]. The venues for the workshops and conferences have been chosen to make the Initiative accessible to people in as many places as possible. Workshops and conferences are held in the Americas, Europe, and Australasia on a rotating basis, and Dublin Core principals have given talks on every continent save Antarctica. This policy of international inclusion has been a philosophic mainstay for the Initiative, attracting long-term participation from around the world.

Where we were confused

Confusions and unmet challenges are both interesting and instructive. A few of these are historical curiosities, interesting mostly as a source of wry humility. Others represent unsolved dilemmas that remain prominent challenges for the metadata world in general.

Author-created Metadata

The idea of user-created metadata is seductive. Creating metadata early in the life cycle of an information asset makes sense, and who should know the content better than its creator? Creators also have the incentive of making their work more easily found: who wouldn't want to spend an extra few minutes with so much already invested? The answer is that almost nobody will spend the time, and probably the majority of those who do are in the business of creating metadata-spam. Creating good-quality metadata is challenging, and users are unlikely to have the knowledge or patience to do it very well, let alone fit it into an appropriate context with related resources. Our expectations to the contrary seem touchingly naïve in retrospect.

The challenge of creating cost-effective metadata remains prominent. As Erik Duval pointed out in his DC-2004 keynote, 'Librarians don't scale' [7]. We need automated (or at least hybrid) means for creating metadata that is both useful and inexpensive.
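By way of illustration only, and not as any DCMI-sanctioned tool, the following minimal Python sketch shows what the cheapest layer of such automation might look like: harvesting a handful of candidate Dublin Core values (title, description, creator, subject, identifier) from an ordinary HTML page. The mapping of HTML meta names to DC terms and the function names here are assumptions made for the example.

```python
# Illustrative sketch only: naive automated harvesting of candidate Dublin Core
# values from an HTML page. The meta-name-to-DC-term mapping below is an
# assumption for this example, not a DCMI specification.
from html.parser import HTMLParser
from urllib.request import urlopen

# crude, assumed mapping from common HTML meta names to Dublin Core terms
META_TO_DC = {"description": "description", "author": "creator", "keywords": "subject"}


class DCHarvester(HTMLParser):
    """Collect <title> text and a few <meta> values as candidate DC elements."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            dc_term = META_TO_DC.get((attrs.get("name") or "").lower())
            if dc_term and attrs.get("content"):
                self.record.setdefault(dc_term, attrs["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.record.setdefault("title", data.strip())


def harvest(url):
    """Return a dictionary of cheap, automatically derived DC candidates."""
    html = urlopen(url).read().decode("utf-8", errors="replace")
    parser = DCHarvester()
    parser.feed(html)
    parser.record.setdefault("identifier", url)  # the URL itself is free metadata
    return parser.record
```

Even a toy like this makes the economics visible: the elements that can be scraped for free (title, identifier) are the least expensive and least valuable, while subject analysis, relationships, and context, the parts that make metadata genuinely useful, are exactly where human judgment, and therefore cost, re-enters.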
What is metadata for?

Another naïve assumption was that metadata would be the primary key to discovery on the Web. While one may quibble about the effectiveness of unstructured search for some purposes, it is the dominant idiom of discovery for Web resources, and may be expected to remain so. What, then, is metadata for? There are many answers to this question, though given the high stakes in the search domain, expect these answers to shift and weave for the foreseeable future.

Searching the so-called 'dark web' remains a function of gated access, and metadata is a central feature of such access. One might simply say: harvest and index. OCLC's exposure of WorldCat assets in search engines such as Google and Yahoo exemplifies this approach [11]. Indexed metadata terms connect users to the location of the physical assets via holdings records, but it is reasonable to ask... would simple, full-text indexing of these assets be better still? We may argue the fine points today, but in the future we'll know the answer, for the day of digitization is fast upon us.

Structured metadata remains important in organizing and managing intellectual assets. The Canadian Government's approach to managing electronic information illustrates this strategy [4]. Metadata becomes the linkage relating content, legislative mandates, reporting requirements, intended audience, and various other management functions. One does not achieve this sort of functionality with unstructured full text. The International Press Telecommunications Council is exploring embedding Dublin Core in its new generation of news standards [17]. No domain is more digitally 'now' than this one. If you want to know the value of structured metadata, look to the requirements and business cases in such communities [10]. Similarly, in the management of intellectual property rights, well-structured data is essential, and as these requirements become ubiquitous, the creation and management of metadata will be central to the story.

Metadata for images is a critical use. Association of images with text makes them discoverable; when the asset is a stand-alone image, metadata is the primary avenue by which it can be accessed. Picture Australia is an early and enduring (and widely copied) model in this area, showing how a photo archive can become a primary cultural heritage asset through the addition of systematic search tools and Web accessibility [12].

There is much talk of taxonomies, their strengths, and their deficiencies these days, and in fact the emergence of 'folksonomies' hints at a sea change in the use of vocabularies to improve organization and discovery [9]. The Dublin Core community has struggled with the role of controlled vocabularies, how to declare and use them, and how important (or impotent?) they might be. The notion that uncontrolled vocabularies (community-based, emergent vocabularies) might play an important role in aggregation and discovery occasions a certain discomfort for those schooled in formal information management. Whether it is just the latest fad or an important emerging trend remains to be seen.

A Major Unmet Challenge

Entropy is an arrow. In the absence of constant care and fussing, our greatest successes break down. Failures, however, remain potent without much attention, retaining their power to impede. One of the yet-unsolved problems in the metadata community is the railroad gauge dilemma.
The first editor of D-Lib, Amy Friedlander, introduced me to the notion of train gauges as a metaphor for interoperability challenges [8]. Last year I rode that metaphor from Beijing to Ulan Bator, Mongolia. A cursory knowledge of Asian history reminds us that relations between Mongolia and China have been less than cordial from time to time, and this history remains manifest at the Gobi border crossing today. In the dark of night, the Beijing train of the Trans-Siberian Railway pulls into a longhouse of clanking and switching as the entire train is raised on hydraulic jacks. Chinese bogies (wheel carriages) are rolled out, and Mongolian bogies of a different gauge are rolled in. Border guards with comically high hats (and un-comical sidearms) work their way through the train cars in the manner of border guards everywhere. After a couple of hours, the train is rolling through the Gobi anew. It is a fascinating display of technological diplomacy: a kind of Maginot line that helps those on both sides of the border sleep better. These images belong in a Bogart movie or a Clancy novel, but their abstraction pervades the metadata arena.
We load our metadata into structures in one domain, and when we cross borders we unload it, repackage it, massage it into something slightly different, and suffer a measure of broken semantics in the bargain. We are running on different gauges of track, manifested in different data models and slightly divergent semantics, and driven by related but meandering, often poorly understood functional requirements. Crosswalks are the hydraulic jacks: quieter, but no more efficient than the clanking and grinding in the train longhouse.

Metadata standards specify the means to make (mostly) straightforward assertions about resources. Many of these assertions are as simple as attribute-value pairs. Others are more complex, involving dependencies or hierarchies. None are so complicated that they cannot be accommodated within a common formal model. Yet we do not have such a model in place. Why?
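To make the hydraulic-jack image concrete, here is a minimal, hypothetical crosswalk in Python. The target field names and the mapping are invented for illustration (real crosswalks, such as those between Dublin Core and MARC, are far larger), but the failure mode is the same: some assertions change gauge cleanly, some shift meaning, and some simply have nowhere to go.

```python
# Hypothetical crosswalk: re-expressing simple Dublin Core attribute-value pairs
# in an invented target schema. The mapping below is an assumption for
# illustration, not a published crosswalk.
DC_TO_TARGET = {
    "title": "headline",       # close fit
    "creator": "byline",       # semantic drift: a byline is not always a creator
    "date": "publish_date",    # which date? created, issued, modified...
    "subject": None,           # no equivalent: lost at the border
}


def crosswalk(dc_record):
    """Return (converted_record, lost_terms): the hydraulic-jack operation."""
    converted, lost = {}, []
    for term, value in dc_record.items():
        target = DC_TO_TARGET.get(term)
        if target:
            converted[target] = value
        else:
            lost.append(term)  # broken semantics: no slot for this assertion
    return converted, lost


record = {
    "title": "An Example Resource",
    "creator": "Weibel, Stuart L.",
    "date": "2005-07",
    "subject": "metadata; Dublin Core",
}
print(crosswalk(record))
# The 'subject' assertion is reported as lost: the repackaging is not lossless.
```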
The idea of achieving similar consensus across communities, each with its own legacies of such conflict, is daunting in the extreme, though recent discussions on this topic with colleagues in another metadata community remind me that hopefulness and optimism are as much a part of our domain as contention [18].

Collaboration and consensus in the digital environment

The Web demands an international, multicultural approach to standards and infrastructure. The costs in time and treasure are substantial, and the results are uncertain. Paying for collaboration that spans national boundaries, language barriers, and the often-divergent interests of different domains is a major part of these challenges. Doing this while sustaining forward progress and attracting a suitable mix of contributors, reviewers, implementers, and practitioners is particularly difficult.

A recent presentation by Google's Adam Bosworth, referenced in the Blandiose blog [15], makes for provocative reading for those debating the costs and benefits of heavyweight versus lightweight standards. The tension between these approaches sharpens designers and practitioners (and especially entrepreneurs), to the eventual benefit of users. Any standards activity ignores this balancing act at its peril. As we try to foment change and react to it at once, we are like Escher's hands, designing the future as it, in turn, designs us... except that there are often implements other than pencils in those hands.

Ever try explaining what you do for a living to your mother? In the Internet standards arena, conveying an appropriate balance of glee, terror, satisfaction, frustration, and pure wonder is no easy task. I just tell her I'm not a real librarian, but I play one on the Internet. It seems enough.

Acknowledgements

I wish to acknowledge my personal debt to uncountable colleagues in the Dublin Core community, and my deep sense of gratitude for the opportunity to have played the role I have. The patience, forbearance, and generosity of OCLC management in supporting my efforts and DCMI in general have been singular and essential. Thomas Baker reviewed and improved this manuscript with several insightful suggestions. Amy Friedlander and Bonnie Wilson, successive editors of D-Lib, have made me look better than I am in these pages for 10 years. Congratulations to them and to all who have helped make this journal (and its authors) what they are.

References and Notes
[1] About the Initiative. DCMI Website, accessed June 23, 2005.
[2] Baker, Thomas
[3] DCMI Affiliate Program
[4] Committee of Federal Metadata Experts Metadata Action Team,
[5] DCMI Usage Board
[6] DCMI Usage Board
[7] Duval, Erik and Wayne Hodgins
[8] Friedlander, Amy
[9] Mathes, Adam
[10] News Architecture Version 1.0 Metadata Framework Business Requirements
[11] Open Worldcat Program
[12] Picture Australia. Hosted by the National Library of Australia.
[13] Powell, Andy; Mikael Nilsson, Ambjörn Naeve, and Pete Johnston.
[14] Wagner, Harry and Stuart Weibel
[15] "Web of Data"
[16] Weibel, Stuart
[17] Wolf, Misha

[18] The author has been party to discussions with Erik Duval and Wayne Hodgins of the IEEE LOM effort centered around the possibility of cross-standard data modeling that might promote convergence among various metadata activities. The means and methods for carrying such work forward are presently undetermined.

Copyright © 2005 OCLC Online Computer Library Center, Inc.
doi:10.1045/july2005-weibel