Where Do We Go From Here? The Next Decade for Digital Libraries

D-Lib Magazine
July/August 2005

Volume 11 Number 7/8

ISSN 1082-9873

Clifford Lynch
Coalition for Networked Information
<clifford@cni.org>

The field of digital libraries has always been poorly defined, a "discipline" of amorphous borders and crossroads, but also of atavistic resonance and unreasonable inspiration. "Digital libraries": this oxymoronic phrase has attracted dreamers and engineers, visionaries and entrepreneurs, a diversity of social scientists, lawyers, scientists and technicians. And even, ironically, librarians – though some would argue that digital libraries have very little to do with libraries as institutions or the practice of librarianship. Others would argue that the issue of the future of libraries as social, cultural and community institutions, along with related questions about the character and treatment of what we have come to call "intellectual property" in our society, form perhaps the most central of the core questions within the discipline of digital libraries – and that these questions are too important to be left to librarians, who should be seen as nothing more than one group among a broad array of stakeholders. It is debatable (in the true sense of the word) whether this very broad tent full of diverse interests and inherent contradictions will continue to be helpful going into the future, and my purpose here is to briefly consider some of the arguments that might be brought to such a debate both in terms of where digital libraries have been and where they may choose to go in future.

This issue of D-Lib is intended to be a retrospective on the past decade of digital library efforts. There are many ways to think about this decade, and I'll mention some of them shortly. But before doing that, let us recognize that digital libraries did not simply emerge from nowhere in the mid-1990s. This is a field with an incredibly rich, and, as yet, poorly chronicled pre-history and early history. There is a stream of work and ideas that reaches back to at least the turn of the 20th century, and includes such thinkers as H.G. Wells and Paul Otlet; later contributors to the pre-history of visions of new, technologically-enabled means of knowledge organization, access and distribution also include Vannevar Bush and J.C.R. Licklider. (Indeed, Licklider's 1965 book Libraries of the Future might well be seen as marking one of the transition points between pre-history and the actual history of digital libraries.) Up till the mid-1990s, as the Internet made the jump from the research and education community into the broad public consciousness, we repeatedly encounter the utopian visions of a hitherto unimaginable cornucopia of knowledge and human creativity accessible worldwide through the network; while such aspirations were at least deflated if not largely shattered by the onslaught of commercial and government interests, the censors, snoops and copyright maximalists, they still run not far under the surface of the public mind, as witnessed by the reactions to Google's recent announcement that it would digitize the contents of several major research libraries.

The technical and engineering basis for digital libraries also reaches back several decades, to the 1960s, and includes on-line research and commercial information services, library automation systems, document structuring and manipulation systems, human-computer interface work and a wealth of other efforts. Technologies like distributed search (for example, Z39.50) were well established by the late 1980s; it is easy today to forget that Kahn and Cerf's seminal integrative paper "The Digital Library Project Volume 1: the World of Knowbots" was written in 1987-1988. Indeed, by the mid-1980s there were systems both in the commercial sector (consider Lexis-Nexis) and the research world (Bruce Schatz's Telesophy, for example) that might reasonably be considered digital libraries at least by some definitions. Very substantial digital library systems were developed prior to the World Wide Web.

One way to characterize the period from about 1994-2004 is that it represents the first time that digital library research could really get substantial programmatic funding from the major research funding agencies in the United States (and abroad as well, though that is another story I can't cover here). The U.S. National Science Foundation, in collaboration variously with DARPA, NASA, the National Library of Medicine and the National Institutes of Health, the Library of Congress, the National Endowment for the Humanities, the Institute of Museum and Library Services (and no doubt other entities that I have offended by omitting here) established two major competitive funding programs – the Digital Libraries Initiative and DLI-2 – through which researchers in higher education (along with partners beyond the higher education world) systematically engaged in the construction and analysis of digital library prototypes and research in both the underlying technologies and social implications surrounding these systems. This funding legitimized digital libraries as a field of research (including providing support for a cadre of graduate students). And it captured the attention of scientists, scholars, educators, political figures, and the general public as these investments were played out against the public discovery of the Internet, the Web, and all things digital and networked.

We should be clear that the new element was the programmatic funding and community creation: NSF, NIH and other government agencies certainly invested in systems that we might today retrospectively characterize as early digital libraries at least as early as the 1960s (think of Dialog and BRS and Chemical Abstracts); what they didn't do was systematically create a funding program that dealt with digital libraries research in a general way or that strongly encouraged cross-disciplinary collaboration, including collaboration among computer and information scientists, engineers, librarians and social scientists. In 1992, DARPA funded the Computer Science Technical Reports project, which involved five universities and the Corporation for National Research Initiatives; while this was important in terms of digital library technology, it also had some explicit community-building goals, and might be viewed as seeding the larger-scale community building from the full-scale digital libraries initiatives that began a few years later. Also, there was a huge amount of digital library activity happening by the 1990s in the commercial, government, higher education and cultural memory sectors, often with a more operational than research orientation, as well as in research projects that weren't part of the NSF-led initiatives but drew funding from other sources. Ironically, particularly during the first half of the decade, one would be much more likely to find actual libraries involved in these other efforts – the lion's share of the NSF funding went to computer science groups, with libraries often being only peripherally involved, if at all.

But the singular and really stunning accomplishment of NSF and its collaborator institutions during this period was to successfully form a community, convened physically through "all-hands meetings" and conferences like the Joint ACM/IEEE Conferences on Digital Libraries (and their predecessors and international peers) or the IMLS Web-Wise meetings, and virtually through venues like D-Lib Magazine. Other organizations also helped to advance this community-building work; the contributions of the Andrew W. Mellon Foundation and the Coalition for Networked Information (my own organization) were certainly valuable here, for example. Indeed, the investment that NSF led in community-building immediately benefited from all of the other investments in operational digital libraries, in related technologies, and in digital library research mentioned above; NSF and its collaborators wisely chose an inclusive rather than exclusionary posture in engaging the sectors of the digital library world beyond those that they were directly funding. The unprecedentedly diverse multi-disciplinary (and international) community which NSF and its collaborators assembled quickly generated enormous leverage in promoting the advancement, organization and dissemination of knowledge and ideas about digital libraries. The achievement here has been so significant, and so effective, in my view, that it demands careful consideration as strategies are developed in future for advancing other innovative multi-disciplinary scientific and engineering research initiatives.

As of 2005, it seems a virtual certainty that substantial programmatic US government funding of digital libraries research in terms of the construction of prototype systems is at an end, at least for the near future. The novelty of constructing digital libraries as a research end in itself has run its course; additionally, government budget contractions and shifts in funding priorities make it difficult to establish any new research initiatives. Exceptions are mostly in areas like defense, intelligence and homeland security, and the digital library community has certainly repositioned projects and refocused research to respond to funding solicitations from these sectors. Certainly relevant enabling digital library technologies can and will compete with other technologies involved in basic information systems and technology and computer science for funding under more general funding programs from NSF and other agencies. The investigation of the role of cultural memory institutions in the digital world and related policy problems in intellectual property never really got much funding to begin with, except in terms of some studies done by the National Research Council such as the Digital Dilemma report; it's hard to structure academic research around these questions, at least within a framework that is comfortable and comprehensible to science and engineering funding agencies. (Individual speculation and deep reflection seem to be a better fit for grants from private foundations, such as the MacArthur "genius" awards.) The overall sense of a digital library community, at least for now, continues, supported by modest investments from IMLS, NSF and various other sources.

Where might the concepts, technologies, engineering know-how, and even issues developed by this amorphous "digital library" community over the past decade or more find other new and perhaps unexpected roles in the coming years? One obvious place is production systems, and commercial products that can be used to help construct and operate these production systems.

The thinking about cyberinfrastructure and e-research in the United States and beyond clearly maps out a place for digital libraries as part of the infrastructure to support research, and maps out a set of research issues in the applications of digital library technology to support various scholarly, scientific and engineering disciplines. There are roles for social scientists as well as technical and disciplinary specialists in exploring these research issues. The move to e-research has also highlighted a related set of efforts that involve what is now being termed "data curation" or even "data science" – the management and curation of large, complex scientific information resources. The digital library community has much to contribute to this work. But a good deal of what will happen here I would characterize as advanced technology deployment in production systems rather than pure research, much like what takes place in the deployment of high-performance computer communications networks to support the research and education community. It is work that combines production engineering and research in complex and delicate balances.

Beyond the e-research and cyberinfrastructure programs, we see a great deal of investment across the higher education, cultural memory, and government and commercial sectors in systems and services like digital asset management, digital collection creation and management, and institutional repositories. All of these use the technological tools of digital libraries, and many of them draw upon the social tools and insights as well. And search technologies of various kinds, both at the enterprise and internet-wide (e.g., Google, Yahoo, et al.) levels, also draw heavily on digital library technologies. In a real sense, then, we can view digital libraries as offering a relatively mature set of tools, engineering approaches, and technologies that are now ready to be harnessed in the service of many organizations and many purposes. Much of the further research will occur within the context of those organizations and purposes, but certainly within these contexts there's an enormous amount of research yet to be done, particularly when one includes the curation and preservation issues.

More broadly speaking, digital preservation is going to be an enormous issue – a very fundamental societal problem at all levels from the nation-state to the individual. In my view, it's going to attract increasing commercial interest, as well as growing unease and concern from the general public, over the next decade. This is a hard area to do compelling research in: without the digital analogs of physical accelerated ageing test beds, most research is either about tools, about identifying approaches that don't work, or is highly speculative in nature – how do you prove that your approach in fact will preserve data for a thousand years without having to wait that long? (You can prove it will fail in much less time, of course!) Digital libraries have made some contributions to this area, but limited ones. We now have national programs like NDIIPP at the Library of Congress trying to deal with these areas on a more operational basis; the operational programs are deeply bound up with legal, economic and public policy conundrums. My feeling is that we need to fund this area with all the research dollars it can usefully employ – from a variety of sources, including the research agencies and operational programs like NDIIPP – but that the amount that can be usefully expended on real research probably isn't terribly large. Prototypes, pilot projects and operational system and service launches are another matter entirely, and are likely to be much more expensive – but they generally also need long-term funding, or some other kind of economic sustainability framework that comes into operation to support them if they are going to really accomplish much.

Note that there's a set of research questions about stewardship more broadly in the digital age; these are related to preservation but go far beyond preservation, and move into cultural, public policy, and ethical questions about how and what we remember and forget, about when and how it is appropriate to invest in ensuring the survival of memory. These questions are, in my view, of central importance, but they are not purely (or even, perhaps, predominantly) scientific and engineering questions, and are at best adjacent to the concerns that have characterized digital libraries, even in the context of preservation. Yet we must not lose sight of them, even though it's unclear where the responsibility for engaging them lies other than in academia broadly, largely outside of existing funding frameworks.

Finally, there are numerous areas of research related to the historic interests of the digital library community that are at the crossroads of technology and social science and which will demand investment and attention in the coming years; many of these are natural extensions and elaborations of the collaborations initiated by the past decade of digital library research programs. I would hope to see some of them become part of specific research funding initiatives. At the same time, however, recognize that many of them have characteristics that make them unattractive as traditional three- to five-year funded research programs. Let me mention just a handful of areas here that I find particularly compelling.

Personal information management. As more and more of the activities in our lives are captured, represented and stored in digital form, the questions of how we organize, manage, share, and preserve these digital representations will become increasingly crucial. Among the trends lending urgency to this research area are the development of digital medical records (in the broadest sense), e-portfolios in the education environment, the overall shift of communications to email, and the amassing of very large personal collections of digital content (text, images, video, sound recordings, etc.).

Long-term relationships between humans and information collections and systems. This is related to personal information management, but also considers evolutionary characteristics of behavior, systems that learn, personalization, system-to-system migration across generations of technologies, and similar questions. This is connected to human-computer interface studies and also to studies of how individuals and groups seek, discover, use and share information, but goes beyond the typical concerns of both to take a very long time horizon perspective.

Role of digital libraries, digital collections and other information services in supporting teaching, learning, and human development. The analysis here needs to be done not on a relatively transactional basis (i.e., how can a given system support achievement of a specific curricular goal in seventh grade mathematics) but in terms of how information resources and services can be partners over development and learning that spans an entire human lifetime, from early childhood to old age.

Active environments for computer-supported collaborative work offer the starting point for another research program. These environments are called for, under the term "collaboratories", by the various cyberinfrastructure and e-science programs, but have much more general applicability for collaboration and social interactions. From one perspective, these environments are natural extensions of digital library environments, but at least some sectors of the digital library community have always found active work environments to be an uncomfortable fit with the rather passive tradition of libraries; perhaps here the baggage of "digital libraries" as the disciplinary frame is less than helpful. But there is a rich research agenda that connects literatures and evidence with authoring, analysis and re-use in a much more comprehensive way than we have done to date; this would consider, for example, the interactions between the practices of scholarly authoring and communication on one hand, and on the other, the shifting practices of scholarship that are being recognized and accelerated by investments in e-science and e-research.

Perhaps the overarching theme here, and it is one that may point to a major direction for research that follows on the last decade of progress in digital libraries, is connecting and integrating digital libraries with broader individual, group and societal activities, and doing this across meaningful time horizons that recognize digital libraries and related constructs as an integral and permanent part of the evolving information environment. The next decade for digital libraries may well be characterized most profoundly by the transition from technologies and prototypes to the ubiquitous, immersive, and pervasive deployment of digital library technologies and services in the broader information and information technology landscape.

Copyright © 2005 Clifford Lynch

doi:10.1045/july2005-lynch