D-Lib Magazine
Dagobert Soergel

(This Opinion piece presents the opinions of the author. It does not necessarily reflect the views of D-Lib Magazine, its publisher, the Corporation for National Research Initiatives, or its sponsor.)
Digital library (DL) research and development needs a framework that can be used as a perspective on existing research and practice and, more importantly, as a structured vision for the development of new ideas. As distinct from the DELOS brainstorming report [1], which offers its own agenda for the next phase of DL research (and a somewhat ad-hoc roster for EU-NSF working groups), the framework offered here is based on a very broad view of digital libraries that takes full advantage of the possibilities offered by the integration of computer and telecommunication technology. When engine-driven vehicles were first introduced, they were built in the shape of a horse-drawn carriage and indeed were called "horseless carriages"; it took some time to take full advantage of the new technology and engineer the modern automobile. Much of DL practice is still at the stage of the "horseless carriage"; we must move on to the modern automobile.

No claim is made that all or even most of the ideas in this commentary are new; there are many forward-looking leaders in the DL community; indeed, many of the ideas come from or were inspired by project presentations at the EU-NSF DL all projects meeting in March 2002 in Rome [2]. It is hoped that bringing these ideas together in a systematic framework will lead to changes in how DLs are viewed and implemented. The framework consists of three overarching guiding principles and eleven specific themes and areas of research and development.

Guiding principles
Themes for DL research and development

Theme 1. DLs must integrate access to materials with access to tools to process these materials (DL = materials + tools).

A digital library should provide access to materials and objects and to the tools needed to process and present these materials in ways that serve the user's ultimate purpose. A whole host of tools are needed in this area, from NLP techniques (including summarization, information extraction, translation, and automatic speech recognition), to statistical and scientific computation and modeling, to graphical rendering and visualization.

To give just a few examples: The Digimorph DL [4] provides CAT scan data from biological specimens and the tools to manipulate these data in many ways to separate out bone structures and organs or to show 3-D images that can be rotated. Perseus [5] provides access to maps and pictures of historic London and a tool that lets the user take a virtual 3-D walk through the streets of the historic city. ARION [6] provides access to oceanographic data and to algorithms to process these data, allowing the user to set up a total process by specifying, for each step, the data to be used (data resulting from the previous step and/or data retrieved from the DL) and the algorithm(s) to be applied; the system then executes the process, fetching data and algorithms from the DL as needed. CHLT [7] is developing an integrated reading environment to assist students and humanities scholars with reading, understanding, and interpreting classical texts.

Theme 2. DLs should support individual and community information spaces.

A DL should support users who work with materials and create their own individual or community information spaces through a process of selection, annotation, contribution, and collaboration. Annotation includes incorporating new structures; for example, adding links, preferably anchored at both ends, between existing objects.
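As a rough illustration of what a link "anchored at both ends" could look like as a data structure, the following Python sketch stores each link once and indexes it under both of its endpoints, so that it can be followed from either object. It also carries the private/public distinction for annotations. All names here (Anchor, Link, LinkStore, the object identifiers, and the character-offset addressing) are hypothetical and are not drawn from any of the systems cited in this commentary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """A span inside a DL object (hypothetical addressing by character offsets)."""
    object_id: str
    start: int
    end: int


@dataclass(frozen=True)
class Link:
    """A typed link whose two ends are both anchored in existing objects."""
    source: Anchor
    target: Anchor
    link_type: str
    author: str
    public: bool = False  # private annotations are visible to their author only


class LinkStore:
    """Keeps every link once, indexed under both of its endpoints."""

    def __init__(self):
        self._by_object = defaultdict(list)

    def add(self, link):
        self._by_object[link.source.object_id].append(link)
        # Avoid a duplicate entry when both anchors sit in the same object.
        if link.target.object_id != link.source.object_id:
            self._by_object[link.target.object_id].append(link)

    def links_for(self, object_id, viewer):
        """Return the links touching object_id that the viewer may see."""
        return [
            link for link in self._by_object.get(object_id, [])
            if link.public or link.author == viewer
        ]


# Hypothetical example: a passage in a brief quotes a passage in a case.
store = LinkStore()
quote = Link(
    source=Anchor("brief-17", 120, 260),
    target=Anchor("case-0042", 1030, 1170),
    link_type="quotes",
    author="alice",
)
store.add(quote)
```

Because the link is indexed under both objects, a reader of the quoted case can discover the brief that quotes it, not only the other way around; that is the practical difference between a two-way link and an ordinary one-way hyperlink.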
Users should be able to contribute new materials, such as the papers they are working on or images they scan from their own collections. User annotations and contributions can be private or public and shared (in a work group, on a company intranet, or on the public Web) for collaboration. Digital libraries become collaboratories or shared information spaces, a medium that scholarly or practice communities can use to store their information and to communicate with each other, adding to the store in the process (the third of the guiding principles). Through keeping detailed search and work histories, the system can capture contributions, relieving the user from having to do extra work [8]. For example, if the user pastes a document passage into his or her own work as a quotation (such as a lawyer quoting from a legal case) or includes a part of an image as an illustration, the system should establish the proper two-way link in the background. Some of these capabilities have been implemented in CYCLADES [9]. Developing a tool set for annotation and "information space development" requires careful task analysis; taking a careful look at the practices in humanities and legal scholarship would provide a useful starting point.

Both Theme 1 and Theme 2 address the second guiding principle, creating new ways for intellectual work through seamless integration of information access with information use and information application / users' work.

Theme 3. Digital libraries need semantic structure.

To convey meaning to users, especially students, to support search and retrieval, to provide knowledge-based support in the user interface, and to support agents that perform work for the user, a DL needs semantic structure, an ontology both in the broad sense of a conceptual schema for a domain (which includes metadata standards) and in the more narrow sense of a classification of subjects and values of other entity types.
This is a recurring, if not sufficiently heeded, theme in the literature, with emphasis on the need for harmonization (and, to the extent feasible, standardization) across disciplines, languages, cultures, and time (for historical materials). Semantic structures are expensive to create, so reuse, re-purposing, and adaptation of existing schemes are important, along with computer-assisted methods for aligning different schemes; automatic discovery of terms, concepts, definitions, and relationships from text and from user interactions; and collaborative development and maintenance of ontologies. (For a general reference and an example, see [10].) Semantic structure is an area of prime importance; it needs its own EU-NSF Working Group.

Theme 4. DLs need linked data structures for powerful navigation and search.

This is actually a special case of Theme 3, but it is important enough to discuss as a separate theme. Knowledge is richly interlinked, and computers are extremely good at following links and computing on links; so let's seize the opportunity. This idea is best illustrated by examples:
Theme 5. DLs should support powerful search that combines information across databases.

We need to move beyond simple controlled-vocabulary and free-text retrieval in single databases to retrieval in the complex interrelated structures described in Theme 4, working across databases. This includes techniques such as:
In addition to more sophisticated retrieval algorithms, this involves interoperability and distributed access to heterogeneous systems, both on a technical and on an intellectual level. While a lot of work is being done in this area, there are still few widespread operational solutions; we need more efforts that actually build on each other. Problems include:
Finally, this theme encompasses efforts now pursued under the heading "Semantic Web": using statistical and artificial intelligence techniques to derive from the results found a final answer to the user's question or to present data gleaned from the results in a form adapted to the user's background and purpose (see also Theme 1).

Theme 6. DL interfaces should guide users through complex tasks.

User interfaces are very important in facilitating user interaction with the complex functions described above and letting users, with some training, take advantage of new methods of intellectual work. Perseus, cited often in this commentary for its exemplary data structures and tools, still has a poor interface that stands in the way of full exploitation of its riches. Other issues include customization / personalization and methods for deriving user profiles automatically. This area also deserves its own EU-NSF working group.

Theme 7. The DL field should provide ready-made tools for building and using semantically rich digital libraries.

The Web has dramatically lowered the threshold for making materials accessible. Now we need to lower the threshold for creating or converting materials and presenting them in meaningfully structured collections; we need a sophisticated digital library toolkit for the masses. Organizations with useful collections, or user communities that want to create a collective store of materials of interest and a platform for collaboration, need affordable and easy-to-use technology. Industry has created (or will create) DL systems for publishers and others who can afford them, but the research community should produce high-quality freeware. An example of this is the Greenstone DL package [14]. Greenstone concentrates, so far, on traditional DL functions and does not yet include annotation or collaboration or the other functions discussed above.
Ideally, we would have a toolbox to which different people could contribute programs, LINUX-style, including tools for functions such as:
Theme 8. Innovative DL design should be informed by studies of user requirements and user behavior.

User requirements analysis methodology is well understood, but actual task-based user requirements studies from the point of view of developing new methods of intellectual work are important both for policy and for system design at all levels. User behavior studies are important for two purposes: (1) improving system design and usability and (2) determining needs for user education. (Remember that a DL and its users are two components of one overall system. Both components may require changes for best overall performance. If DLs are merely "user-centric", they may never take the user to the future place of improved methods of intellectual work.)

Theme 9. DL evaluation needs to consider new functionality.

Evaluation frameworks for DLs (as well as for other information systems) must be adapted to take into account the many new functions and the changed environment of DLs. Such a framework is the basis for devising an evaluation strategy that tests components and functions. Some of these will be amenable to quantitative measurement; others will require qualitative approaches. Conducting evaluation studies with real users is important; relying solely on TREC-style evaluations can be very misleading. Some of these issues have been addressed in a DELOS workshop [19].

Theme 10. Legal/organizational issues of information access and rights management need to be addressed using new technology.

Access to some or all information may be restricted to certain users, particularly if we think of individual and group information spaces, so good computer security is needed. But even if the information is, in principle, universally accessible, the information producers may want to be compensated, so rights management is a huge issue; the trick is to balance fair compensation for intellectual and editorial contributions with ease and universality of access.
Another issue in this area is protection of human subject information (which may be in user profiles and use histories as well as in the content of a DL, such as interviews in an oral history collection, survey data, or questions and answers in a medical ask-the-expert site). In many ways, the expanded DL functionality envisioned here can come to full fruition only if the security and rights management issues are resolved. Luckily, while computer access has created many new problems in access control, computers can be used to solve these problems; for example, it is now feasible to collect micro-payments from users and aggregate payments to bill information users and pay information producers.

Theme 11. DLs need sustainable business models.

Intellectual and technical excellence by themselves do not guarantee the success of a DL; the DL must have a business model for creating the income necessary for its operation. Most paper libraries are funded as public utilities or infrastructure components in organizations. The business model in this case is to make a convincing case for the library's usefulness to the funding organizations. This model applies to many DLs as well, particularly to DLs that grew out of or are extensions of paper libraries. But DLs also grow out of publishing houses, and here the business model is one of selling information access. Furthermore, computerized tracking of use and services allows for a charge-back model (say, within a company), presumably supporting only services that users are willing to pay for, at least in "funny money". (Whether this is the best option for the company or actually leads to harmful under-use of information is another question.) With the "value adding" expansion of functionality suggested here, there are also many opportunities to create DLs of direct value to commercial organizations; such DLs can be sustained (and even made profitable) from user fees.
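The micro-payment and charge-back ideas mentioned above both rest on aggregating tracked use events. A minimal Python sketch follows, assuming a hypothetical use log in which each record names the user, the information producer, and a fee in cents. The record layout, the names, and the fee values are illustrative assumptions only and do not describe any existing billing system.

```python
from collections import Counter
from typing import NamedTuple


class UseEvent(NamedTuple):
    """One tracked use of a DL item (hypothetical record layout)."""
    user: str
    producer: str
    fee_cents: int  # integer cents avoid floating-point rounding errors


def aggregate(events):
    """Sum micro-payments into one bill per user and one payout per producer."""
    bills = Counter()
    payouts = Counter()
    for event in events:
        bills[event.user] += event.fee_cents
        payouts[event.producer] += event.fee_cents
    return dict(bills), dict(payouts)


# Hypothetical usage log for one billing period.
events = [
    UseEvent("alice", "press-a", 5),
    UseEvent("alice", "press-b", 2),
    UseEvent("bob", "press-a", 5),
]
bills, payouts = aggregate(events)
```

In this sketch the total billed to users always equals the total paid out to producers; a real system would also need to account for the DL's own operating share, which is left out here.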
On the other hand, a DL in the humanities must likely rely on public or grant support unless it is used in courses as a textbook replacement and could thus derive income from student purchases. So there is no one-size-fits-all business model, but rather a variety of business models mirroring the variety of information providers (such as authors and publishers) and intermediaries (such as libraries) that get into the digital library business.

Conclusion

To advance digital libraries to their full potential, a broad-based digital library research and development framework is needed both to evaluate and integrate existing research and practice and to provide a structured vision for what digital libraries can be. This commentary presented such a framework by integrating ideas from many sources. Because all the elements of this framework interact and depend on each other, digital library research must address not some but all of the eleven themes outlined here.

Notes and References

[1] DELOS brainstorming report (San Cassiano, Alta Badia, Italy, June 2001), ERCIM-02-W02, <http://delos-noe.iei.pi.cnr.it/activities/researchforum/Brainstorming/brainstorming-report.pdf>.
[2] EU-NSF DL all projects meeting in March 2002 in Rome, <http://www.ercim.org/publication/Ercim_News/enw50/thanos.html>.
[3] See, for example, ERCIM News, No. 48, January 2002, Special Theme: e-Government, <http://www.ercim.org/publication/Ercim_News/enw48/>.
[4] The Digimorph Digital Library was established by Tim Rowe at the University of Texas at Austin. See <http://digimorph.org/aboutdigimorph.phtml>.
[5] See the Perseus Digital Library (Gregory Crane, Tufts University) at <http://www.perseus.tufts.edu/PR/vr.ann.html>.
[6] ARION, Forth, <http://dlforum.external.forth.gr:8080>, and <http://dlforum.external.forth.gr:8080/papers/ARION-paper_v52.pdf>.
[7] CHLT (Cultural Heritage Language Technologies), Jeffrey Rydberg-Cox, University of Missouri, Kansas City, <http://www.chlt.org/>.
[8] Komlodi, Anita Hajnalka; Soergel, Dagobert. Attorneys interacting with legal information systems: tools for mental model building and task integration. ASIST 2002, p. 152-163, <http://www.research.umbc.edu/~komlodi/papers/komlodi_soergel_asis2002.pdf>. See also <http://www.research.umbc.edu/~komlodi/papers/akomlodi_dissertation.pdf>.
[9] See CYCLADES, IEI-CNR, at <http://www.ercim.org/cyclades/>.
[10] Soergel, Dagobert. SemWeb. An environment for integrated access to distributed ontological and lexical knowledge bases and their collaborative development and maintenance. A proposal <http://www.clis.umd.edu/faculty/soergel/soergelSEMMULS7.html>. See also Atlas for Science Literacy, <http://www.project2061.org/tools/atlas/default.htm>.
[11] DFG-Projekt "DWB auf CD-ROM und im Internet", <http://www.DWB.uni-trier.de>.
[12] COLLATE, FhG IPSI, <http://collate.de>.
[13] Kopak, Richard W. Functional link typing in hypertext. ACM Computing Surveys, December 1999; 31(4es). Available from <http://www.acm.org/dl>. See also <http://www.cs.brown.edu/memex/ACM_HypertextTestbed/papers/41.html>. For a brief discussion of implementation, see <http://www.w3.org/2000/02/rdf-xlink>.
[14] Greenstone DL package, <http://www.greenstone.org/english/home.html>.
[15] META-E, University of Innsbruck, <http://heds.herts.ac.uk/conf2002/heds2002_metae.pdf>.
[16] Droettboom, M., K. MacMillan, I. Fujinaga, G. S. Choudhury, T. DiLauro, M. Patton, and T. Anderson. 2002. Using the Gamera framework for the recognition of cultural heritage materials. Joint Conference on Digital Libraries. JCDL 2002 Proceedings, p. 11-17. Available from <http://www.acm.org/dl> and from <http://dkc.jhu.edu/gamera/papers/>.
[17] The Multivalent Browser; A Platform for New Ideas, <http://www.cs.berkeley.edu/~phelps/Multivalent>.
[18] CiteSeer, <http://www.neci.nec.com/homepages/lawrence/citeseer.html>.
[19] DELOS Workshop on Evaluation of digital libraries: Testbeds, measurements, and metrics, <http://www.sztaki.hu/conferences/deval/>.
Copyright 2002 Dagobert Soergel
D-Lib Magazine Access Terms and Conditions DOI: 10.1045/december2002-soergel