Toward a Global Digital Library: Generalizing US-Korea Collaboration on Digital Libraries

D-Lib Magazine
October 2002

Volume 8 Number 10

ISSN 1082-9873

Edward A. Fox¹, <fox@vt.edu>
Reagan W. Moore², <moore@sdsc.edu>
Ronald L. Larsen³, <rlarsen@mail.sis.pitt.edu>
Sung Hyon Myaeng⁴, <shmyaeng@cs.cnu.ac.kr>
Sung-Hyuk Kim⁵, <ksh@sookmyung.ac.kr>

¹Virginia Tech, Dept. of Computer Science, M/C 0106, Blacksburg, VA 24061 USA
²San Diego Supercomputer Center, La Jolla, CA 92093-0505 USA
³University of Pittsburgh, School of Information Sciences, Pittsburgh, PA 15260 USA
⁴Chungnam National University, Div. of Electrical & Computer Engineering, Taejon, Korea
⁵Sookmyung Women's University, Div. of Information Science, Seoul, Korea

Abstract

In this article, we report on recommendations to remove barriers to worldwide development of digital libraries, drawing upon an August 2000 workshop involving researchers from the US and Korea who met at the San Diego Supercomputer Center. We present a summary table identifying application domains, institutions engaged in those applications, example activities, technical challenges, and possible benefits. Since education has high priority in Korea, an important opportunity that should be explored is involvement of Koreans in the initiative led by the US NSF to develop the National STEM (Science, Technology, Engineering, and Mathematics) education Digital Library, NSDL.

1. Introduction

There are many barriers to the worldwide development of digital libraries (DLs). These barriers are of particular concern in the context of digital library support of collaboration on research and education between pairs of nations with very different languages and cultures, such as the US and Korea. Recommendations to remove such barriers were developed through a workshop involving digital library researchers from the US and Korea who met August 10-11, 2000, at the San Diego Supercomputer Center (see http://fox.cs.vt.edu/UKJWDL). A brief summary of these recommendations includes:

Real / potentially significant collaboration opportunities should be nurtured.

An important opportunity to explore is involvement of Koreans in the US NSF-led National STEM (Science, Technology, Engineering, and Mathematics) education Digital Library, NSDL (http://www.nsdl.org).

Teams should apply to NSF's International Digital Libraries Collaborative Research and Applications Testbeds Program and similar programs in Korea.

Further focused programs might relate to one or more of:
1. Korea Culture and Heritage Digital Libraries;
2. Ontologies and the areas dealing with human languages (e.g., machine translation, cross-language information retrieval, text summarization);
3. Application areas of particular importance, such as health / medical care;
4. Important DL problems, such as architecture (including managing data, information, and knowledge), interoperability, and user-centered design.

Table 1 further establishes a context for this discussion by identifying application domains, institutions engaged in those applications, example activities, technical challenges, and possible benefits. The application domains support communities that provide requirements driving the development of digital library applications. The related institutions are representative groups who need the associated software systems. The examples describe either types of software systems or collections used by the institutions. The technical challenges are specific capabilities needed for a successful digital library. The benefits/impacts are broader results that accrue from the implementation of software to meet the technical challenges.

Table 1. Overview of Digital Library Applications, Benefits, and Challenges (where bold entries may have particular application in Korea)
Application Domain	Related Institutions	Examples / Implementation	Technical Challenges	Benefits / Impacts
Publishing	Publishers, Eprint archives	OAI	Quality control, openness	Aggregation, organization, increase in visibility/accessibility
Education	Schools, colleges, universities	NSDL, NCSTRL	Knowledge management, reusability	Access to data, cyber education
Art, *Culture*	Museum	AMICO, PRDLA	Digitization, describing, cataloging	Global understanding, education
Science	Government, Academia, Commerce, Research institutions	NVO, PDG, SwissProt, UK eScience, EU commission	Data models	Reproducibility, faster reuse, faster advance
(e) Government	Gov. Agencies (all levels)	Census	Intellectual property rights, privacy, multi-national	Accountability, homeland security
(e) Commerce, (e) Industry	Legal institutions	Court cases, patents	Developing standards, packaging	Standardization, *economic development*
History, Heritage	Foundations	American Memory	Content, context, interpretation	Long term view, perspective, documentation, recording, facilitating, interpreting, understanding
Cross-cutting	Library, Archive	Web, personal collections	*Multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data,* infrastructure	Reduced cost, increased access, preservation, democratization, leveling, competitiveness, peace

Many of the technical challenges cut across boundaries, impacting many domains, and lead to basic technology important to many institutions. In this sense, the cross-cutting issues can be considered the highest priority challenges for exploration within the context of a US-Korea joint development project. Also noteworthy are the bold entries, which, we believe, may be of particular relevance to the growth of digital libraries in Korea.

The opportunities provided by digital library technology, characterized in Table 1, are pervasive, affecting all organizations that need to manage data and information. Seven different application domains are considered, plus an eighth domain of cross-cutting issues. For each domain, the principal related institutions and current example implementations are identified. An important technical challenge is listed, and important benefits and impacts from the use of digital library technology are shown. The three columns on the left of the matrix represent the types of opportunities. The two columns on the right of the matrix provide a cross section of the challenges and impacts. We note that the technical challenges may be encountered for any of the application domains, while most of the benefits apply to all of the application domains.

There are many ways to interpret the matrix. One can pick an application area and identify the important research areas (technical challenges). Or one can pick an institution that funds the development or application of digital library technology, and identify important motivating benefits. Finally, one can pick an example and decide if comparable technology would benefit similar organizations, such as in the US and Korea.

2. Background: The US-Korea Workshop

The US-Korea workshop was organized to identify joint research and development activities that would enable the development of multi-cultural digital libraries. The emphasis was on software, systems, applications, content collections, community needs, testbeds, and related issues. The challenge was to identify where technology and applications unique to Korea could be used to drive the development of new digital library technology. From the perspective of the US National Science Foundation, the workshop presented an opportunity to promote international collaboration and to increase the general applicability of digital library technology.

The workshop was organized collaboratively between Korea and the United States through a joint program committee. Dr. Edward A. Fox led the US participation on the program committee, and proposed researchers within the US digital library community who were working on relevant technology. Drs. Sung Hyon Myaeng and Sung-Hyuk Kim played a similar role in Korea, with support from the Ministry of Information and Communication and the National Computing Agency. A representative sample of researchers who were dealing with multi-cultural issues, including multi-lingual digital library access, was selected, as were a number already engaged in or considering collaboration. Each invited researcher was asked to submit a white paper describing their research and associated technology. The selected researchers then were asked to give a short presentation at the workshop.

Dr. Reagan W. Moore hosted the workshop at the San Diego Supercomputer Center on August 10-11, 2000. Later Dr. Moore played a key role in writing many parts of the report. The agenda for the workshop is posted at the workshop web site, along with papers, presentations, and group reports.

2.1 Motivation and Context

The motivation for the workshop was to identify mutually beneficial applications that would justify either digital library research or the application of digital libraries. The presentations at the workshop were evenly split between researchers from the US and Korea, which provided ample opportunity to present relevant technologies from both communities. In addition, the presentations were grouped into common themes in order to promote interactions among those doing similar research. One of the outcomes from the workshop was a better understanding of the driving motivations for the development of digital library technology. From the Korean community, the importance of cultural heritage and education was strongly emphasized. From the US community, the use of digital libraries to further education and promote publication of data was clearly shown. These synergistic motivations led to the identification of cultural heritage education as a driver for new digital library applications. Work in this area should not only lead to good science, but should have broad and positive impact on US and Korean societies, leading to improved cross-cultural understanding.

2.2 The Regional Focus / Global Implications

Joint US-Korea development efforts may have global implications. Extending systems developed in the US to handle Korean texts can help lead to their being able to work with Chinese and Japanese documents as well. Technology promoting multi-lingual and multi-cultural holdings can be applied across all countries. Though the particular inter-government policies for technology development that emerge may not be appropriate for other joint projects, procedures by which the agreements are established should be applicable to many multi-national collaborations.

2.3 Emerging Collaborative Activities

Digital libraries provide a mechanism to assemble collections that span multiple countries, enabling access to material that would otherwise be inaccessible. An example of such collaboration is the Pacific Rim Digital Library Alliance (PRDLA, http://prl.sdsc.edu). The members of the alliance are from nations around the Pacific Rim, and include a Korean university library. Each institution contributes collections, software infrastructure, or hardware support systems to build a distributed digital library. Intellectual property rights restrictions limit access to some PRDLA collections; managing such rights within digital libraries is nonetheless an important precursor for establishing an information industry that engages the corporate sector. One of the goals is to assemble the largest collection of Chinese text in the world by integrating access to multiple existing collections, such as:

Bibliography of East Asian Studies, which provides an index to Western language journal articles on East Asian studies.

Beijing SuperStar Digital Library, which provides scanned images of some 80,000 titles of Chinese digital books on many subjects from China.

The electronic version of Wenyuange Sikuquanshu (the Wenyuan Library collection from the Qing Palace), which contains full-text and scanned images of some 3,700 titles. (This collaboration includes a vendor of digital books in Hong Kong.)

The US-Korea workshop pointed to the need to extend these collaborations to include cultural information from Korea. PRDLA shows that the policy and cooperative agreements needed to build a multi-national library can further communication between nations.

Yet PRDLA is not the only context for collaboration. The Networked Digital Library of Theses and Dissertations, which emerged as a result of US government funding beginning in 1996, has stimulated interest in electronic theses and dissertations around the globe, including in Australia, China, Germany, and India. Two government agencies in Korea, KISTI and KERIS, are now engaged in projects that have led to collections already containing tens of thousands of such works.

Opportunities for many additional collaborations between the US and Korea exist. Some will emerge naturally, while others, such as those that build on broad-based university volunteer efforts, are likely only to develop if initially encouraged by government support. High-level discussions, such as those undertaken in the US by the President's Information Technology Advisory Committee (PITAC), may be needed to identify the most important opportunities to explore.

The challenges and benefits outlined in this article demonstrate the advantages that arise from the organization of data within digital libraries, which may be of global import. In the next section, we discuss some of the challenges in more detail. Then we argue for a solution and approach related to key architectural levels. Though our argumentation is based on the US-Korea workshop, the technical issues and corresponding digital library-based solutions may lead us toward significant global benefits.

3. Challenges to Global Digital Library Collaborations

Most of the US-Korea workshop discussions considered the availability of technology to overcome current challenges in digital library applications. The discussions were focused by grouping the workshop presentations under appropriate themes. Applications were used to define new requirements for digital libraries, and scalability issues were used to choose among software technology implementations. The workshop goal was to develop a global approach for the sharing and distribution of information.

3.1 Global Collections

A dominant requirement for a digital library capable of supporting access by residents of multiple countries is the ability to scale in capacity to manage arbitrarily high access rates. Events that cross cultural boundaries, such as World Cup Soccer, the Olympics, or the landing of the Rover on Mars, can generate web accesses from all countries. The ability to give everyone equal access requires a degree of scalability not previously attempted. Further, equal access implies use of suitable languages within the associated cultural context, and access rates that are not adversely impacted by network topologies (that arise because of geographic and economic concerns) or transcontinental latencies (that arise due to the finite speed of light). The scalability issues relate to many of the technical challenges listed in Table 1.

Digital libraries extend the trend that 'going digital' encourages, with regard to application convergence, including across space and time. Thus, digital libraries must address the needs not only of libraries, but also of archives and museums—and on a global scale. Few users of digital libraries will understand their collections unless context is made clear, which occurs as a matter of course in museum exhibits. Many will refuse to invest in the construction of a digital library if the concept of preservation, strongly associated with archives, is not fully addressed. Long-term preservation is essential if the much more costly processes of scholarship are to bear fruit into the future; alas, this is often ignored by those providing funding. Yet digital libraries that have built-in mechanisms for preservation will assuredly facilitate re-use, as well as support education of the next generation of scholars.

People searching in a digital library focused on a particular genre typically expect that all instances of that genre, from around the world, will be included: Why would a researcher in a scholarly discipline willingly ignore relevant work, from any other country? Further, why would a researcher ignore relevant content, just because it was in the form of a sensor data stream? So, too, what serious scholar would complain if provided with relevant primary data, regardless of its form—whether a satellite image or a photo of ancient art, whether a CAD model or a 3D digitization of sculpture, whether GIS data or a digital representation of an ancient map? Capturing key data has always been fundamental to scholarship; in the context of digital libraries we refer to this as digitization, gathering metadata (including provenance and cataloging information), and recording of relationships. The latter, whether in the form of concept maps, or built into global hyperbases that directly support navigation and linking, are crucial to the development of theories and models capturing the generalities that move us from data to information and knowledge—regarding art, culture, heritage, and history, as well as science and technology.

Modern scientific research, in particular, is dependent on amassing and integrating diverse related sources of data, information, and knowledge. In order to be collected, processed, and analyzed, they must be represented and unified. Geographically connected data must be suitably registered to a consistent frame of reference. Measurements must have units reconciled, and terminology must be interpreted consistently. Only then can data be suitably modeled. Only then can complex dynamic systems (e.g., ecosystems) be described and understood. For example, the Intermedia digital library effort has brought together video data from around the Pacific Rim regarding El Niño, making clear that it had diverse effects in different regions: drought in parts of the US and China, but floods in regions of South America.
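The unit reconciliation step mentioned above can be sketched in a few lines. This is a minimal illustration only: the region names, the readings, and the choice of millimetres as the common unit are all assumptions made for the example, not data from any collection discussed here.

```python
# Conversion factors to a common unit (millimetres). The set of units
# supported here is an assumption made for this illustration.
TO_MILLIMETRES = {"mm": 1.0, "cm": 10.0, "in": 25.4}


def reconcile(measurements):
    """Convert (region, value, unit) tuples to a single unit (millimetres)."""
    reconciled = []
    for region, value, unit in measurements:
        if unit not in TO_MILLIMETRES:
            # Refuse to guess: an unknown unit must be resolved by a curator.
            raise ValueError(f"unknown unit: {unit}")
        reconciled.append((region, value * TO_MILLIMETRES[unit], "mm"))
    return reconciled


# Hypothetical rainfall readings reported by three sites in different units.
readings = [
    ("region-a", 2.0, "in"),
    ("region-b", 12.0, "cm"),
    ("region-c", 40.0, "mm"),
]
unified = reconcile(readings)
```

Only after such a step can values from different sources be compared or modeled together; terminology and geographic registration would need analogous, and usually harder, reconciliation.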

Primary data leads to datasets and formation of databases. Through analysis, these lead to publications, at an increasingly rapid pace, that engage collaborating scholars from around the globe. All of the data, information, and knowledge that arise must be made accessible; replication often is involved to make access reliable and efficient.

Access to primary data, as well as the availability of powerful tools for analysis and visualization, and the encouragement of international collaboration are beginning to transform education. Interactive learning is more effective than traditional approaches, and more fun—helping ensure that there will be adequate numbers of R&D workers who will continue to help solve the many crucial problems faced by humanity. The sharing and reuse of primary data is being followed by the sharing of curricula modules and educational resources. While a culture of sharing must be nurtured, so too must there be support for privacy and proprietary processes, as well as management of intellectual property, especially in highly competitive areas (e.g., bioinformatics and chemistry). Modern education is a global system that must be supported suitably by modern (digital) libraries, but they must be significantly extended to ensure scalability and the provision of a broad range of services.

3.2 Technical, Legal, and Political Challenges

The challenges that arise across the application domains can be divided, at the global level, into technical, legal, and policy challenges.

The technical challenges include quality control over the contents within collections, re-purposing of collections to take advantage of their information content through knowledge management, generation of digital images from paper or art holdings to promote network-based access, cataloging of collections to facilitate discovery, and specification of standard data models to ensure that everyone, worldwide, has an equal ability to display and manipulate the digital entities.

The legal challenges are formidable and are centered upon the specification of intellectual property rights. How can material be shared within the interactive environment of the web, while control over the ownership of the material is properly maintained? A related issue is the management of privacy within web-accessible material. How can sensitive material be anonymized such that statistical analysis and data mining can proceed, while the rights of individuals are respected? The traditional example is the release of census data, which must take place without violating the rights of any individual. When collections become multi-national, the legal challenges become more complex. A joint collection will require joint understanding of what constitutes intellectual property rights and what constitutes private material. Such an understanding, however, may only arise if government support and project deadlines force interested parties to work diligently on hard problems, realizing that opportunities will be lost if challenges are not surmounted.
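One widely used disclosure-control idea, small-cell suppression, gives a concrete sense of what anonymizing census-style data can involve. The sketch below is an illustration under stated assumptions: the `suppressed_counts` helper, the record layout, and the threshold of 5 are hypothetical, and the code does not describe the practice of any particular statistical agency.

```python
from collections import Counter


def suppressed_counts(records, key, threshold=5):
    """Count records per group, withholding counts below the threshold.

    Groups with fewer than `threshold` members are reported as None, so
    that a published table cannot single out a very small group.
    """
    counts = Counter(record[key] for record in records)
    return {
        group: (count if count >= threshold else None)
        for group, count in counts.items()
    }


# Hypothetical individual-level records; only the grouping field is shown.
records = [{"district": "A"}] * 6 + [{"district": "B"}] * 2
table = suppressed_counts(records, "district", threshold=5)
```

Aggregate statistics remain available for the larger group, while the small group is withheld. Agreeing on the threshold, and on which fields count as sensitive, is exactly the kind of joint understanding a multi-national collection would require.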

The policy challenges are related to the processes used to develop joint agreements on legal issues. Examples are the development of policies on the openness of the material, the conditions under which material may be re-used, and the mechanisms that will be used to provide context for a collection. Policy challenges are particularly important when specifying use conditions for restricted material. Equally challenging is the specification of the mechanisms that will be supported for presenting the material. With current technology, it is possible to expose relationships within collections that might be considered proprietary or private. Digital library technology is very powerful and can provide mechanisms for query and discovery across multiple sites allowing relationships to be discovered that otherwise would remain hidden. Balancing these capabilities with privacy policies from around the globe is a great challenge.

3.3 Overarching Challenges

The overarching challenges and cross-cutting issues shown in Table 1 include some of the major research issues related to digital libraries. The need to support multi-lingual interfaces and multi-lingual knowledge bases poses significant computer science research issues. We understand how digital libraries can be used to organize information. What is not yet apparent is how digital libraries can be used to organize knowledge across multiple cultures. We apply semantic tags to data entities to create information, and then look for relationships between the tagged elements to create knowledge. We manage the tagged data elements as attributes in databases. But we have no standard mechanisms for managing the relationships between the tagged elements. The purpose of creating collections of information is to facilitate the generation of knowledge. In cross-cultural settings, the problem becomes even more difficult, as we differentiate between different cultural perspectives. The temporal or procedural relationships that are valid within one culture might be replaced with a different set of intermediate states in a second culture.

The traditional cross-cutting issues are related to economic sustainability of collections, technological sustainability of collections in the sense of preservation, and access sustainability in the sense of scalability across distributed data repositories. One would assume that a distributed digital library would require the use of common mechanisms for each of these sustainability issues. What is not apparent is whether it is possible for the global system to support multiple mechanisms for each of these issues while providing interoperability mechanisms to unify access across the multiple choices. The emergence of grid technologies suggests it may be possible to integrate all independent sustainability models within the global digital library. Data grid technology is an example of the federation of completely independent digital libraries into a common name space with common access mechanisms. This strongly suggests that it will be possible to create global digital libraries, in which differing technical, legal, and policy issues are overcome.

The simplest characterization of a digital library environment differentiates between the management environments required for digital objects (data), attributes about the digital objects (information), and relationships between the attributes (knowledge). Each environment requires explicit infrastructure support. Fortunately, storage of data has been facilitated by the development of data handling systems (e.g., SDSC Storage Resource Broker), and information management has been enabled by the acceptance of a standard information markup language, XML. Management of knowledge has been a major research area for decades, but the recent integration of XML markup languages with Topic Maps for describing relationships promises to provide a suitable infrastructure. Ontologies now can be expressed as XML tagged relationships that associate topics. It is possible to create representations of ontologies that can be implemented on a variety of logic systems. The relevant issues for cross-cultural collaborations include the specification of ontologies for characterizing relationships between semantic terms and the semantic interoperability mechanisms for deciding on common semantic terms.
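To make the idea of expressing relationships as XML tagged associations between topics more concrete, the following sketch builds a small topic-map-like document. The element and attribute names (`topicMap`, `topic`, `association`, `member`, `topicRef`) are simplified assumptions loosely inspired by Topic Maps and do not follow the exact XTM syntax; the two topics are illustrative only.

```python
import xml.etree.ElementTree as ET


def build_topic_map(topics, associations):
    """Build a simplified, topic-map-like XML tree.

    topics:       dict mapping a topic id to a display name
    associations: list of (relation_type, source_id, target_id) tuples
    """
    root = ET.Element("topicMap")
    for topic_id, name in topics.items():
        topic = ET.SubElement(root, "topic", id=topic_id)
        ET.SubElement(topic, "name").text = name
    for relation, source, target in associations:
        # Each association records a typed relationship between two topics.
        assoc = ET.SubElement(root, "association", type=relation)
        ET.SubElement(assoc, "member", role="source", topicRef=source)
        ET.SubElement(assoc, "member", role="target", topicRef=target)
    return root


# Illustrative topics and one relationship between them.
topics = {"celadon": "Celadon pottery", "goryeo": "Goryeo dynasty"}
associations = [("produced-during", "celadon", "goryeo")]

topic_map = build_topic_map(topics, associations)
xml_text = ET.tostring(topic_map, encoding="unicode")
```

In this sketch the topics play the role of tagged information, while the associations carry the relationships, that is, the knowledge layer, in a form that can be exchanged alongside the collection metadata.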

The challenge in managing knowledge is compounded by multi-lingual access. The concepts used in one language may not map directly into concepts used in another language. This means that even though the same domain knowledge is being described related to the same collection, a multi-lingual digital library will have to maintain multiple ontologies to describe the relationships used to organize the collection. Relevant research questions include whether it is possible to build a joint ontology, whether the mapping between the ontologies can be automated, and whether cross-language information retrieval can facilitate the ontology mapping.
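The mapping problem can be illustrated with a small sketch that classifies concepts by how they map into a second language. The candidate table below is a hand-made assumption for illustration (romanized Korean terms); in practice such candidates might come from a bilingual thesaurus or from cross-language retrieval, and the `classify_mappings` helper is hypothetical.

```python
# Illustrative candidate mappings from English concepts to romanized Korean
# terms. "rice" has several counterparts: ssal (uncooked rice), bap (cooked
# rice), and byeo (the rice plant).
CANDIDATES = {
    "rice": ["ssal", "bap", "byeo"],
    "library": ["doseogwan"],
}


def classify_mappings(source_concepts, candidates):
    """Split concepts into one-to-one, one-to-many, and unmapped groups."""
    report = {"one_to_one": {}, "one_to_many": {}, "unmapped": []}
    for concept in source_concepts:
        targets = candidates.get(concept, [])
        if len(targets) == 1:
            report["one_to_one"][concept] = targets[0]
        elif targets:
            # Several counterparts: the ontologies divide the domain differently.
            report["one_to_many"][concept] = list(targets)
        else:
            report["unmapped"].append(concept)
    return report


report = classify_mappings(["rice", "library", "thesis"], CANDIDATES)
```

Only the one-to-one cases could be merged into a joint ontology automatically. The one-to-many and unmapped cases are where separate ontologies, or human judgment, would still be needed.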

3.4 Transforming Education

Among the areas with greatest need for scalable support in terms of content and collection is education. Learning is a basic human need, so we may understand our world and culture, and contribute to its improvement. Education must be available to everyone, regardless of language, culture, age, and prior experience. Advancing education proceeds only if we nurture communities of learners and teachers, facilitate their engagement in sharing activities, and have suitable educational resources as well as environments that facilitate meaningful learning. Digital libraries thus must provide a broad range of services related to learning. They also must support collecting, organizing, and making available suitable educational resources. Further, because of digital libraries we may improve those resources both in terms of quantity and quality.

Learners may utilize not only secondary materials like textbooks but also primary resources that heretofore were only available to scholars. Thus the entire set of resources that support research and scholarship, in addition to a broad diversity of novel resources including interactive multimedia developed by interested educators (or produced by peers as part of developing their own portfolios of constructive learning materials), can become available through digital libraries. With such a broad range of materials, automatic classification is essential to support discovery and retrieval, as well as routing to instructors eager to find the best materials for a particular situation. We might say that every content item may help in education, presenting a significant challenge, especially when management of intellectual property rights is involved in conjunction with support of authentication and authorization. But this leads to a crucial additional challenge, which is to help learners gain access (directly, during self or distance learning activities, or indirectly, during teacher-directed activities) to the most appropriate items for their personal learning needs and objectives. When diverse languages and cultures are involved, and mappings must be made across those barriers so that resources may be properly understood, we may have what amounts to the grandest challenge for scalable digital libraries! An important instance of that is for learners in the US to better understand Korea, and for learners in Korea to better understand the US—real understanding, not just a section in a course on world history.

Transformation in educational practice is essential if the potential benefits of digital libraries are to be realized. Support for sharing among innovators is essential, including handling of submission, testing, review, editing, annotation, and composition of new materials from a collection of selected resources. When production of the best materials is aided, and when they can be identified easily by busy educators or learners involved in an assignment or engaged in discovery, a radical improvement in meaningful learning may result. Such positive impact, however, requires an architecture supporting powerful, scalable, distributed, interoperable systems with effective operational management, and effective services for users.

4. Architecture and Aspects of a Global Approach

The need for global collections and applications leads to particular challenges for digital library research and development. Distributed collections in today's Internet environment lead to distributed systems, which must interoperate in order to be utilized in an integrated fashion. As can be seen in the next subsections, there is need for: architectures for sustained access in multiple regions, mechanisms to manage distributed repositories, and user-oriented systems that span multiple countries.

4.1 Architectural Levels

The architecture for a global digital library is simplified by considering the following infrastructure levels:

Application/Configuration Layer: This level provides tools to help select digital objects for assembling a new digital library, as well as tools for defining context as concept spaces. Concept spaces support description of users and collections, and facilitate matching of levels of expertise and content.

User Services: Multiple services are provided by a global digital library, including presentation, translation, mining, and query support. Presentation services can dynamically define the user context and modify the presentation context to be compatible, controlling user specified presentation versus collection specified presentation (such as choice of character set). Translation services can help develop correspondences between thesaurus-like concept spaces. Mining provides tools to seek out the existence of explicit relationships, thus aiding query support and content based information retrieval.

Domain Knowledge Management: Ontologies organize collection concepts, and require explicit tools to facilitate their maintenance. Tools can integrate existing ontology representations or can extend a specific ontology to include new concepts that describe common sets of information about the collection. Tools also are needed to support the creation of metadata for concepts contained within the ontology, but not expressed within the collection metadata.

Collection Management: The ability to build an infrastructure independent description of a metadata catalog is needed to support migration of collections between different types of database systems. Effectively, this consists of tools to support creation of schema, modification of schema, and publishing of schema for use by other applications. The collection then can be distributed across multiple information repositories, with local control over the digital objects, and access control lists used to protect intellectual property. A distributed collection can be implemented across multiple administrative domains, including internationally distributed sites.

Data Handling System: When digital objects are distributed across multiple storage systems, data handling environments are needed to manage access. This is essential for integrating collections that are distributed across file systems, archives, and databases provided by multiple vendors. Distributed digital object management provides persistent identifiers, replicas of data sets, containers for aggregating digital objects before storage in archives, and archival storage interfaces.
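The role of persistent identifiers and replicas can be illustrated with a small replica catalog that maps one identifier to several physical copies and resolves it to whichever copy is currently reachable. This is a sketch only: the class, the identifier format, and the location strings (with the storage system type written as a URL-like prefix) are hypothetical and do not reproduce the interface of any actual data handling system.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalog:
    """Maps a persistent identifier to the locations of its replicas."""

    replicas: dict = field(default_factory=dict)

    def register(self, pid, location):
        """Record one more physical copy of the object named by `pid`."""
        locations = self.replicas.setdefault(pid, [])
        if location not in locations:
            locations.append(location)

    def resolve(self, pid, available_systems):
        """Return the first registered replica on a reachable storage system."""
        for location in self.replicas.get(pid, []):
            system = location.split("://", 1)[0]
            if system in available_systems:
                return location
        raise LookupError(f"no reachable replica for {pid}")


catalog = ReplicaCatalog()
catalog.register("pid:demo/0001", "archive://site-a/tape/obj1")
catalog.register("pid:demo/0001", "fs://site-b/data/obj1")

# If the archive is offline, the same identifier resolves to the file system copy.
location = catalog.resolve("pid:demo/0001", available_systems={"fs"})
```

The point of the indirection is that users and services hold only the persistent identifier, so copies can be added, moved, or retired without breaking references.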

Storage System: The fundamental level of a global digital archive is the set of storage systems used to hold the data objects. The storage systems will be distributed internationally, with each site maintaining local control. A global digital library only becomes possible when the storage systems are integrated.

4.2 Distributed Repositories

Fundamental to modern scholarship is the notion of distributed collections. Each nation, state/province, town, university, department, center/group, or individual may run a server, with collections of data, documents, and other resources. Building upon this base, digital libraries add unique types of support.

First, digital libraries allow sharing not only of various digital objects but also of information describing those objects. Such descriptive information often is called metadata. In many cases metadata is adequate for discovery of useful resources, and generally is more freely shared than are the digital objects themselves. Through the Open Archives Initiative (OAI) Protocol for Metadata Harvesting, metadata can be shared from personal collections (e.g., built using tools that run on laptops, such as Kepler), to group, department, or focused collections (e.g., using the Eprints system, see http://www.eprints.org), on up to global repositories (e.g., those registered with OAI, see http://www.openarchives.org). The OAI concept is powerful. Scholars can easily share metadata without also having to provide services (e.g., web sites with fancy interfaces, searching, browsing). Others who wish can use specialized tools to provide tailored services, after harvesting just the information they really need.
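To make the harvesting idea concrete, the sketch below builds a ListRecords request of the kind defined by version 2.0 of the OAI protocol and extracts identifiers and titles from a response carrying unqualified Dublin Core (the oai_dc format). The base URL, the sample response, and the function names are assumptions made for illustration; a real harvester would also fetch the URL over HTTP and follow resumption tokens, which is omitted here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 responses that carry Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the request URL for harvesting records from a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        title = record.findtext(".//dc:title", namespaces=OAI_NS)
        pairs.append((identifier, title))
    return pairs


# A hand-written response fragment standing in for a real repository's reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai")
harvested = extract_titles(SAMPLE_RESPONSE)
```

Note that the data provider in this exchange only answers a simple request with metadata; all searching and browsing is left to whoever harvests it, which is the division of labor described above.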

Second, distributed collections allow virtual documents to be constructed. A collage can be assembled from a group of images of art works. A homework assignment can be assembled by an instructor from a small set of problems, each made available as an educational resource. A report can be prepared by a student, drawing upon diverse materials, to argue a particular viewpoint, supported by data, images, and video.

Further, virtual collections can be prepared. A portal might provide access to a virtual collection that is built up in turn from smaller subsidiary collections. Smaller virtual collections can be assembled in a particular specialty area, or larger virtual collections may be built as national resources (e.g., as part of NSDL).
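A virtual collection of this kind can be modeled as a collection whose members are either items or other collections, so that a portal simply enumerates everything reachable beneath it. The following sketch assumes hypothetical class names and identifiers, and it assumes that collections do not contain themselves (there is no cycle detection).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    identifier: str
    title: str


@dataclass
class Collection:
    """A named collection whose members are items or subsidiary collections."""

    name: str
    members: list = field(default_factory=list)

    def items(self):
        """Yield every item reachable from this collection, once each."""
        seen = set()
        for member in self.members:
            children = member.items() if isinstance(member, Collection) else [member]
            for item in children:
                if item.identifier not in seen:
                    seen.add(item.identifier)
                    yield item


physics = Collection("physics", [Item("a", "Mechanics problems")])
mathematics = Collection("mathematics", [Item("b", "Calculus notes"), Item("a", "Mechanics problems")])
portal = Collection("portal", [physics, mathematics])

# An item shared by two subsidiary collections appears only once in the portal.
portal_ids = [item.identifier for item in portal.items()]
```

Nothing is copied when the portal is assembled; each subsidiary collection remains under the control of whoever maintains it.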

4.3 Integration and Interoperability

One way to assemble a global digital library is to integrate multiple existing libraries. As with PRDLA, this can be achieved through varied mechanisms. It is possible to integrate collections at the following levels:

Data level, through replication of resources. Each site continues to provide its original services, but on a much broader set of primary source materials.

Access level, through implementation of a data grid. As before, each site continues to provide its original services, but it is now able to access material that is managed and archived at another site.

Service level, through implementation of common web services. This makes it possible to apply a service developed at another site to digital holdings on your site.

For these integration mechanisms to work across cultural and language domains, interoperability mechanisms are needed for both semantic and procedural knowledge. One can think of semantic interoperability as providing mechanisms to describe relationships between terms to define equivalent meaning. Procedural knowledge is the expression of the manipulation context that one expects to associate with a collection. This can be as simple as the organization of material for presentation for societies that read right-to-left, or bottom-to-top, or back-to-front. Procedural knowledge defines the manipulation context for the presentation of the material within a digital library.
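As one illustration of semantic interoperability, terms from different vocabularies can be tied to a shared concept identifier, so that a term in one language is translated by way of the concept it names. The class, the concept identifiers, and the sample English and Korean terms below are assumptions chosen for the example, not an existing thesaurus.

```python
class Crosswalk:
    """Relates terms in different vocabularies through shared concepts."""

    def __init__(self):
        self._concept_of = {}  # (vocabulary, normalized term) -> concept id
        self._term_of = {}     # (concept id, vocabulary) -> preferred term

    def add(self, concept, vocabulary, term):
        """Declare that `term` in `vocabulary` expresses `concept`."""
        self._concept_of[(vocabulary, term.casefold())] = concept
        self._term_of[(concept, vocabulary)] = term

    def translate(self, term, source, target):
        """Return the equivalent term in `target`, or None if unknown."""
        concept = self._concept_of.get((source, term.casefold()))
        if concept is None:
            return None
        return self._term_of.get((concept, target))


crosswalk = Crosswalk()
crosswalk.add("concept:library", "en", "library")
crosswalk.add("concept:library", "ko", "도서관")

korean_term = crosswalk.translate("Library", "en", "ko")
english_term = crosswalk.translate("도서관", "ko", "en")
```

Routing every translation through a concept means that adding a third vocabulary requires only one new set of term-to-concept assertions rather than a separate mapping to each existing vocabulary.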

One interesting challenge is that procedural knowledge typically is implicit in the style of display used by the portal through which the digital library is accessed, and that style is assumed to conform to a cultural standard. It is interesting to note that the emerging support for handicapped persons is challenging the idea of cultural standards for the presentation of data and information. A fruitful area of research will be to understand how support for handicapped access addresses the same procedural knowledge issues that are encountered in multi-cultural access.

4.4 User Access

User access to digital libraries brings an additional layer of complexity to the definition of procedural knowledge. The challenge is to preserve not just cultural context, but also the individual user perspective when exploring information. Users can define their preferred access context as a combination of their historical access patterns, their current interests, and their personal level of knowledge. In a multi-cultural setting, each access to a digital library should result in the presentation of a context that meets not only cultural expectations, but also personal preference expectations.

The area of digital library infrastructure where legal, policy, and technical issues interact most strongly is the management of user access. The specification of what an individual is allowed to access, the rights they have to reuse material, and the rights they have to remain anonymous represent possibly conflicting requirements. The resolution of these conflicts requires policy decisions on the use of the digital library infrastructure. In multi-cultural settings this can become a "catch-22" situation. To correctly identify the rights of a person to anonymous access to material, the digital library may first have to identify the culture or organization through which the person is obtaining access. By representing individuals as members of a combination of cultural/organizational groups, it may be possible to decide which policies to invoke without having to explicitly identify an individual. This implies the ability to authenticate membership within an organization or cultural group. This type of identification is being pursued as part of the Shibboleth authentication environment, and is being promoted for use within digital libraries.
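The idea of deciding on policies from group membership alone can be sketched as follows: a request carries only a set of asserted memberships, and each policy names the memberships it requires and the actions it grants. The names and the sample groups are hypothetical, the sketch assumes the memberships have already been authenticated elsewhere, and it does not represent the Shibboleth interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    required_groups: frozenset  # memberships a requester must hold
    permissions: frozenset      # actions granted when they are all held


def permitted_actions(asserted_groups, policies):
    """Return the actions allowed for a requester known only by group membership."""
    groups = frozenset(asserted_groups)
    granted = set()
    for policy in policies:
        if policy.required_groups <= groups:
            granted |= policy.permissions
    return granted


POLICIES = [
    Policy(frozenset({"member:university-x"}), frozenset({"read"})),
    Policy(frozenset({"member:university-x", "role:faculty"}), frozenset({"reuse"})),
]

# No personal identifier is consulted, only the asserted memberships.
student_rights = permitted_actions({"member:university-x"}, POLICIES)
faculty_rights = permitted_actions({"member:university-x", "role:faculty"}, POLICIES)
```

In this arrangement the library never learns who the requester is, which is how anonymity and access control can coexist.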

4.5 Operational Management

Digital libraries that span cultural systems have substantial operational management concerns. Many of these issues can be cast in terms of policy management decisions, that is, the criteria under which actions will be taken. Operational management also includes management of user load, by redirecting accesses to sites where resources are available to meet demand. This in turn implies the replication of data resources between sites so that access load can be shifted. The mechanisms to manage replicas across sites are provided by data grid technology. Thus operational management for global digital libraries will be linked to the emerging global grid infrastructure.
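Load shifting across replicated sites can be pictured as choosing, for each access, the replica site with the most spare capacity. The sketch below uses utilization (current load divided by capacity) as the selection criterion; that criterion, the function name, and the site names are assumptions for illustration, and an operational data grid would apply its own policies.

```python
def choose_site(replica_sites, load, capacity):
    """Pick the least utilized site that holds a replica and has spare capacity.

    `load` and `capacity` map site names to the current and maximum number of
    concurrent accesses; sites missing from `capacity` are treated as unusable.
    """
    candidates = [
        site for site in replica_sites
        if load.get(site, 0) < capacity.get(site, 0)
    ]
    if not candidates:
        raise RuntimeError("all replica sites are at capacity")
    return min(candidates, key=lambda site: load.get(site, 0) / capacity[site])


load = {"site-a": 90, "site-b": 20}
capacity = {"site-a": 100, "site-b": 50}

# site-a is at 90 percent utilization and site-b at 40 percent.
chosen = choose_site(["site-a", "site-b"], load, capacity)
```

The choice is only possible because the object has been replicated to both sites, which is why load management depends on replica management.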

An international effort has been underway for two years to develop consensus on the services and protocols that will be provided by global grids. The grid community meets at Global Grid Forums that are held three times per year, traditionally in the US or Europe. An effort has been made to plan for grid forum meetings in Asia, to promote a truly global grid environment.

The creation of global digital library infrastructure should carefully track the emerging grid technologies. The typical services include grid authentication mechanisms, support for remote job execution, support for characterizing grid resources, and data grids for building distributed data collections. These services will form a core operational framework for global digital libraries.

A second community providing relevant technology comprises the standards efforts led by the World Wide Web Consortium, ISO, and industrial consortia such as the Object Management Group. These communities are developing technologies to support semantic webs, network protocols, and model-driven architectures for infrastructure-independent representations of services. The hope is that the technologies developed by the digital library community, the grid community, and the standards groups will be able to interoperate, and create an infrastructure that minimizes the operational management requirements.

5. Conclusions

From the papers, presentations, and working group reports of the US-Korea workshop, we may understand many of the application domains, challenges, and benefits that relate to global digital libraries. Though there are many challenges, especially regarding global collections and user communities, there are also exciting opportunities, such as to transform education. If we organize our efforts around architectural levels, and deal with interoperability and distributed repositories, we may effectively support worldwide user access, and still have manageable and reliable operations.

Bibliography

1. Baru, C., R. Moore, A. Rajasekar, M. Wan, "The SDSC Storage Resource Broker," Proc. CASCON'98 Conference, Nov. 30-Dec. 3, 1998, Toronto, Canada.

2. Baru, C., V. Chu, A. Gupta, B. Ludascher, R. Marciano, Y. Papakonstantinou, and P. Velikhov, "XML-Based Information Mediation for Digital Libraries," ACM Conf. on Digital Libraries (exhibition program), 1999.

3. Boisvert, R., P. Tang, "The Architecture of Scientific Software," pp. 273-284, Data Management Systems for Scientific Applications, Kluwer Academic Publishers, 2001.

4. Brunner, R., S. Djorgovski, A. Szalay, "Virtual Observatories of the Future," pp. 257-264, Astronomical Society of the Pacific Conference Series, Vol. 225, June, 2000.

5. Chen, C., "Global Digital Library Development," pp. 197-204, Knowledge-based Data Management for Digital Libraries, Tsinghua University Press, 2001.

6. ClimDB - The Climate Database Project, <http://www.fsl.orst.edu/climdb>.

7. DL - IBM DiscoveryLink, <http://www.ibm.com/solutions/lifesciences/discoverylink.html>.

8. EDG - European Data Grid, <http://eu-datagrid.web.cern.ch/eu-datagrid/>.

9. EML - Ecological Metadata Language, <http://knb.ecoinformatics.org/software/eml/>.

10. Globus - The Globus Toolkit, <http://www.globus.org/toolkit/>.

11. Grid Forum Remote Data Access Working Group, <http://www.sdsc.edu/GridForum/RemoteData>.

12. GriPhyN - Grid Physics Network, <http://www.griphyn.org/index.php>.

13. Gupta, A., Ludäscher, B., Martone, M. E., "Knowledge-Based Integration of Neuroscience Data Sources," 12th Intl. Conference on Scientific and Statistical Database Management (SSDBM), Berlin, Germany, IEEE Computer Society, July, 2000.

14. HDF - Hierarchical Data Format, NCSA (National Center for Supercomputing Applications), <http://hdf.ncsa.uiuc.edu>.

15. I2T - The Information Integration Testbed project, <http://www.sdsc.edu/DAKS/I2T>.

16. ICPSR - Inter-University Consortium for Political and Social Research, <http://www.icpsr.umich.edu>.

17. Inmon, B., Building the Data Warehouse, 2nd Edition, John Wiley, 1992.

18. IRIS - Incorporated Research Institutions for Seismology, <http://www.iris.edu>.

19. Kifer, M., G. Lausen, and J. Wu. Logical Foundations of Object-Oriented and Frame-Based.

20. KQML - Knowledge Query Manipulation Language, <http://www.cs.umbc.edu/kqml/papers>.

21. Kifer, M., G. Lausen, J. Wu, "Logical Foundations of Object-Oriented and Frame-Based Languages," Journal of the ACM, 42(4):741-843, July 1995.

22. Ludäscher, B., A. Gupta, M. E. Martone, "Model-Based Information Integration in a Neuroscience Mediator System," demonstration track, 26th Intl. Conference on Very Large Databases (VLDB), Cairo, Egypt, September, 2000.

23. Ludäscher, B., R. Marciano, R. Moore, "Towards Self-Validating Knowledge-Based Archives," pp. 9-16, RIDE-DM 2001.

24. METS - "Metadata Encoding and Transmission Standard", <http://www.loc.gov/standards/mets>.

25. Moore, R., "Knowledge-based Grids," Proceedings of the 18th IEEE Symposium on Mass Storage Systems and Ninth Goddard Conference on Mass Storage Systems and Technologies, San Diego, April 2001.

26. Moore, R., "Preservation of Data, Information, and Knowledge," Proceedings of the World Library Summit, Singapore, April 2002.

27. Moore, R., A. Merzky, "Persistent Archive Basic Components," Persistent Archive Research Group, Global Grid Forum, July 27, 2002.

28. Moore, R., C. Baru, A. Rajasekar, B. Ludascher, R. Marciano, M. Wan, W. Schroeder, and A. Gupta, "Collection-Based Persistent Digital Archives - Parts 1 & 2," D-Lib Magazine, March/April 2000, <http://www.dlib.org/dlib/march00/moore/03moore-pt1.html> and
<http://www.dlib.org/dlib/april00/moore/04moore-pt2.html>.

29. Moore, R., C. Baru, A. Rajasekar, R. Marciano, M. Wan: "Data Intensive Computing," in The Grid: Blueprint for a New Computing Infrastructure, eds. I. Foster and C. Kesselman. Morgan Kaufmann, San Francisco, 1999.

30. Moore, R., C. Baru, P. Bourne, M. Ellisman, S. Karin, A. Rajasekar, S. Young: "Information Based Computing." Proceedings of the Workshop on Research Directions for the Next Generation Internet, May, 1997.

31. Reference Model for an Open Archival Information System (OAIS). Blue Book. Issue 1. January 2002. Available at <http://www.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf>. This Recommendation has been adopted as ISO 14721:2002.

32. Paton, N.W., Stevens, R., Baker, P.G., Goble, C.A., Bechhofer, S., and Brass, A., "Query Processing in the TAMBIS Bioinformatics Source Integration System," Proc. 11th Int. Conf. on Scientific and Statistical Databases (SSDBM), IEEE Press, 138-147, 1999.

33. PDB - The Protein Data Bank, <http://www.rcsb.org/pdb>.

34. Rajasekar, A., M. Wan, R. Moore, "mySRB and SRB, Components of a Data Grid", 11th High Performance Distributed Computing conference, Edinburgh, Scotland, July 2002.

Please also see the many papers, presentations, and reports of deliberations that appear on the workshop web site at <http://fox.cs.vt.edu/UKJWDL>. This article is based on those and on the subsequent analyses and discussions of the co-authors.

Acknowledgements

This workshop was funded in part by the US National Science Foundation through grant IIS-0090153. Support also was provided by the Ministry of Information and Communication of Korea, the National Computing Agency, and the Institute of Information and Technology Assessment of Korea.

Thanks go to the San Diego Supercomputer Center and its staff for hosting the US-Korea workshop. Professor Soon J. Hyun helped set up the workshop and has continued his work to increase the engagement of Korean government agencies in collaborative efforts. Finally, particular thanks go to all the participants for their involvement in the meeting and in following up on the deliberations.

DOI: 10.1045/october2002-fox
