The Internet and World Wide Web (Web) afford libraries new possibilities to disseminate information. For instance, many libraries are already offering on-line public access catalogs, public ftp sites, or repositories of Internet resources. In the near future, digital documents will belong to the document collection of a library as well. Since these trends are changing the definition of what a conventional library is, the terms "virtual library" and "digital library" have come into use.
In this paper, we only use the term "digital library". Following Gladney, Fox et al. 1994, we view a digital library as extending the holdings of a conventional library to digital documents and Internet resources. An Internet resource is a link to other digital documents that are stored elsewhere on the Internet. Thus, only the link is under the control of the library, not the document to which the link points. Additionally, a digital library provides digital catalogs containing meta-data about the holdings (i.e., digital documents, Internet resources, and physical documents like books, journals, etc.). Finally, a digital library must provide, as far as possible, all the necessary services of a conventional library and must also exploit the advantages of the technology used. According to Nuernberg, Furuta et al. 1995, a "digital library system" consists of several components. Setting up and using a digital library requires a client and server computing system as well as tools supporting interaction among people and between people and client or server software.
Normally, digital libraries are based on Web server technology. This enables patrons to access the library using a Web client like Netscape Navigator or Microsoft's Internet Explorer. These days, many different Web servers are available (e.g., httpd from NCSA, or the different servers provided by Netscape). In several projects these servers have proven useful for building digital libraries. Beyond this, a second generation Internet information system exists, namely Hyper-G or HyperWave (we use the term "Hyper-G" throughout this paper). Since Hyper-G has already become widely adopted, our research group at Dortmund University decided to use Hyper-G to set up a digital library system called DogitaLS1. DogitaLS1 is an acronym for "The Dortmund Digital Library System of LS1" (LS1 is the name of our lab at the Computer Science Department of Dortmund University). LIBERATION is another European project in which Hyper-G is being used to set up a digital library. The main idea of this project is to distribute already existing electronic information to libraries (e.g., via CD, local or wide area networks). Since this project has just begun, first results are not yet available.
In this paper, we report on our experiences with using Hyper-G as the underlying server technology for our digital library system. The remainder of the paper is structured as follows: Section 2 gives a brief overview of Hyper-G. Section 3 describes the structure of DogitaLS1 and shows how we exploit the concepts of Hyper-G for organizing the holdings of DogitaLS1. In section 4, we report on our experiences with using Hyper-G. This includes a list of requirements for future Internet information systems that are intended to be used for digital library systems. Finally, we give a brief conclusion in section 5.
Remark on the references used: References to documents available on the Internet are made explicit by links pointing to them. All links were verified to be valid in September 1996. Once the paper is published, the authors will no longer update the URLs used in it. Thus, some links may become invalid at some time in the future.
Hyper-G is a second generation Internet information system that is being developed under the leadership of Hermann Maurer and Frank Kappe at the Institute for Information Processing and Computer Supported New Media (IICM) at Graz University (Austria). Hyper-G comprises server and client software. The server software is available for several different platforms, including SUN Sparc and IBM AIX. The client software runs under Microsoft Windows (Amadeus) and under Unix (Harmony). Hyper-G is compatible with the WWW: widely used Web clients, like Netscape Navigator, can be used to browse a Hyper-G server, and, conversely, Hyper-G clients can be used to browse a WWW server like NCSA's httpd.
The key features of Hyper-G include
More information about Hyper-G can be found in the book "HyperWave - The Next Generation Web Solution", recently published by the development team of Hyper-G/HyperWave.
The aim of our research is to examine Hyper-G for defining organizational structures for digital libraries. To do this, we set up a digital research library for our lab providing a heterogeneous document collection, i.e., a collection containing:
In addition, we place special emphasis on services for different types of users (e.g., librarians and patrons) as well as on communication services. A more detailed description of the communication services is omitted here. For readers interested in this area, we recommend our technical report "A first Step Toward Communication in Virtual Libraries".
We installed six collections at the top level of DogitaLS1. The collections "Catalogs", "Digital Documents", and "Internet Resources" store meta-data about physical documents, digital documents including their meta-data, and Internet resources, respectively. The collections "Services" and "Workspaces" are needed for general services and communication services. Finally, on-line help is provided in the collection "On-line Help" (cf. figure 3.1). In the next sections, we describe how we took advantage of Hyper-G concepts for the integration of documents belonging to the different collections.
Remark: All screen shots are taken from Harmony.
The collection "Catalogs"
The collection "Catalogs" includes an alphabetic catalog in which meta-data about all books, journals, etc. of our research library are stored. One question was how to model the corresponding Hyper-G documents so that we could take advantage of the integrated search engine of Hyper-G. One possible solution would have been to use the attributes that can hold meta-information about a Hyper-G document. However, except for the keyword attribute, all the pre-defined attributes relate to the Hyper-G document itself and not to the book or journal that the Hyper-G document describes. For example, assume the user tochterm creates a Hyper-G document providing meta-information about a book written by Aho and Ullman. Then, the attribute Author of the Hyper-G document contains the name tochterm and, thus, cannot be used to store the names of the two authors Aho and Ullman. We therefore decided to put the important information (authors, title and publication year) about books, journals, etc. into the title of the corresponding Hyper-G document (cf. figure 3.2). Some other useful information, like umbrella words, is stored in the keyword attribute. (Note that the Hyper-G document itself contains much more meta-information about the cataloged book or journal.) As a result, a title search can be used to search for books by author's name, title or publication year. In addition, a keyword search can be used for searching by means of umbrella words.
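The title-packing scheme described above can be illustrated with a minimal Python sketch. It does not use any Hyper-G API; the class `CatalogEntry`, the two search functions, the exact layout of the packed title, and the sample titles, years and keywords are all assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Stand-in for one Hyper-G document describing a physical item."""
    authors: list[str]
    title: str
    year: int
    keywords: list[str] = field(default_factory=list)  # umbrella words

    def document_title(self) -> str:
        # Assumed layout of the packed title: "Authors: Title (Year)".
        return f"{', '.join(self.authors)}: {self.title} ({self.year})"


def title_search(entries, term):
    """Case-insensitive substring match on the packed document title."""
    needle = term.lower()
    return [e for e in entries if needle in e.document_title().lower()]


def keyword_search(entries, term):
    """Case-insensitive exact match against the keyword attribute."""
    needle = term.lower()
    return [e for e in entries if needle in (k.lower() for k in e.keywords)]


# Made-up sample data; only the author names come from the text above.
catalog = [
    CatalogEntry(["Aho", "Ullman"], "Sample Title A", 1977, ["compilers"]),
    CatalogEntry(["Knuth"], "Sample Title B", 1968, ["algorithms"]),
]

by_author = title_search(catalog, "ullman")
by_year = title_search(catalog, "1968")
by_umbrella_word = keyword_search(catalog, "Compilers")
```

Because authors, title and year all live in one string, a single title search covers three kinds of query, at the cost of not being able to restrict a query to one field.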
The collection "Digital Documents"
Digital documents (e.g., research reports and other publications written by members of our lab) are stored in this collection. To ease browsing through the document collection, we set up a small systematic catalog covering the different areas in which members of our lab do research. We had three requirements to meet when we started thinking about modeling digital documents in Hyper-G:
To capture this intention, the use of digital objects as introduced by Kahn and Wilensky seemed appropriate to us. In their notion, a digital object can be regarded as a content-independent package. The principal components are a unique identifier for the digital object (its handle), and data. The data in the digital object package is, itself, a container for streams of bits that may take multiple forms (e.g., Postscript or HTML).
In DogitaLS1, we modeled digital objects by means of collections. Note that we now use collections in two different ways: (1) to define the overall structure of DogitaLS1, that is, as containers for documents; and (2) to model digital objects. This might be confusing from a conceptual perspective. On the other hand, there was no other way in Hyper-G to represent digital objects. Figure 3.3 sketches three digital objects belonging to "Information Systems" in the classification scheme for Digital Documents.
When a collection is created, Hyper-G assigns a server-wide unique handle to it. Each digital document is embedded in a collection that also contains meta-data about the document and different formats of the document (normally HTML and Postscript). The meta-data is stored as a collection head. In Hyper-G, a collection head is a special document which is always displayed when a user (e.g., librarian or patron) accesses a collection. In this way, whenever a digital object is accessed, the user gets information about the data stored in this digital object. The following figure shows a digital object represented as a collection in Hyper-G. The document "Abstract: Kommunikation in virtuellen Bibliotheken" contains an abstract and meta-data about the document. The actual document is available in two different formats, HTML and Postscript.
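As a rough illustration of this modeling decision, the following Python sketch represents a digital object as a collection with a handle, a collection head and several formats. The classes `Document` and `Collection`, the integer handles and the MIME type strings are assumptions of the sketch and are not taken from Hyper-G.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count

# Assumption: handles are modeled as increasing integers. The text only
# says that the handle is unique server-wide.
_next_handle = count(1)


@dataclass
class Document:
    title: str
    mime_type: str
    data: bytes = b""


@dataclass
class Collection:
    """A digital object modeled as a collection."""
    title: str
    head: Document | None = None  # abstract and meta-data
    members: list[Document] = field(default_factory=list)
    handle: int = field(default_factory=lambda: next(_next_handle))

    def open(self) -> Document | None:
        # Accessing the collection always shows the collection head first.
        return self.head

    def formats(self) -> list[str]:
        return sorted(m.mime_type for m in self.members)


report = Collection(
    title="Kommunikation in virtuellen Bibliotheken",
    head=Document("Abstract: Kommunikation in virtuellen Bibliotheken", "text/html"),
    members=[
        Document("Kommunikation in virtuellen Bibliotheken", "text/html"),
        Document("Kommunikation in virtuellen Bibliotheken", "application/postscript"),
    ],
)
other = Collection(title="Another digital object")
```

Here `report.open()` returns the abstract, mirroring the behavior that a user who accesses a digital object first sees its meta-data, while `report.formats()` lists the available renditions of the actual document.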
The collection "Internet Resources"
In this collection, links to interesting resources available on the Internet are collected. Each link is represented as a Hyper-G document. Consequently, meta-information about each link can be stored in the keyword attribute of the Hyper-G document, which makes a keyword search for links possible in DogitaLS1.
Links to resources on other Hyper-G servers are displayed in the same manner as if the resource were stored in DogitaLS1. This is a big advantage for setting up distributed digital libraries based on Hyper-G, because users of a distributed digital library need not know where documents are stored, whether locally or on remote servers. Among other things, the following figure displays a link to the "Journal of Universal Computer Science", which is stored on a server in Graz (Austria). Nevertheless, the user interface displays this link in the same way as a collection stored on the server of DogitaLS1.
The collections "Services" and "Workspaces"
The current version of DogitaLS1 provides several services (written in Perl) adapted to the needs of librarians and patrons. All these services are stored in the collection "Services". The Common Gateway Interface (CGI) mechanism is used to provide interaction among users and between users and DogitaLS1.
In the collection "Workspaces", each librarian and patron has a workspace in which private documents, private links, etc. can be stored. Besides private documents, logical copies of all the services needed by a given librarian or patron are stored there. To do this, we exploit the copy document function of Hyper-G. The advantage of this approach is that all the services remain available at a central place in DogitaLS1. This is a big relief when librarians or patrons are added to or removed from the system. To add a new user, we only have to create a new workspace and add copies of the services needed by the new user. To remove a user, we can simply delete the user's collection without worrying about the documents and services stored therein.
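The workspace approach can be sketched in a few lines of Python. The sketch assumes that a "logical copy" behaves like a reference to the centrally stored service; the classes `Service` and `Library`, the service names and the script names are hypothetical and do not correspond to Hyper-G functions or to the actual DogitaLS1 services.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    script: str  # hypothetical name of the CGI script behind the service


class Library:
    def __init__(self):
        self.services = {}    # central "Services" collection
        self.workspaces = {}  # one workspace per librarian or patron

    def add_service(self, service):
        self.services[service.name] = service

    def add_user(self, user, service_names):
        # The workspace holds private documents plus logical copies,
        # i.e. references to the centrally stored services.
        self.workspaces[user] = {
            "private": [],
            "services": {n: self.services[n] for n in service_names},
        }

    def remove_user(self, user):
        # Deleting the workspace drops only the references; the central
        # services are left untouched.
        del self.workspaces[user]


lib = Library()
lib.add_service(Service("search", "search.pl"))
lib.add_service(Service("insert", "insert.pl"))
lib.add_user("librarian1", ["search", "insert"])
lib.add_user("patron1", ["search"])

shared = lib.workspaces["patron1"]["services"]["search"] is lib.services["search"]
lib.remove_user("patron1")
```

After `remove_user("patron1")`, the service `search` is still present in the central collection and in the workspace of `librarian1`, which is the property that makes adding and removing users cheap.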
This section first describes our experiences with Hyper-G. It then lists items that are not covered by Hyper-G but would have been useful for our digital library system.
4.1 Evaluation of concepts currently available in Hyper-G
In general, both the server and the client software run without severe problems. We are also grateful to the development team in Graz, who usually helped us immediately when we had technical problems. In addition, a newsgroup (comp.infosystems.hyperg) can be used for asking other Hyper-G users questions. Thus, a lot of technical support exists for Hyper-G. Beyond these general comments, we evaluate the concepts provided by Hyper-G with special regard to digital libraries.
Collections
Collections in Hyper-G are very useful for defining the structure of the holdings of a digital library. Since collections can be nested within collections, it is also possible to set up more complex structures for organizing the document collection.
Link consistency
Normally, more than one attempt is required to define the organizational structure of a digital library. Unfortunately, new requirements usually arise after documents have already been stored in the library and after links between documents have already been established. Since re-organizing the document collection in a digital library means moving documents from one collection to another, links are most probably affected. In contrast to "traditional" servers, where a link denotes a path to a file on the server machine, links in Hyper-G are objects separate from documents. The advantage of this is that, in Hyper-G, documents can be moved between collections without worrying about dangling links. This was a useful feature for prototyping several different collection hierarchies in DogitaLS1.
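The difference between the two kinds of link can be made concrete with a small Python sketch. It is a simplified model rather than Hyper-G's actual implementation: the classes `PathServer` and `ObjectServer`, the identifiers and the paths are all assumptions chosen for illustration.

```python
class PathServer:
    """'Traditional' server: a link stores the path of its target."""

    def __init__(self):
        self.paths = set()
        self.links = []  # (source path, target path)

    def insert(self, path):
        self.paths.add(path)

    def add_link(self, source, target):
        self.links.append((source, target))

    def move(self, old_path, new_path):
        self.paths.remove(old_path)
        self.paths.add(new_path)

    def dangling_links(self):
        return [(s, t) for s, t in self.links if t not in self.paths]


class ObjectServer:
    """Links are separate objects that refer to document identifiers."""

    def __init__(self):
        self.collection_of = {}  # document id -> current collection
        self.links = []          # (source id, target id)

    def insert(self, doc_id, collection):
        self.collection_of[doc_id] = collection

    def add_link(self, source_id, target_id):
        self.links.append((source_id, target_id))

    def move(self, doc_id, new_collection):
        # Only the placement changes; the identifier stays the same.
        self.collection_of[doc_id] = new_collection

    def dangling_links(self):
        return [(s, t) for s, t in self.links if t not in self.collection_of]


web = PathServer()
web.insert("/index.html")
web.insert("/drafts/report.html")
web.add_link("/index.html", "/drafts/report.html")
web.move("/drafts/report.html", "/docs/report.html")

hg = ObjectServer()
hg.insert("index", "Catalogs")
hg.insert("report", "Drafts")
hg.add_link("index", "report")
hg.move("report", "Digital Documents")
```

In the path-based model the move leaves one dangling link behind, whereas in the identifier-based model the link still resolves because it never depended on where the document was stored.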
Access rights
Different parts of a collection can be made accessible to different types of users by means of access rights. An advantage of the concept of user accounts in Hyper-G is that they do not depend on user accounts on the server machine or on any other machine. Therefore, patrons need not have an account on another machine. A further advantage is that even links can be protected by access rights. This means that link filtering is possible.
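Link filtering by access rights can be sketched as follows. The sketch assumes a simple group-based model in which a link without any listed reader group is visible to everybody; the class `Link`, the function `visible_links` and the group names are hypothetical and do not reflect the syntax of Hyper-G access rights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    # Assumption: an empty set means the link is visible to everybody.
    readers: frozenset = frozenset()


def visible_links(links, user_groups):
    """Return only the links that a user in the given groups may see."""
    groups = frozenset(user_groups)
    return [l for l in links if not l.readers or l.readers & groups]


links = [
    Link("home", "catalog"),
    Link("home", "acquisitions", frozenset({"librarian"})),
]

for_patron = visible_links(links, {"patron"})
for_librarian = visible_links(links, {"librarian"})
```

With this data, a patron sees only the link to the catalog, while a librarian additionally sees the link to the acquisitions document.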
External tools
Hyper-G provides many external tools (e.g., hginstext, hginscoll, and hggetdata for inserting text documents into the server, inserting collections into the server, and retrieving documents from the server, respectively). We were able to exploit these tools in different CGI programs (e.g., for inserting the already existing on-line data of our research library into DogitaLS1).
Aspects of distributed digital libraries
In the near future, the holdings of digital libraries will not only be based on one server but will be distributed among several servers. In three ways, Hyper-G already covers aspects for distributed digital libraries:
4.2 Requirements for future systems
This section lists features that we found missing in the current version of Hyper-G. In our view, these features would be helpful for future Internet information systems that are intended to be used for digital library systems.
On the basis of Hyper-G, we set up a digital library system. This system consists of the server and client software as well as different tools that we developed (we omitted the description of these tools in this paper). The document collection of DogitaLS1 mainly consists of documents that are needed in the day-to-day work of members of our lab. Even though this document collection is rather small, Hyper-G has proven stable for large numbers of documents as well. For example, Graz University runs a Hyper-G server to store meta-data about its documents. In October 1995, this server contained more than 300,000 documents.
As we see it, Hyper-G offers several features that are not presently available in other Internet information systems. In this paper, we showed how most of these features can be exploited to serve needs of digital libraries. Since we are convinced of the capabilities of Hyper-G, we will be using it in a joint project with the library of the Dortmund University. The aim of this project is to build an electronic archive for master-level and doctoral theses.
The first author is grateful to the Max Kade Foundation, New York, USA, for funding his post-doc at the Center for the Study of Digital Libraries at Texas A&M University.
hdl:cnri.dlib/october96-tochtermann