Perspectives on DL'99
Fourth ACM Conference on Digital Libraries
Contributed by
Roxanne Missingham
Director, Reader Services
National Library of Australia
Parkes Place, Canberra ACT 2600
rmissingham@nla.gov.au
The Association for Computing Machinery (ACM) is an organisation whose Digital Libraries conferences provide the opportunity for computer scientists, human-computer interface researchers and librarians to come together. Over 120 people attended the conference this year, coming from a wide range of countries including Lapland, New Zealand, Italy, the United Kingdom and Germany. I was the only Australian attending. ACM primarily serves computer scientists, with those librarians involved often concentrating more on information retrieval issues. The conference focused primarily on the technical, computing issues of building digital libraries and retrieving information from them, rather than on service issues or digital reference. The computing emphasis was evident in the unusual use of language (for example, one speaker talked of searching for items with "topicality relatedness" rather than "on the same subject").
Four strong themes emerged from the presentations and posters:
- Understanding the purpose and delivering digital libraries;
- User needs and evaluation;
- Innovative projects (particularly in music); and
- Sustainable funding models (including the National Science Foundation and DARPA).
The major commercial digital libraries, such as Westlaw, Lexis/Nexis, ISI, Elsevier, HighWire, SilverPlatter, Dialog and JSTOR, were not represented in the papers presented. This was unfortunate, as Westlaw in particular has recently developed some online legal teaching systems that would have been very relevant, and all would have provided case studies of very large digital libraries in production environments.
Purpose of digital libraries and their establishment
David Levy, of Xerox Palo Alto Research Center, gave the opening address to the conference, "Digital libraries and the problem of purpose". He reviewed the historical role of libraries over the last century in providing 1) public services (education and social roles for public libraries) and 2) research support to assist in scholarly development and/or communication (academic libraries). He suggested that digital libraries had been created simply "to put digital materials on line". There was a need to define the communities served by digital libraries and the value of the libraries. Two conflicting purposes were identified -- public good and commercial content -- for both of which sustainable funding models were essential. Access to the Internet is far from universal, with equity issues relating to race and social class. The parties involved in the circulation of information (authors, publishers, sellers, libraries, users and copyright holders) all face different issues in the networked digital environment. Interestingly, David suggested that the complex issues of quality and veracity (or accuracy) of digital information mean that access to a very large quantity of electronic resources does not, in itself, answer clients' questions well.
Taking a different approach, Peter Yianilos proposed a new type of computer memory -- archival intermemory -- to improve Internet performance using metadata of IP addresses of daemons.
In the session on IR/multimedia, papers described:
- a semantic indexing experiment based on MEDLINE to improve information retrieval;
- use of PHAROS architecture to categorise Usenet groups by Library of Congress Classification to enhance automated classification and retrieval;
- use of an oral automated interrogation system (yes, you speak into a system!) to retrieve information from a newspaper archive;
- audio and visual indexing of multimedia material using MDF -- Multimedia Description Format.
User needs/interface issues and evaluation
Catherine Marshall, who will be speaking at VALA in 2000 in Melbourne, described a study of six researchers who have been meeting regularly to discuss research papers. The study compared the use of printed material with the use of an electronic book surrogate, Xlibris. Reading was found to be opportunistic, a physical act, multi-purpose, self-interrupted and not at all straightforward. While the Xlibris interface was promising, it did not deliver much more than paper did. Xlibris does enable electronic annotations, and it would be more successful if it were a mobile device.
Wei Ding described a Baltimore Community Learning Project where visual and text indexes (key images and keywords) of videos were tested separately and combined for effectiveness of retrieval and information transfer. Audio descriptions were not tested. Users found that having both images and words increased the effectiveness of their information recall and retrieval.
Community access to information was studied by Ann Peterson Bishop from CNI/Prairienet. (CNI stands for Community Networking Initiative.) The low income group was found to parallel undergraduates in having an "impoverished" information world with limited information skills and knowledge; facing many insurmountable molehills (such as forgetting the password); problems of convenience, content and comfort; and issues of social support networks and technical literacy.
Creation of personal scientific information tracking systems was explored in a paper presented by Kurt Bollacker. He described CiteSeer, which tracks new scientific information. There were very interesting parallels to Google and IBM's Clever. The system enables end users to edit their own profiles, with the capacity to change the measures used to define relatedness, including Boolean matching, text frequency and co-citation measures.
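As a rough illustration of what a co-citation measure counts, the short Python sketch below tallies how often two papers are cited together by the same citing paper. It is a minimal sketch under stated assumptions, not a description of how CiteSeer computes relatedness: the function name, the data layout and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing):
    """Count, for each pair of papers, how many papers cite both.

    `citing` maps a citing paper's id to the set of paper ids it cites.
    (Hypothetical layout chosen for this illustration.)
    """
    counts = Counter()
    for cited in citing.values():
        # Every unordered pair cited by the same paper is co-cited once.
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data: three citing papers and the papers they cite.
citing = {
    "p1": {"A", "B", "C"},
    "p2": {"A", "B"},
    "p3": {"B", "C"},
}
counts = cocitation_counts(citing)
```

In this toy data, papers A and B are cited together twice, so under this measure they would be treated as more closely related than A and C, which are cited together only once.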
New Zealand's Digital Library formed the test bed for an interesting presentation by Steve Jones, with an experiment at automatic key phrasing (Phrasier) offering access to related documents and Kniles, a WWW-based system, generating key phrases from documents on the fly.
Donald Byrd described his project, which produced a "confetti" scroll bar with coloured dots representing occurrence of search words in documents. Interestingly, his evaluation showed that although users liked the confetti scroll bar, at this stage of the project the evaluation did not show improved effectiveness of retrieval.
Martin Weschler proposed a new element in assessing the effectiveness of information retrieval -- time. He proposed that with the development of the Web, issues of available user time, transmission time and usage time have become more important than traditional measures of relevance.
The topic of a presentation by Leah Larkey was a patent search and classification system for the US Patent and Trademark Office. The project involved a very large digital library of patents (1.5 million patents in 40 collections) with large indexes, a high level of classification and complex query formulation incorporating a phrase dictionary and compound dictionary.
User interfaces were explored in a paper by Soyeon Park. The HERA and HERMES systems were used to retrieve information from the Congressional Record, Federal Register, Financial Times and Wall Street Journal, and the results were then compared. Graduate students in Library and Information Sciences formed the test group, with the results indicating a preference for common interfaces (HERMES) over integrated interaction (HERA).
Innovative Projects
Music and image libraries formed a core of innovative projects covered by papers at the conference. Jon Dunn presented a paper describing the electronic reserve of music materials (4% of the audio collection) at Indiana University, Bloomington. Audio musical recordings are converted to WAV files, compressed to MPEG level 1, and made available through computers within the library. Copyright permissions have yet to be sought.
Music was also the topic of Massimo Melucci's presentation describing a project at Padua University that involved parsing and searching against a database of musical documents (MIDI and a web based system).
David Bainbridge and his co-authors won the prize for best paper at the conference. The paper described the New Zealand Digital Library of Popular Music. With a collection of over 100,000 MIDI files located in web searches, they have built a system incorporating optical music recognition for scores and melody-based querying, as well as text searches. In designing a web-based system for popular music, the project incorporates interfaces and content likely to appeal to a broad client group.
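To give a feel for what melody-based querying can involve, the Python sketch below reduces a tune to its up/down/repeat contour and looks for a query contour inside each tune in a collection. This is one simple, well-known approach and only a sketch under stated assumptions: the report does not say how the New Zealand system matches melodies, and the function names, tune names and pitch data here are hypothetical.

```python
def contour(pitches):
    """Reduce a sequence of MIDI pitch numbers to a contour string.

    Each step is 'U' (up), 'D' (down) or 'R' (repeat), so the result
    is the same whatever key the tune is played or sung in.
    """
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            steps.append("U")
        elif cur < prev:
            steps.append("D")
        else:
            steps.append("R")
    return "".join(steps)


def find_matches(query, collection):
    """Return names of tunes whose contour contains the query's contour."""
    q = contour(query)
    return [name for name, pitches in collection.items()
            if q in contour(pitches)]


# Hypothetical toy collection of tunes given as MIDI pitch numbers.
collection = {
    "tune_a": [60, 60, 67, 67, 69, 69, 67],
    "tune_b": [64, 62, 60, 62, 64, 64, 64],
}

# The query is the opening of tune_a, transposed up an octave.
matches = find_matches([72, 72, 79, 79], collection)
```

Because only the direction of each step is kept, a query hummed in a different key can still match, at the cost of many false matches in a large collection; a production system would need a finer-grained comparison.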
The Networked Digital Library of Theses and Dissertations was presented by Neill Kipp as an important emerging area of digital publication, an international project as yet primarily US-based. (The Australian Digital Theses Project was on display at Online and Ondisk 1999 and shows great promise.) Since January 1997, Virginia Tech has required electronic publication and submission of all theses and dissertations. Currently theses are categorised as worldwide access (47%), Virginia Tech access only (31%), patent-restricted access (20%) and mixed, with some parts suppressed from wide access (2%).
A paper presented by Tammara Combs focussed on her study of image storage and retrieval. Image browsers and retrieval systems compared included: ThumbsPlus, Zoomable Image Browser, Simple LandScape and PhotoGoRound. Comparisons were made of user satisfaction, recall and incorrect selections.
Automated image classification (which, as yet, has a limited test bed) was explored in a paper by Lim Joo-How. Joo-How was positive about the future for automatic keyword indexing of images.
The Astronomy Digital Image Library was described by Robert McGrath. In extending the WWW model, he described the need for a search model that he called a "conversation with the data". The model is needed because the complexity of information stored as datasets, rather than as documents made up of text and images, makes "search-and-download" an impractical model for scientific data. Metadata was built using Z39.50 with a flexible searching gateway, XML, Astronomy Mark-Up Language and a Java browser. Z39.50 enables discovery of text and non-text resources, with issues still to be resolved in the definition of astronomy metadata. The question of subject-specific metadata standards, as opposed to a single metadata standard, emerged as a significant issue in discussion.
Sustainable funding
The Computing Research Repository is an archive of professional papers initiated by the ACM. Joseph Halpern described its development and implementation as a centralised database, stored on the Los Alamos National Laboratory system, with access also available through the Networked Computer Science Technical Reference Library (NCSTRL). Material is divided into 33 subject areas, each with a moderator. Material is added without copyright clearance or peer review. All versions of papers are stored, with a variety of formats available. In the long term, they aim to have as many papers as possible in TeX format. At present, the Repository is small, with use limited by the user interface for depositing material and by a lack of publicity.
NCSTRL was also explored in a paper presented by Naomi Dushay. Discussed were the query mediator functions and performance of indexes while retrieving material from the NCSTRL repository. Many discrepancies were shown, unexpectedly causing delays in retrieval.
A panel session on the National Science Foundation's proposal for a Digital Library in Science, Mathematics, Engineering and Technology Education (SMETE) provided discussion on a major funding opportunity to build a resource to be used by all undergraduates in these subject fields at universities in the USA. The Information Portal will provide material of national importance -- see http://www.smete.org for objectives and program information. (Note: the word "portal" has replaced "subject gateway" in most papers.)
Concluding panel
The Conference ended with a large panel addressing a range of issues -- from the need for a social purpose for digital libraries, to issues questioning how digital libraries are evaluated -- and with a call for more commercial applications, new social informatics and a redefinition of the nature of digital libraries. Acknowledging the limitations of some of the conference papers, a call was made for a single digital library conference in the USA each year rather than the two competing ACM and IEEE conferences.
In summary, I suggest watching the ACM Digital Libraries conferences for information about emerging technologies and innovative applications, but attending conferences such as Online and Ondisk for good trade exhibitions and demonstrations as well as discussion of large stable digital libraries.
Click here to return to the D-Lib Magazine "In Brief" column.