D-Lib Magazine
September 2003
Volume 9 Number 9
ISSN 1082-9873
In Brief
ECDL 2003 Workshop Report: Digital Library Evaluation - Metrics, Testbeds and Processes
Contributed by:
Christine L. Borgman
University of California, Los Angeles
<borgman@gseis.ucla.edu>
Ronald Larsen
University of Pittsburgh
<rlarsen@mail.sis.pitt.edu>
(This workshop was held in conjunction with the 7th European Conference on Digital Libraries (ECDL 2003) held in Trondheim, Norway on 17 - 22 August. The conference web site is at <http://www.ecdl2003.org/>.)
A major challenge for digital library research and practice is to find relatively non-intrusive, low cost means of capturing appropriate data to assess the use, adoption, implementation, economics, and success of digital libraries. The National Science Foundation (NSF) and DELOS explored these and related research questions in the 2002 Budapest workshop, Evaluation of Digital Libraries: Testbeds, Measurements, and Metrics. The purpose of the 2003 ECDL workshop was to present the findings of the Budapest meeting to a wider audience, to update participants on progress since the Budapest meeting, to extend current work in evaluation, and to identify new challenges and opportunities. The workshop continued an international dialogue on metrics, testbeds, and processes for digital library evaluation coordinated by the NSF in the US, the DELOS Network of Excellence for Digital Libraries in the EU, and the Joint Information Systems Committee (JISC) in the UK.
The workshop was organized by Ronald Larsen, School of Information Sciences, University of Pittsburgh. Christine Borgman of UCLA and Philip Pothen of JISC also made presentations. Thirteen people participated in the workshop, including the speakers. Participants came from six countries in Europe, the Middle East, and the U.S. They were a mix of university faculty, doctoral students, and professional librarians from universities, national libraries, and foundations.
Ronald Larsen outlined the workshop goals and described the international efforts of the Dlib Metrics Working Group that led to the EU DELOS-NSF workshop on Evaluation of Digital Libraries held in Budapest in June 2002 <http://www.sztaki.hu/conferences/deval/index.html>. Christine Borgman, co-chair of the Budapest workshop (with Ingeborg Solvberg of Norway), reported on that meeting, which was organized around two themes: metrics and testbeds for evaluation, and context-sensitive evaluation. Results of the workshop, as reported to NSF <http://www.dli2.nsf.gov/internationalprojects/working_group_reports/evaluation.html>, included proposals for the development of testbeds for comparative evaluation and for research on context-sensitive evaluation methods and measures. Philip Pothen reported on JISC progress in the UK on the iterative evaluation of the Distributed Network of Electronic Resources (DNER), which is now known as the Information Environment (IE). The DNER/IE evaluation is focusing on issues such as the use of digital libraries for teaching and learning versus use for research and scholarship. Their measures include pedagogical assessment, institutional impact, adoption, information needs, and uses of individual resources (e.g., browsing, assessment of quality, cross-searching between DLs). Presentations and notes from the workshop are posted on the website at <http://www.sis.pitt.edu/~ecdl2003>.
Discussion among the participants was wide ranging. While the Budapest workshop participants were largely current or prospective grantees of the sponsoring agencies, this group was dominated by practitioners with real on-the-ground DL evaluation issues to address. We drew upon their experiences to identify (1) processes and methods, and (2) metrics and measures for DL evaluation <http://www.sis.pitt.edu/~ecdl2003/results.htm>, <http://www.sis.pitt.edu/~ecdl2003/papers.htm>. Discussion began at the micro level (individual users) and proceeded toward the macro level (institutional concerns). In the second part of the discussion, we took a "round robin" approach, eliciting one key funding priority from each of the participants. Suggested priorities included canonical studies to establish baselines for performance among functionally similar DLs; measurement strategies for the transition from physical to digital library services and infrastructure; measurement of the economic impact of digital libraries on teaching, learning, research, administrative operations of libraries, and on records of universities; business models, funding schemes, sustainability, and scalability strategies for DLs; and the use of the NSF-funded National Science Digital Library (NSDL) as a testbed for DL evaluation research.
Next steps from the workshop are to circulate a draft report for comment to the participants, submit a subsequent edited report to appropriate funding agencies for their consideration, and circulate the results of this and prior workshops on evaluation of digital libraries to the larger digital library community of research and practice.
References:
ECDL evaluation workshop website: <http://www.sis.pitt.edu/~ecdl2003>.
EU DELOS-NSF Evaluation of Digital Libraries Workshop (2002) website: <http://www.sztaki.hu/conferences/deval/index.html>.
Summary report to NSF from the EU DELOS-NSF Evaluation of Digital Libraries Workshop (2002): <http://www.dli2.nsf.gov/internationalprojects/working_group_reports/evaluation.html>.
ECDL 2003 Workshop Report: 2nd European NKOS Workshop
Contributed by:
Douglas Tudhope
School of Computing
University of Glamorgan
Pontypridd, Wales, United Kingdom
<dstudhope@glam.ac.uk>
The 2nd European Networked Knowledge Organization Systems/Sources (NKOS) Workshop, organized by Doug Tudhope with assistance from Denise Bedford, Martin Doerr and Gregory Shreve, took place on August 21, 2003 in Trondheim, Norway as part of ECDL 2003. The full-day workshop was attended by 35 people.
Knowledge Organization Systems/Services (KOS), such as classifications, gazetteers, lexical databases, ontologies, taxonomies and thesauri, model the underlying semantic structure of a domain. They act as semantic road maps and make possible a common orientation by indexers and future users (whether human or machine). A vast legacy of intellectual knowledge structures and indexed collections is available. The workshop addressed some of the challenges involved in leveraging the full potential of KOS for advanced DL applications. We have the opportunity to formalise and enrich existing representations for automated KOS applications, exploiting the infrastructure of Semantic Web languages and technologies. Networked KOS services and applications are reaching the stage where they might exploit common representations and protocols. Empirical studies of use are needed for effective design of KOS and KOS-based services.
The morning sessions provided an overview of projects related to the usage of KOS in Internet-based services and DL. Jens Vindvad proposed a model to address validation when exchanging information between heterogeneous systems drawing on 'MicroSchemas' in a namespace environment. Stephan Koernig discussed a generic Web Services framework for KOS-supported scientific net(work)s, addressing standardisation at different levels of DL services. Robert Tansley presented SIMILE, a successor project to DSpace, seeking to enhance inter-operability among digital assets, schemas, metadata, and services. Alistair Miles proposed a candidate Thesaurus Interchange Format, in the form of an RDF Schema. He described a thesaurus management API built on the Jena RDF API, and some prototype thesaurus web services. Diane Vizine-Goetz gave an overview of the OCLC Terminology Services project, which aims to make a set of vocabularies available using open standards and various web protocols and technologies. Marianne Lykke Nielsen reported findings from two user studies with a corporate, digital thesaurus. These formed part of a user-oriented design/development of the thesaurus and subsequent evaluation of its use.
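Alistair Miles' proposal treats a thesaurus as an RDF graph, so that standard thesaurus relationships become machine-processable properties. A minimal sketch of what such an interchange representation might look like is given below; the namespace URI, class and property names are invented for illustration and are not those of the actual draft format.

```xml
<!-- Hypothetical sketch only: the namespace and property names here are
     invented, not those of the candidate Thesaurus Interchange Format. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:thes="http://example.org/thesaurus#">
  <thes:Concept rdf:about="http://example.org/concepts/gazetteer">
    <thes:prefLabel>Gazetteers</thes:prefLabel>
    <thes:altLabel>Place-name indexes</thes:altLabel>
    <!-- The classic BT/RT thesaurus relationships cast as RDF properties -->
    <thes:broader rdf:resource="http://example.org/concepts/kos"/>
    <thes:related rdf:resource="http://example.org/concepts/ontology"/>
  </thes:Concept>
</rdf:RDF>
```

Casting broader/narrower/related relationships as RDF properties is what allows thesaurus content to be loaded and queried by generic RDF toolkits such as Jena, as in the thesaurus management API described in the presentation.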
The afternoon presentations and discussions concerned the evolving relationship between research and development projects and (emerging) KOS-related standards, both formal and informal. This provided an opportunity for communication between European and US standards initiatives and projects. Informal demonstrations were also given of prototype KOS web applications and services. Dagobert Soergel, who had reviewed the earlier JCDL NKOS workshop in the morning, provided a critique of the Z39.19 thesaurus standard, as rooted in the print world and taking a limited view of thesauri. This was followed by reports from Diane Vizine-Goetz and Stella Dextre Clarke on the ongoing work of the US NISO and UK BSI Thesaurus Standards Groups, currently considering revisions to the existing standards. In the first presentation by a member of the linguistic community at an NKOS workshop, Greg Shreve reported on various Terminology Standards and the work of the ISO TC 37 linguistics standards group. It was clear that there was potential synergy between the linguistic and KOS communities. Martin Doerr presented the CIDOC Conceptual Reference Model, currently at ISO Committee Draft stage. This standard can provide a bridge between domain-specific KOS and higher-level ontologies, which give a more formal specification of the roles played by KOS structures, and between the KOS and knowledge representation communities. Drawing on her ASIST SIG/CR 2002 Workshop position paper, Linda Hill discussed issues and research implications in the greater integration of KOS systems into DL architectures. As illustration, she discussed ADL work on a Thesaurus Protocol for programmatic access to distributed thesauri.
Participants identified various open issues and themes for continuing work/collaboration:
- the extent to which any revised thesaurus standards should extend current standard relationships by hierarchical specialisation (maintaining backwards compatibility)
- the practical possibility of maintaining an inventory ('namespace') with standard definitions of core KOS relationships, and which body could maintain such a registry
- standard representations of KOS, including representations of facet structure
- a basic KOS service protocol and the need for a model of KOS services as part of an overarching DL family of entities and services
- the need for standards on distributed KOS protocols and data exchange
- a taxonomy of KOS and a way of describing different intended purposes and use contexts of KOS
- empirical studies of KOS in use, search styles and search guidelines for KOS-based applications, user-centred visualisations and access points
- relationships and frameworks for mapping between KOS
Further information on the workshop and links to the presentations can be found on the Workshop website at <http://www.glam.ac.uk/soc/research/hypermedia/NKOS-Workshop.php>.
NKOS is a community of researchers, developers and practitioners seeking to enable KOS as networked interactive information services via the Internet. This was the second European NKOS Workshop; for details of the previous workshop and the six US NKOS Workshops to date, see the NKOS website at <http://nkos.slis.kent.edu/>.
Submissions are currently sought for a special issue of the Journal of Digital Information on new applications and contexts for KOS. The submission deadline is 7 October 2003 (see <http://jodi.ecs.soton.ac.uk/calls/newnkos.html>).
ECDL 2003 Workshop Report: Healthcare Digital Libraries (HDL 2003)
Contributed by:
Bob Fields
Interaction Design Centre, Middlesex University
<b.fields@mdx.ac.uk>
Anne Adams
UCL Interaction Centre, University College London
<a.adams@cs.ucl.ac.uk>
Patty Kostkova
City ehealth Research Centre, Institute of Health Sciences
City University, London
<patty@soi.city.ac.uk>
Organised by Anne Adams of UCLIC and Patty Kostkova of City University, this workshop, held in conjunction with the 7th European Conference on Digital Libraries (ECDL 2003) in Trondheim, Norway, 17-22 August, brought together approximately a dozen practitioners and researchers, each of whom had an interest, in one way or another, in the design, implementation, deployment or use of digital libraries (DLs) and similar technologies in the healthcare domain. The presenters at this workshop reflected a number of perspectives on different information technology (IT) initiatives, some proposing new technical solutions to problems in the healthcare domain, some offering evaluations and explications of the impact of current programmes, and several urging a critical examination of the social and organisational implications of healthcare DLs.
The workshop opened with an invited talk by Rob Procter, Edinburgh University, who outlined the university's Social Informatics approach, which involves studying the use of technology in context and using those findings to support design. His critique of digital libraries from an ethnographic angle highlighted the fact that searching and information finding are often stereotyped as individual tasks, divorced from any social, organisational or material context. The Social Informatics corrective is to use ethnographic field research to understand the circumstances in which activities occur, as well as their relationship to their context. A case study of this mode of research focused on the Scottish Poisons Information Service and the use within that service of TOXBASE, a searchable online DL containing information on toxins and treatments for poisoning. The service is used directly by medical professionals in support of diagnosis and patient treatment, and it also supports operators of a telephone helpline service. The talk emphasised the importance of the service's telephone operators as intermediaries, mediating between the information needs of medical practitioners and the digital resources. In addition to being familiar with TOXBASE and possessing the specialised skills necessary to use it effectively, the operators bring other benefits to the system in their role as intermediaries. For instance, doctors appear to have greater confidence in information gained over the helpline, even though they could have accessed the same TOXBASE resource directly themselves.
Anne Adams (University College London, UK) presented a fascinating study of "outreach librarians" supporting the deployment of digital library systems to address the needs of teams of psychiatric practitioners. The librarian's role was to support practitioners' information seeking activities and to contribute more generally to the implementation of Evidence Based Medicine. It was observed that the librarian's role grew to encompass a wide range of information-related concerns, and the librarian's presence seemed to raise the profile and importance of information in clinical discussion and practice. Furthermore, many unexpected benefits, such as improved cross-team communication and increased confidence of individuals in making use of information resources, were apparent. The study makes a strong case for digital libraries as more than just technology, highlighting the importance of librarians as intermediaries who are able to cross the boundaries between clinical practice and digital resources.
Gemma Madle (City University, UK) spoke about a project to develop an online community of practice around the National Electronic Library for Communicable Diseases. This work was inspired by the realisation of a number of important issues in the development of such a resource relating to quality control, the inclusion of controversial material and the timeliness of research results. Issues such as these may be best addressed by a community of practitioners concerned with content provision and quality review, and consequently, a collection of online resources to allow members of the community to interact has been implemented. Madle's talk provoked a great deal of interesting discussion about building and sustaining communities around digital libraries and review processes, and about ways of achieving participation in the review community.
Bryan Manning (European Federation for Medical Informatics) raised a range of issues connected with technologies supporting coordination of work across people and professions in complex clinical processes. The approach he outlined had to do with modelling workflows involved in clinical processes, and the identification of major related activities. One use of such modelling was to design electronic "crib sheets" to aid medical practitioners in making decisions. Repositories of such "crib sheet" information could be delivered to practitioners through handheld user interfaces, and notes made by the practitioner as tasks are carried out could be used to automatically generate parts of the patient record.
Bob Fields (Middlesex University, UK) talked about further issues in the development of digital information resources, in particular Electronic Patient Records (EPR) and Electronic Healthcare Records (EHR). In his talk, he proposed a set of concepts that will provide the analytic cornerstone of a forthcoming study of the implementation of EPRs and EHRs. The conceptual framework analyses how individuals work together, share information and hold meanings in "common information spaces", and between different communities through the creation of "boundary objects". EHRs can be viewed as precisely this sort of boundary-crossing object, and their design can be informed by studies of the kinds of coordination EHRs are intended to support.
Patty Kostkova (City University, UK) reviewed the National electronic Library for Communicable Diseases (NeLCD), which is being developed at City University. This library provides a portal to evidence-based information on treatment, investigation and prevention of communicable disease. Patty discussed the use of autonomous intelligent agents within the library to support search, information publishing, peer review and data exchange. In particular, Intelligent Search Agents (ISA) and Pro-active Alert Agents (PAA) were used to support user profiling and customisation of the library to suit personal preferences, backgrounds and medical specialties.
Thorsten Kurz (University of Neuchatel, Switzerland) discussed methods and tools to improve and personalise access to concepts within healthcare ontologies. In particular, these methods and tools were applied to two ontologies regarded as vital within the Swiss healthcare system: ICD-10 and TARMED. Development and testing of the tools' basic functionality is currently underway along with evaluation of the user interface design.
Pedro Sousa (Universidade Nova de Lisboa, Portugal) presented an agent-based web content categorization system supported by a framework for Internet data collection. The importance of the system's learning process was emphasised, spanning sample database creation, automatic feature selection, classifier induction and decision-support system definition. The presenter highlighted the applicability of this system within the health domain to reduce information overload for busy clinicians.
At the end of the day, an extended discussion highlighted that digital libraries could have a powerful positive impact on the health domain. However, across Europe, the health domain's rigid hierarchies, as well as clinicians' negative perceptions of technology, have hindered digital library uptake. Technological change, it was noted, can support or work counter to user needs. The question raised in discussion was how to increase successful technological uptake. Some workshop participants proposed designing systems around user needs and implementing them through communities that would thus feel ownership of the technology; others proposed legislative enforcement of change. Additional workshop discussions reviewed digital library issues of provenance, privacy, data protection and security.
More information about the ECDL 2003 workshop on Healthcare Digital Libraries (HDL 2003) and the work presented there is available from
<http://www.soi.city.ac.uk/~patty/HDL2003/HDL%202003%20Workshop%20Top.html>.
ECDL 2003 Workshop Report: 3rd Workshop on Web Archives
Contributed by:
Carol van Nuys
Project Manager of the Paradigma Project
The National Library of Norway
<carol.vannuys@nb.no>
On 21 August 2003, approximately 50 people from 19 countries gathered to hear about web archiving research and practice, as the 3rd Workshop on Web Archives convened at the Britannia Hotel in Trondheim, Norway. The workshop was held in conjunction with the 7th European Conference on Research and Advanced Technologies for Digital Libraries (ECDL2003) and was organized by the French National Library, the National Institute for Research in Computer Science and Automation in France, and the Vienna University of Technology in Austria. Julien Masanès, Bibliothèque Nationale de France, chaired the workshop in Catherine Lupovici's absence.
A call for papers resulted in the submission of sixteen papers on several relevant topics, but because the workshop was scheduled for just a single day, the program committee chose only ten papers for presentation and inclusion in the workshop proceedings. These ten papers addressed three major topics: Identification, Tools, and Case Studies.
The first two papers focused on Identification. John A. Kunze talked about the importance of persistent identifiers for identifying electronic resources, and the California Digital Library's use of the ARK identifier. Ketil Albertsen and Carol van Nuys presented the National Library of Norway's Paradigma Project and suggested using the IFLA FRBR model for defining the identifier semantics needed for the identification of network accessible documents.
The next three presentations focused on Tools. Eva Müller described the DiVA Project team's experiences at Uppsala University, where a workflow now connects their local repository and the Swedish National Library Archive. Thorsteinn Hallgrimsson then described the Nordic Web Archive's (NWA) work to develop a general tool for accessing all types of web archives; this collaborative project among the five Nordic national libraries is now in its second phase. In the final presentation of the morning session, Ketil Albertsen, the National Library of Norway, reported on the Paradigma Project's Web Harvesting Environment, a complete system for handling the legal deposit of digital documents from the time of harvesting to access by the end user.
All workshop participants were given the opportunity to say a few words about their countries' current web archiving activities before the workshop broke for lunch.
The last session of the day was dedicated to hands-on experiences and case studies. Daniel Gomes, University of Lisbon, described the characteristics of the Portuguese Web, i.e., its size, file types and composition. Gina Jones, U.S. Library of Congress, followed with a presentation on building thematic web collections, exemplified by the September 11 Web Archive and the Election 2002 Web Archive. The many challenges connected with the collection and long-term preservation of electronic information resources on the Czech web were presented by Petr Zabicka from the Moravian Library in Brno. Unfortunately, due to illness, Kent Norsworthy could not present his paper, but all participants were urged to read about the Center for Research Libraries' research on political communications web archiving. Jennifer Gross, the workshop's last speaker, described experiences at the University of Heidelberg as they work to build a digital archive for Chinese studies (DACHS). A lively discussion followed the presentations, and participants were encouraged to use the workshop list for this purpose: <http://listes.cru.fr/wws/info/web-archive>.
Further information and links to the presentations are available. See the Web Archive Workshop home page at <http://bibnum.bnf.fr/ecdl/2003/index.html>.
The Cross-Language Evaluation Forum (CLEF) 2003 Workshop
Contributed by:
Carol Peters
Italian National Research Council
Pisa, Italy
<carol.peters@isti.cnr.it>
The main objectives of the Cross-Language Evaluation Forum (CLEF) are to stimulate the development of monolingual and multilingual information retrieval systems for European languages and to contribute to the building of a research community in the multidisciplinary area of multilingual information access. These objectives are realised through the organisation of annual evaluation campaigns and workshops. The results of the fourth campaign were presented at a two-day workshop held in Trondheim, Norway, 21-22 August, immediately following the seventh European Conference on Digital Libraries (ECDL 2003).
Each year, CLEF offers a series of evaluation tracks designed to test different aspects of mono- and cross-language system performance. The intention is to encourage systems to move from monolingual text retrieval to the implementation of a full multilingual multimedia search service. CLEF 2003 offered eight tracks evaluating the performance of systems for monolingual, multilingual and domain-specific information retrieval, multilingual question answering, cross-language image and spoken document retrieval. The main CLEF multilingual corpus now contains well over 1,600,000 news documents in nine languages, including Russian. A secondary collection used to test domain-specific system performance consists of the GIRT-4 collection of English and German social science documents.
CLEF 2003 posed a number of challenges, in particular with respect to the multilingual and bilingual tracks where the aim was to encourage work on many European languages rather than just those most widely used. There were two distinct multilingual tasks; the most challenging involved retrieving relevant documents from a collection in eight languages: Dutch, English, Finnish, French, German, Italian, Spanish and Swedish, and listing the results in a single, ranked list.
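Producing that single ranked list is itself a research problem: relevance scores from separate per-language retrieval runs are not directly comparable. As a purely illustrative sketch (not any particular participant's method), one common baseline is min-max score normalisation within each run, followed by a global sort; the collection names, document IDs and scores below are invented.

```python
# Illustrative merging of per-language result lists into one ranked list.
# Raw scores from different runs are on different scales, so each run is
# min-max normalised to [0, 1] before the lists are pooled and sorted.

def normalise(results):
    """Min-max normalise a list of (doc_id, score) pairs to [0, 1]."""
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:  # degenerate run: all scores identical
        return [(d, 1.0) for d, _ in results]
    return [(d, (s - lo) / (hi - lo)) for d, s in results]

def merge_runs(runs):
    """Merge several ranked lists into a single list, best first."""
    pooled = []
    for results in runs:
        pooled.extend(normalise(results))
    # sorted() is stable, so ties keep their pooled order
    return sorted(pooled, key=lambda pair: pair[1], reverse=True)

# Invented example runs against a German and a Finnish sub-collection
german = [("de-1", 12.4), ("de-2", 9.1)]
finnish = [("fi-7", 0.81), ("fi-3", 0.55)]
merged = merge_runs([german, finnish])
print([doc for doc, _ in merged])  # → ['de-1', 'fi-7', 'de-2', 'fi-3']
```

More sophisticated merging strategies (e.g. learning collection-specific weights) were among the topics participating groups explored; the sketch above only shows why normalisation is needed at all.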
Tasks offered in the bilingual track involved "unusual" language pairs: Italian-Spanish, German-Italian, French-Dutch, Finnish-German. We were very pleased by the number of groups that attempted these difficult tasks.
Another positive aspect of CLEF 2003 was the number of new tracks offered as pilot experiments. These included mono- and cross-language question answering, and cross-language image and spoken document retrieval. The aim has been to try out new ideas and develop new evaluation methodologies, suited to the emerging requirements of both system developers and users with respect to today's digital collections. This year's interactive track included full cross-language search experiments where the user attempts to find relevant documents using a complete interactive cross-language system that provides assistance in both query formulation and document selection.
Participation in CLEF 2003 was slightly up on the previous year, with 43 groups submitting results for one or more of the different tracks: 10 from North America, 30 from Europe, and 3 from Asia. As in previous years, participants were a mix of newcomers and veteran groups. Another important continuing trend is the progression of many returning groups to more complex tasks, from monolingual to bilingual, and from bilingual to multilingual.
More than 60 researchers and system developers from academia and industry attended the workshop in order to discuss the results of their experiments. In addition to presentations by participants in the CLEF campaign, talks included reports on the activities of the NTCIR evaluation initiative for Asian languages, on cross-language information retrieval work at Moscow State University, and on the TIDES 2003 surprise language task. The final session discussed proposals for future evaluation activities within the CLEF framework.
The presentations given at the CLEF Workshops and detailed reports on the experiments of CLEF 2003 and previous years can be found on the CLEF website at <http://www.clef-campaign.org>.
The SIMILE Project
Contributed by:
Robert Tansley
Researcher
Hewlett-Packard Laboratories
Cambridge, MA, USA
<robert.tansley@hp.com>
SIMILE <http://web.mit.edu/simile> is a three-year joint project, conducted by the W3C <http://www.w3.org>, Hewlett-Packard Laboratories <http://www.hpl.hp.com>, MIT Libraries <http://libraries.mit.edu> and MIT's Computer Science and Artificial Intelligence Laboratory <http://www.csail.mit.edu>. The aim of SIMILE is to leverage and extend the DSpace institutional repository system <http://www.dspace.org> with enhanced support for arbitrary metadata schemas and instance data, primarily through the application of RDF and Semantic Web techniques.
DSpace is an institutional digital repository developed by Hewlett-Packard Laboratories and the MIT Libraries, and is currently in production at MIT and several other universities around the world. DSpace provides stable long-term storage for the digital products of MIT faculty and researchers.
SIMILE aims to enhance DSpace in the following ways:
- Allow end-users to deposit their content and accompanying metadata in a user-friendly way
- Provide effective search and retrieval across diverse metadata and schemas
- Enable programmatic access to the metadata and schemas, so that external systems may discover and utilise the data in DSpace to provide additional services, thus fulfilling the Semantic Web vision
- Allow institutions to manage and preserve the diverse metadata and schemas as they evolve over time.
Since parallel work is underway to deploy DSpace at a number of leading research libraries, we hope that our approach will lead to a powerful deployment channel through which the utility and readiness of Semantic Web tools and techniques can be compellingly demonstrated in a visible and global community.
This work will build on Semantic Web research within Hewlett-Packard Laboratories <http://www.hpl.hp.com/semweb>, including the Jena Semantic Web Framework <http://jena.sourceforge.net>. This leading Java toolkit provides a programmatic environment for RDF, RDFS and OWL, including rule-based inference, persistent storage and efficient RDF query. Jena has already achieved a leading position amongst Semantic Web developers and is in widespread use within the community.
In conducting this work, we also hope to apply and build upon prior work from MIT's Haystack project <http://haystack.lcs.mit.edu>, which seeks to bring modern information management and retrieval technologies to the average computer user in order to make computers a more compelling environment for users to interact with their information. Haystack is built upon Semantic Web technologies, and looks into the use of artificial intelligence techniques for analysing unstructured information and providing more accurate retrieval. It also deals with the modelling, management, and display of user data in more natural and useful ways. In addition, Haystack is potentially one of the services consuming data from the SIMILE-enhanced DSpace system.
The SIMILE project started in February 2003 and is scheduled to run until December 2005. Further information can be found at <http://web.mit.edu/simile>.
Preservation Metadata at the National Library of New Zealand
Contributed by:
Dave Thompson
Digital Library Research Analyst
National Library of New Zealand
Wellington, New Zealand
<dave.thompson@natlib.govt.nz>
Over the last 18 months the National Library of New Zealand has been developing a process for defining, collecting and storing preservation metadata.
The first iteration of a preservation metadata schema was released in November 2002 with a revised version made available in June 2003.
The second phase of the process has been creation of a data model for implementation. The data model was released in July 2003. It is based on the logical preservation metadata model (Revised version) and maintains the overall structure and data relationships contained there. It is intended to provide a step towards the implementation of a repository for preservation metadata. As part of this work XML schema definitions of the data model were also developed.
In parallel with this work the Library has been developing a metadata extraction tool (Java and XML based) for programmatically extracting data in XML format from the file headers of a range of file types. To date we have had five adapters built: Word 2, Word 6, TIFF, WAV and BMP. An XSLT is then run across the extracted data to create a file of preservation metadata already formatted for uploading into a preservation metadata repository.
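The adapter idea described above can be sketched in outline. The following fragment is an illustrative sketch only, not the Library's Java-based tool: it parses the fixed header fields of a BMP file with the Python standard library and serialises the extracted values as XML, in the spirit of one of the five adapters. All element names are assumptions for the example.

```python
import struct
import xml.etree.ElementTree as ET

def extract_bmp_header(data: bytes) -> dict:
    """Parse the BMP file header and the start of the DIB header."""
    # BITMAPFILEHEADER: 2-byte magic, uint32 file size (reserved fields skipped)
    magic, file_size = struct.unpack_from("<2sI", data, 0)
    if magic != b"BM":
        raise ValueError("not a BMP file")
    # BITMAPINFOHEADER starts at offset 14: uint32 header size, int32 width,
    # int32 height, uint16 colour planes, uint16 bits per pixel
    _, width, height, _, bpp = struct.unpack_from("<IiiHH", data, 14)
    return {"format": "BMP", "fileSize": file_size,
            "width": width, "height": height, "bitsPerPixel": bpp}

def to_xml(fields: dict) -> str:
    """Serialise extracted fields as a flat XML record (hypothetical schema)."""
    root = ET.Element("File")
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Build a minimal 1x1 24-bit BMP in memory to demonstrate the adapter.
pixel_data = b"\x00\x00\xff\x00"  # one BGR pixel, padded to 4 bytes
header = struct.pack("<2sIHHI", b"BM", 14 + 40 + len(pixel_data), 0, 0, 54)
dib = struct.pack("<IiiHHIIiiII", 40, 1, 1, 1, 24, 0, len(pixel_data),
                  2835, 2835, 0, 0)
bmp = header + dib + pixel_data

print(to_xml(extract_bmp_header(bmp)))
```

In the Library's actual pipeline the comparable XML output is then transformed by XSLT into preservation metadata ready for repository upload.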
All of the above documents can be accessed from <http://www.natlib.govt.nz/en/whatsnew/4initiatives.html#meta>.
The National Library of New Zealand welcomes any feedback on this work. The contact person at National Library of New Zealand is Dave Thompson, Digital Library Research Analyst, <dave.thompson@natlib.govt.nz>.
Excerpts from Recent Press Releases and Announcements
Distributed Full Text Search of Math Books Now Available
Announced 4 September 2003, by John P. Wilkin, University of Michigan
"The university libraries of Cornell, Göttingen, and Michigan are pleased to
announce the first public availability of a significant body of mathematical
monographs with access provided through a distributed full text search
protocol. The virtual collection, comprising more than 2,000 volumes of
significant historical mathematical material (nearly 600,000 pages), resides
at the three separate institutions and is provided through interfaces to
three entirely different software systems. Public interfaces to the
collection may be found at:
<http://www.hti.umich.edu/m/mathall>
and
<http://mathbooks.library.cornell.edu>
These two public interfaces reflect different development efforts at
Michigan and Cornell, each with their own perspective on how to best mediate
the search through the protocol, and each based on the protocol."
"The protocol for this distributed search was developed by the three
participating institutions over the last two and a half years, with generous
support provided by the National Science Foundation. Working from the roots
of the DIENST and the then-emergent OAI protocols, the project team focused
on creating a new protocol, dubbed CGM (for "Cornell, Göttingen,
Michigan"), that was consistent with OAI, borrowed from DIENST, and added
mechanisms for full text searching. The protocol and more project
information are available at <http://www.library.cornell.edu/mathbooks>. "
"While our testing has found that network latency and the vicissitudes of
different production environments do present challenges, our results
indicate that a distributed full text search is certainly viable. We
believe that the CGM protocol is relatively unique in providing
production-level full text access to distributed collections."
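The federation described above can be illustrated in outline. The sketch below is hypothetical: it stands in three repositories with in-process functions rather than speaking the actual CGM protocol over HTTP, but it shows the basic pattern of fanning a query out in parallel and merging ranked results, including a site that (like Göttingen at present) returns no full-text hits.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the three sites; a real client would issue CGM requests to
# each repository's search endpoint. Document IDs and scores are invented.
REPOSITORIES = {
    "michigan": lambda q: [("mdp.001", 0.9)] if "algebra" in q else [],
    "cornell": lambda q: [("cul.047", 0.7)] if "algebra" in q else [],
    "gottingen": lambda q: [],  # bibliographic-only until full text exists
}

def federated_search(query):
    """Send the query to every repository in parallel, merge by score."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query)
                   for name, fn in REPOSITORIES.items()}
        hits = [(name, doc, score)
                for name, fut in futures.items()
                for doc, score in fut.result()]
    return sorted(hits, key=lambda h: h[2], reverse=True)

print(federated_search("linear algebra"))
```

Network latency and differing production environments, which the announcement notes as real challenges, surface in such a design as per-site timeouts and partial result sets.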
"We invite feedback on the effectiveness of the protocol from both users of
the materials and from digital library developers. Although essentially a
first prototype with significant needs for extension and refinement, we
believe that our progress to-date should be encouraging for digital library
developers interested in federating collections. And, further, the
collection itself is a rich resource for the study of mathematics history
and a number of related disciplines. The collections at Cornell and
Michigan are both fully searchable, and while the Göttingen collection
currently includes bibliographic information and page images, Göttingen is
actively seeking funding to create full text for its volumes."
"We welcome all feedback. Please send comments to <cgm-feedback@umich.edu>."
"We hope that the documentation on the protocol (currently found at
<http://www.library.cornell.edu/mathbooks/cgmverbs.xml>) will spur others to
add CGM-capability to their systems. The software created through this
NSF-funded grant will be made available in a number of ways. The API
developed by Göttingen, allowing them to provide access through the Agora
software, will soon be available to other Agora sites. The functionality
developed by Michigan will be included in release 11 of the DLXS digital
library software (September, 2003). And Cornell is exploring distribution
and support models for its electronic publishing software, DPubS, the system
also behind Project Euclid (<http://ProjectEuclid.org>). If you are
interested in using the raw protocol mechanisms at Cornell, Göttingen and
Michigan in your development efforts, please contact:
Andrea Rapp, Göttingen <arapp@mail.sub.uni-goettingen.de>
David Ruddy, Cornell <dwr4@cornell.edu>
John P. Wilkin, Michigan <jpwilkin@umich.edu>"
OCLC's FRBR Work Set Algorithm
Announced 3 September 2003 by Robert Bolander, OCLC Online Computer Library Center, Inc.
"OCLC is making an algorithm available free of charge to organizations interested in converting their bibliographic databases to the Functional Requirements for Bibliographic Records (FRBR) model, in which individual records are brought together under the concept of a work. The algorithm was developed by research staffers Thom Hickey and Jenny Toves."
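The work-set idea can be illustrated with a simplified sketch. This is not the OCLC algorithm itself, whose key construction is considerably more sophisticated; it only shows the general approach of gathering records whose normalized author/title keys agree into a single work. All records shown are invented.

```python
from collections import defaultdict

def work_key(record: dict) -> tuple:
    """Build a crude normalized author/title key; records sharing a key
    are treated as manifestations of the same work."""
    def norm(s: str) -> str:
        return "".join(ch for ch in s.lower()
                       if ch.isalnum() or ch == " ").strip()
    return (norm(record.get("author", "")), norm(record.get("title", "")))

def group_into_work_sets(records):
    """Group bibliographic records into work sets keyed by work_key."""
    sets = defaultdict(list)
    for rec in records:
        sets[work_key(rec)].append(rec)
    return dict(sets)

records = [
    {"author": "Twain, Mark", "title": "Adventures of Huckleberry Finn"},
    {"author": "Twain, Mark.", "title": "ADVENTURES OF HUCKLEBERRY FINN"},
    {"author": "Twain, Mark", "title": "The Prince and the Pauper"},
]
work_sets = group_into_work_sets(records)
print(len(work_sets))  # the first two records collapse into one work set
```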
Related links:
"OCLC's FRBR Work-set Algorithm:
<http://www.oclc.org/research/software/frbr/index.shtm>."
"Overview of FRBR projects at OCLC Research:
<http://www.oclc.org/research/projects/frbr/>."
Dublin Core Metadata Element Set recognized by ISO; Finland becomes first national affiliate
OCLC News - 31 August 2003: "The Dublin Core Metadata Element Set (DCMES) has been approved by the International Standards Organization (ISO) as an international metadata standard."
"DCMES, also known as "Dublin Core," was developed for use on
the Web and in other information networks across a wide variety of subject
areas, languages and economic sectors. Dublin Core has been adopted by
seven national governments and translated into 30 languages. OCLC Online
Computer Library Center, Inc. serves as the primary sponsor for the Dublin
Core Metadata Initiative (DCMI), and manages its Web site. DCMI is the
maintenance agency for the Dublin Core standard and is responsible for
its development, standardization and promotion."
"'The ISO recognition serves as a seal of approval of the Dublin
Core. DCMI is pleased with the international recognition by ISO,'
said DCMI director Makx Dekkers."
"Helsinki University Library (The National Library of Finland) recently
pledged Finland's continuing support of DCMI by becoming the organization's
first national affiliate. 'We are very pleased to become the first
national affiliate of DCMI. The Dublin Core standard has been used in
Finland for years and I am confident that the number of users and implementations
will continue to grow,' said Juha Hakala, the national library's
director of Information Technology."
For further information, please see the full press release at <http://www5.oclc.org/downloads/design/pr/08312003/dcmi.htm>.
OCLC, Die Deutsche Bibliothek and Library of Congress to develop Virtual International Authority File
OCLC - 31 August 2003: "OCLC Online Computer Library Center, Inc., Die Deutsche Bibliothek (the
German national library) and the Library of Congress signed a memorandum
of understanding to develop the Virtual International Authority File (VIAF),
an effort to include authoritative names from national libraries into
one common global service."
"The goal of the VIAF project, initially launched in 1998 by Die Deutsche
Bibliothek and the Library of Congress, was to reduce cataloging costs
by providing access to authority records worldwide. The new VIAF proof
of concept project will virtually combine the personal name authority
files of the Library of Congress and Die Deutsche Bibliothek into a single
name authority service, making them available through an Open Archives Initiative
(OAI) server. For example, German users will be able to view names displayed
in the form established by Die Deutsche Bibliothek (German), while U.S.
users will be able to view names displayed in the form established by
the Library of Congress (English)."
"...OCLC will provide software to match personal name authority records between
the two authority files, which will produce initial linking for the service.
The long-term goal of the VIAF project is to include the authoritative
names from many national libraries into a common global service that should
be freely available to users worldwide via the Web. Such a service would
be an integral part of future Web infrastructures, enabling displays of
controlled names in the language and script the user needs. The first
stage of the current VIAF project, which involves matching the retrospective
files, will take about one year to complete."
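The matching step at the heart of the project can be illustrated in outline. The sketch below is hypothetical: the record identifiers are invented and the normalization is deliberately crude, whereas production authority matching must weigh dates, variant forms, and scripts far more carefully. It simply links records across two files whose normalized headings agree.

```python
import re

def normalize(name: str) -> str:
    """Crude heading normalization: drop life dates, punctuation, case."""
    name = re.sub(r"\d{4}-\d{4}", "", name)   # e.g. "1749-1832"
    return re.sub(r"[^a-z ]", "", name.lower()).strip()

def link_authority_files(lc_records, ddb_records):
    """Return (lc_id, ddb_id) pairs whose normalized headings match."""
    index = {}
    for rec in ddb_records:
        index.setdefault(normalize(rec["heading"]), []).append(rec["id"])
    links = []
    for rec in lc_records:
        for ddb_id in index.get(normalize(rec["heading"]), []):
            links.append((rec["id"], ddb_id))
    return links

# Invented identifiers; headings differ only in punctuation conventions.
lc = [{"id": "lc-001", "heading": "Goethe, Johann Wolfgang von, 1749-1832"}]
ddb = [{"id": "ddb-001", "heading": "Goethe, Johann Wolfgang von 1749-1832"}]
print(link_authority_files(lc, ddb))
```

The resulting links are what would let a German user see the Die Deutsche Bibliothek form of a name while a U.S. user sees the Library of Congress form.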
For further information, please see the full press release at <http://www5.oclc.org/downloads/design/pr/08312003/viaf.htm>.
Current Awareness for Librarians
Announced 26 August 2003, by Arlene Eis, Infosources Publishing
"The Informed Librarian Online now has its own free web site at <http://www.informedlibrarian.com>."
"The web site is the information professional's one-stop site for all of their professional reading. At the end of each month, an issue is posted linking to all of the tables of contents and full-text (where available) of the journals, magazines, newsletters and electronic publications that came out during the month. Over 275 different journals addressing all aspects of librarianship and information science are covered by this service. It is an easy, time-saving way to keep up with the professional literature. The issues are emailed to subscribers each month, with a link to the web site."
"In addition to presenting the current issue and archives of The Informed Librarian Online, the web site features an article, specially written for the site, on an important issue facing librarians today. The August Guest Forum, as it is called, is "Librarianship: The Invisible Profession" by Judith A. Siess. The web site also offers discounts on new books of interest to librarians that will help them in their work, and a full-text article from the current journals selected for our readers."
"Many more exciting enhancements are planned for the site. Subscriptions are free, and passwords are provided free to subscribers to access the site. To sign up, go to <http://www.informedlibrarian.com/ilofreesubscribe.cfm>.
The list of journals covered in The Informed Librarian Online can be viewed at <http://www.informedlibrarian.com/ilojnltitles.cfm>.
The subject collections of journals covered in The Informed Librarian Online can be viewed at <http://www.informedlibrarian.com/subjcoll.cfm>."
NISO Publishes Metadata Demystified
Guide for Publishers
"Bethesda, Md., USA (August 20, 2003) - The National Information Standards Organization (NISO) announces the joint publication with The Sheridan Press of Metadata Demystified: A Guide for Publishers. The guide presents an overview of evolving metadata conventions in publishing, as well as related initiatives designed to standardize how metadata is structured and disseminated online. Focusing on strategic rather than technical considerations, Metadata Demystified offers insight into how book and journal publishers can streamline metadata operations in their companies and leverage metadata for added exposure in digital media including the Web."
"Authors Amy Brand, Frank Daly, and Barbara Meyers explain what metadata is and isn't, why metadata is important to both publishers and readers, and then discuss specific book and journal-oriented metadata practices such as ONIX, CrossRef, and the Open Archives Initiative. With metadata now an essential part of the publication process, Metadata Demystified can help both readers and publishers understand and harness the new opportunities metadata brings to information resources."
"The Metadata Demystified guide is available for free download from NISO at <http://www.niso.org/standards/resources/Metadata_Demystified.pdf> or can be purchased in hard copy from NISO Press for $20.00 plus shipping and handling."
For further information, please contact Patricia Harris at <pharris@niso.org>.
Elsevier, American Chemical Society to link services to improve researchers' access
Announced 19 August 2003, by Jarno Talens, Elsevier
"Amsterdam, 18 August 2003 - Elsevier and two divisions of the American Chemical Society - Chemical Abstracts Service (CAS) and Publications - have announced that they have agreed to provide linking between their services for scientists. Under the agreement:
- Users of Elsevier products and services (such as Science Direct®, MDL® databases and ChemWeb) will be able to link directly to ACS scientific journals
- Users of CAS products and services (SciFinder, STN®, and others) will be able to link, via ChemPort, directly to Elsevier scientific journals.
"Links between the services of these prestigious publishers will be completed prior to the end of 2003 making it possible for scientists to navigate this scientific information seamlessly no matter where they begin their research."
For more information contact either
Eric Merkel-Sobotta
Elsevier
31-20-485-2994
<e.merkelsobotta@elsevier.com>
or
Eric Shively
CAS
614-447-3847
<EShively@cas.org>
Mellon Grant Lets Libraries and Computer Scientists Join Forces to Create New Solution for Digital Information Management
11 August 2003, Charlottesville, VA - "Thanks to a generous grant from the Andrew W. Mellon Foundation, the University of Virginia Library announces the release of an open-source digital object repository management system. The Fedora Project, a joint effort of the University of Virginia and Cornell University, has now made available the first version of a system based on the Flexible Extensible Digital Object Repository Architecture, originally developed at Cornell."
"Fedora repositories can provide the foundation for a variety of information management schemes, not least among them digital library systems. At the University of Virginia, Fedora is being used to build a large-scale digital library that will soon have millions of digital resources of all media and content types. It is also currently being tested by a consortium of institutions that include the Library of Congress, Northwestern University, Tufts University, and others. They are building testbeds drawn from their own digital collections that they will use to evaluate the software and give feedback to the project."
"This first version of the software is designed to support a repository containing one million objects using freely available software. It fully implements the Fedora architecture, provides the first version of a graphical user interface to manage the repository, and provides facilities to create and ingest batches of objects."
"Fedora is being made available as an open-source product under a Mozilla Public License. For more information and to download the software, visit <http://www.fedora.info>."
For information about the major features of the software please see <http://www.fedora.info/index.shtml>.
ERPANET Announces ErpaEPRINTS at IFLA2003 Berlin
"In their 1999 report on Digital Archaeology: Rescuing Neglected and Damaged
Data Resources, Ross and Gow used 199 online references.
Today, only 32% of these resources remain accessible. This lack of
persistence remains a continuing cause for concern. ERPANET, in
collaboration with Project Daedalus, announced on Wednesday 6 August at
IFLA 2003 (Berlin) an ePrints Service for the Digital
Preservation sphere to provide a platform to address this problem.
ErpaEPRINTS allows authors and creators to make their works
available at a central access point. The service can be found at
<http://eprints.erpanet.org>."
"Key to ERPANET's rationale is the focus on the dissemination of knowledge.
To date, workshops and seminars, conferences, case
studies, and an online advisory service have all contributed to widening
and expanding the knowledge-base in this sphere. Now
ePrints, with its sister product erpaAssessments, is offering further
support and knowledge to the community. ErpaAssessments was
launched in late 2002, and provides high quality commentaries on key
literature and projects in digital preservation. Now with the
addition of erpaEPrints, ERPANET is offering not only value-added
commentaries on the literature, but also helping to make accessible and
preserve cutting-edge primary literature. This service is free of
charge to authors and users."
"At the outset of its project ERPANET established a long term preservation
strategy to ensure documents created by or accumulated by
the project would be available in the future. Materials deposited in the
ErpaEPRINTS Archive will benefit from these arrangements."
"Seamus Ross, Principal Director of ERPANET, reported 'ERPANET is extremely
grateful to the DAEDALUS project for enabling us to
produce a new service that provides the digital preservation community with
the opportunity to build a digital library of eprints
and ensure their long term accessibility.'"
"For more information about ERPANET events and
services see <http://www.erpanet.org>."
Richard A. Detweiler Named CLIR Interim President
July 30, 2003: "The Board of the Council on Library and Information
Resources (CLIR) is pleased to announce the appointment of Richard A.
Detweiler as its interim president, effective August 1. Mr. Detweiler served
as president and professor of psychology at Hartwick College in Oneonta, New
York, from 1992 through June 2003. Earlier this year he was named a
distinguished fellow by CLIR to develop new approaches to enhancing the
effectiveness and efficiency of liberal education. He has also been involved
in other CLIR projects, including serving as co-dean, with Deanna Marcum, of
the Frye Leadership Institute since its inception in 2000. The Frye
Institute educates administrators and faculty about the broad range of
issues affecting the future of higher education."
"A social psychologist by training, Mr. Detweiler was appointed to the
faculty of Drew University in 1973. He subsequently moved into
administrative roles, ultimately as a vice president with responsibilities
for planning and information technology. He has long been committed to the
effective use of information resources for teaching, learning, and
scholarship. Twenty years ago he was responsible for Drew becoming the first
liberal arts institution to provide computers for every student. He carried
this interest over to his presidency at Hartwick College, where ubiquitous
computing and networking were implemented a decade ago."
"...Deanna Marcum, outgoing president of CLIR, commented: 'It is gratifying to
know that Rick Detweiler will assume leadership responsibility for CLIR. He
has been central to the success of the Frye Leadership Institute, and his
distinguished fellowship this year goes to the heart of CLIR's agenda:
connecting information resources and services with the academic enterprise.
I applaud the Board's decision.'"
For further information, please see the full press release at <http://www.clir.org/pubs/press/2003detweiler.html>.
International recognition for Waikato Professor [Ian Witten]
"Computer Science Professor Ian Witten has been honoured with a prestigious international award for the humanitarian work of his University of Waikato research group, the New Zealand Digital Library project, on the Greenstone Software."
"Professor Witten will be the seventh recipient of the biennial Namur award (http://www.info.fundp.ac.be/~jbl/IFIP/award.html), which recognises recipients for raising awareness internationally of the social implications of information and communication technologies."
"Presented by the International Federation of Information Processing, an organisation devoted to the relationship between Computers and Society, the purpose of the Namur award is to draw attention to the need for a holistic approach in the use of information technology in which the social implications have been taken into account."
"...Professor Witten was nominated for his work with the New Zealand Digital Library (NZDL) project, a research group within the University of Waikato Computer Science Department. One focus of the group is developing the Greenstone Software, which is being used to deliver humanitarian and related information in developing countries (http://www.greenstone.org/english/home.html)."
For further information, or to request the full press release, please contact Maree Talmage, School of Computing and Mathematical Sciences at the University of Waikato, <mtalmage@scms.waikato.ac.nz>.
(On September 17, 2003, links to the European Conference on Digital Libraries (ECDL 2003) were added to the each of the five ECDL workshop reports in this column.)
Copyright 2003 © Corporation for National Research Initiatives
DOI: 10.1045/september2003-inbrief