D-Lib Magazine
Susanne Dobratz
The Open Archives Forum Project
The Open Archives Forum is a two-year accompanying measures project that started in October 2001 under the Information Society Technologies Programme of the European Commission's 5th Framework (IST-2001-320015). Project partners include UKOLN, University of Bath (United Kingdom), Istituto di Scienza e Tecnologie della Informazione - CNR (Italy) and the Computing Center of Humboldt University (Germany). UKOLN is the coordinator of the project.
The Open Archives Forum is not another OAI implementation project. It is a clustering activity that targets existing open archives communities as well as new communities, such as IST projects or national initiatives that are planning or initiating open archives. The Open Archives Forum is a dissemination activity that aims to manage an exchange of experiences on open archives in general. The project investigates the usage of open archives under different paradigms. Its aims are to make digital repositories more widely available and globally accessible, to encourage people to share developments, and to enable developing countries to obtain access to scientific and cultural heritage information. The project supports established metadata repositories as well as new open archive data providers from communities such as cultural heritage institutions, museums, European digitization projects, research organizations, educational institutions, public libraries, community organizations, publishers and the commercial sector.
Benefits of a "Forum"
The Open Archives Forum serves as a tool to reach communities and mirrors, especially to inform them about what is happening with regard to metadata implementations in Europe. The creation of services such as Open Archives Initiative [1] service providers, aggregator services and other value-added services needs to be supported. The Forum's additional responsibility is to raise awareness of, and spur discussion on, major open archive issues: for example, to agree on a common terminology for digital repositories, to look into metadata and full text harvesting models, and to analyze user and community needs and pass them on to potential repository service developers in order to support the building of advanced services. The primary focus of the Open Archives Forum is to prepare (European) projects for action, so that they will develop solutions to potential problems and establish new business models within digital libraries.
The Open Archives Forum project both disseminates information and encourages collaborative development of software. It furthermore supports European liaison with the OAI, especially through open exchange of information, and in that way helps build a community of interest. In summary, the project objectives are to create a forum for exchanging experiences about all aspects of the Open Archive approach, including workshops, information, and organizational validation of existing technologies and strategies, as well as the technical validation of tools, software and interfaces.
How the Open Archives Forum works
The Open Archives Forum project is split into several "workpackages" including:
Workshops and Dissemination workpackage
The Open Archives Forum uses the Workshops and Dissemination workpackage to create interest within new communities. Four workshops were planned, and two have already taken place: the first in May 2002 in Pisa, Italy, and the second in December 2002 in Lisbon, Portugal. By orienting workshops toward different communities, the Open Archives Forum hopes to raise awareness in communities that have not been as active as the e-prints community. These newly targeted communities could learn from the experiences of the e-prints community and of other communities already providing open archives, and they could also reuse existing tools and software. To support this goal, a survey was conducted in preparation for the first workshop. (The results of the survey are discussed in detail later in this article.) The December workshop in Lisbon was targeted at traditional libraries and archives. The theme of the workshop was "Providing Access to Hidden Resources", and its goal was to explore whether and under what conditions the open archive approach is strategically relevant and technically viable for these two communities. In various presentations and breakout sessions, the requirements, standards, best practices, and solutions to interoperability problems of the traditional archival and library communities were analyzed and compared with the features provided by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The intention was to share experiences in preparing the conditions for a wider availability of the resources now hidden in European libraries and archives. (You can find the detailed workshop program as well as abstracts and presentation slides at <http://www.oaforum.org/workshops/lisb_programme.php>.)
The communities targeted for the third workshop, which will be held in March 2003 in Berlin, Germany, are those conducting multimedia projects, as well as cultural heritage institutions and museums. The Open Archives Forum will bring these communities together to discuss interoperability and open access issues. The production of different media types, such as video, audio, images and animation, whether as genuine digital productions or as digitized images, has reached the stage where the multimedia content needs to be stored and managed within digital libraries. The aim of the Berlin workshop is to discover and explore the specific requirements and demands that need to be addressed before opening the media archives via the Internet. The Berlin workshop will consist of presentations given by invited speakers and small group breakout sessions where the participants can discuss key issues. A pre-conference tutorial on OAI-PMH implementation will be held for those not familiar with this protocol. In addition, a representative of the Open Archives Initiative (OAI) will be one of the workshop speakers and will provide an update on OAI activities. (In late January, further information about the Berlin workshop, as well as a registration form, will be available at <http://www.oaforum.org/workshops/berl_invitation.php>.) The fourth workshop, planned for September 2003 in Bath, UK, is envisioned as a big dissemination event where all communities can come together. It will be held in close cooperation with other initiatives, such as SPARC and LIBER.
Organisational Validation workpackage
The Organisational Validation workpackage explores old and new business models that may be used within an open repository and service environment.
Topics found in this workpackage include: the cooperation of Data Providers needed to build networks of Service Providers; the terms and conditions of a metadata 'exchange' between Data and Service Providers; and the provision of added value services (metadata enhancement, auto classification, addition of OpenURLs). Within this workpackage, issues of ownership as well as digital rights management questions are discussed. For instance, how are Intellectual Property Rights (IPR) and copyright handled within digital repositories and services, and what impact does the open archives development have on authors and publishers? How can metadata be shared and exchanged, and what commercial agreements exist? Another important topic is the long-term sustainability of digital resources, metadata and scholarly communication. To deal with these issues, the Open Archives Forum established an email discussion group maintained by Dennis Nicholson (Strathclyde Univ. Glasgow).
Technical Validation workpackage
The Technical Validation workpackage examines the deployment of the OAI technical framework. It creates an information space in the form of a web-based database on projects, software, implementations and services that allows interested parties to search for potential project partners, for metadata standards and for information about interoperability issues, as well as to share project developments. Furthermore, this workpackage provides information from those projects that have already dealt with the integration of existing technologies. For example, here one can find whether projects have determined unqualified Dublin Core to be sufficiently rich for their purposes, or how the handling of database management issues (concurrency and update, scalability, de-duplication) is investigated. One can also find information regarding software, tools and the amount of effort needed for OAI implementation in terms of manpower, skills and time to set up a repository or service.
Overview of European Activities on OAI in relation to worldwide activities
The Open Archives Forum project conducted a survey of the European activities taking part in OAI. To see how European activities relate to worldwide activities, the following resources [2] were used:
If one considers origin, current advancement and concomitantly longer experiences with OAI (at least as shown in Figure 2), the slight numerical predominance of European activity is surprising. However, if one looks at the more detailed country list shown in Figure 3, it is noticeable that within Europe there is an unequal involvement among countries already engaged in OAI activities. (The Figure 2 and Figure 3 charts do not consider projects in development.)
To obtain an overview of what software was used to make repositories OAI-compatible, the data from the OAI repositories were examined. The predominant software used by OAI repositories was self-developed. (See Figure 4.)
Especially in Europe, many implementers, both Data Providers and Service Providers, used the eprints software from the University of Southampton. Other software tools like ETD-db, CDS, IMDI and arXiv were mentioned only once. Although the Open Archives Initiative deadline for upgrading to OAI-PMH 2.0 was 1 December 2002, survey results showed that, up to the end of November 2002, more than half of the repositories were still using OAI-PMH Version 1.1 and had not changed over to Version 2.0. (See Figure 5.)
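As an illustration of how a harvester might tell which protocol version a repository speaks, the following minimal Python sketch reads the protocolVersion element from an Identify response. The sample response, the repository name and the URL repository.example.org are invented for illustration and do not come from the survey. The function matches on the local element name only, on the assumption that the surrounding XML namespace may differ between protocol versions.

```python
import xml.etree.ElementTree as ET

# Hypothetical Identify response from an OAI-PMH 2.0 repository.
# The repository name and URL are placeholders, not real survey data.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-12-01T12:00:00Z</responseDate>
  <request verb="Identify">http://repository.example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""


def protocol_version(identify_xml):
    """Return the text of the first protocolVersion element, or None."""
    root = ET.fromstring(identify_xml)
    for element in root.iter():
        # Strip any "{namespace}" prefix and compare the local name only.
        if element.tag.rsplit("}", 1)[-1] == "protocolVersion":
            return (element.text or "").strip()
    return None


if __name__ == "__main__":
    print(protocol_version(SAMPLE_IDENTIFY))
```

In practice the response would be fetched from the repository's base URL with the request parameter verb=Identify; the network call is left out here so that the sketch stays self-contained.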
Overview of European Open Archives Activities based on OAI implementations
In this article, we distinguish between the following types of Data Providers: subject gateways, institutional repositories, library online public access catalogues (OPACs), commercial publishers and vendors, and media archives and museums. Service Providers offer either search services or extended services. The following section gives a partial overview of existing Data and Service Providers in Europe today.
Two examples of extended services [3]
Exemplary representation of different Data Provider types
We found OAI compliant repositories in: Austria, Belgium, Denmark, Finland, France, Germany, Italy, Ireland, Portugal, Norway, The Netherlands, Russia, Sweden, Switzerland and the United Kingdom. Many institutional archives are planned or are being initiated. Some funded projects include Service Providers. In particular, the University of Southampton has projects with international impact. You can find detailed lists of OAI projects on some of the slides presented at the 2nd OA-Forum workshop in Lisbon [4].
Examples of European Data Providers: subject gateways [5]
Examples of European Data Providers: institutional repositories (by the example of Germany)
Technical Validation Questionnaire
The reasons behind the Open Archives Forum Technical Validation Questionnaire
The Open Archives Forum started a first Technical Validation Questionnaire [11] in preparation for the first Forum workshop in Pisa. The objective of the questionnaire was to provide an overview of the status, experiences and future plans regarding the workshop participants' OAI implementations. This first questionnaire was given exclusively to Pisa workshop participants. The results of this small survey generated great interest in Pisa, and the Open Archives Forum project received feedback indicating that it would be a good idea to collect experiences from a broader spectrum of OAI implementers, as well as to learn more about the starting conditions of those planning to implement or just beginning. The focus of interest was on fundamental questions such as:
Based on the need for answers to the above, in the second questionnaire we added or changed some questions and extended the period for responding. In addition, we expanded the target audience and subdivided the form to account for projects that have not yet integrated OAI-PMH as well as for experienced implementers. This second, long-term survey will continue through autumn 2003.
Summary of early results from the second Technical Validation Questionnaire [12]
In the second Technical Validation Questionnaire we are asking for information about software used, implementation costs, coverage of the archive, interoperability, and experiences and expectations in different communities and in different countries.
Who has participated to date?
To date, 33 repositories have participated in the survey. Eleven of the repositories are not yet OAI implementers, but they are considering becoming implementers. (See Figure 6.)
The responding repositories are distributed throughout Europe, with only two participants from overseas. More than a third of the survey respondents were from Germany or the UK. Up to now, clearly more Data Providers than Service Providers have completed their OAI implementations. Judging by the number of implementations under development or being planned, many new services will soon be available. Many Data Providers used their implementation experiences to guide them in becoming Service Providers. (See Figures 7 and 8.)
If we look at the types of communities represented in the responses, it is remarkable that nearly half of the respondents came from Libraries or Archives. (See Figure 9.)
Software used
As mentioned previously, the first block of the survey is made up of questions about technical infrastructure and software solutions. Prior to OAI implementation, the dominant programming languages used by respondents were PERL, Java and PHP, and the dominant databases were MySQL and Oracle. Practically no statements were made about interface and collection systems, so it is not possible to provide relevant information from the survey about those. However, it is significant that almost none of the organizations needed to replace existing software tools in order to become OAI compatible. About 70% of the tools used to become OAI compatible were self-developed by Data Providers and Service Providers. Most of the Data Providers and Service Providers make their tools and source code available for others to use. These tools were developed mainly in Java and PERL, with PHP and XML also used frequently. A few implementers used the eprints software, which serves both Data and Service Providers. The eprints system is run in a centralized way although the archives are distributed. Other tools like PERL implementations or DBUnion were mentioned only once by survey respondents.
Implementation costs
After the questions regarding the software used, the next questionnaire subject block concerns implementation costs. With regard to the implementation skills needed, Data Providers as well as Service Providers focused on various combinations of the following five competencies:
The survey results showed that most implementations were concluded within one quarter (three months) and that most were managed by one programmer. Where a few implementers reported bigger expenditures, the reason was not directly connected to the implementation of the OAI specifications. The higher costs involved larger research projects or were due to the construction of archives or the processing of greater amounts of data. When survey respondents estimated maintenance costs, these were limited to at most 5 person days per month, and most often were estimated to be one person day per month for a stable protocol. These statements of expenditure were in line with the expectations of those who have not yet become OAI implementers, i.e., implementations concluded within one quarter and by one programmer. However, their expectations of further maintenance for a stable protocol are higher: up to 20 days per month for one person. For the other survey questions on implementation costs, the answers and the expectations differ too much for trends to be recognizable. This includes expectations regarding easy integration of the data structures suggested by the OAI-PMH into existing infrastructure, the costs of adapting data to the OAI-PMH, and the expenditure needed to prepare data for Internet usage.
Resources offered and issues of interoperability
The next block of the survey questionnaire concerns the range and kind of resources offered by the archive, as well as interoperability.
Data Provider
The number of resources available from Data Providers ranges widely, from 35 to several million documents. The occupied storage space ranged from 15 megabytes to 2 terabytes. Looking at both these ranges, it is important to note that the storage capacity used has less to do with the amount of data than with the type of objects.
In the list of the object types offered, it is again striking that mainly full text documents and metadata are offered. This is due not only to longer experience with storing and evaluating text-based data. Of bigger concern is the cost of storing pictures and video files, which need stable, efficient databases. (See Figure 10.)
The range of content types includes essentially the entire spectrum of scientific publications. There is a notably high interest in preprints, journal articles and theses. This provides evidence of a big need for a reasonable, fast way to access scientific information beyond conventional scientific publication forms. Other resources offered include library catalogues or video streams of university events. (See Figure 11.)
The most widely applied metadata format is Dublin Core. In addition, according to the respondents, library-specific formats such as MARC 21 are used. However, there is a remarkably high number of formats that are mentioned only once, such as Dublin Core Library Profile, DiTeD, CEOS CIP, AMF, RIS, MAB, SPECTRUM, TEI, and one internal format. (See Figure 12.)
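To make the Dublin Core case concrete, the following minimal Python sketch collects the elements of an unqualified Dublin Core record as it would appear in the oai_dc metadata format of OAI-PMH. The sample record and its values are invented for illustration, and the helper name dublin_core_fields is hypothetical. Because every Dublin Core element is optional and repeatable, the sketch stores each field as a list of values.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical oai_dc record; the values are placeholders.
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Thesis</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
  <dc:type>Text</dc:type>
</oai_dc:dc>"""


def dublin_core_fields(record_xml):
    """Map each Dublin Core element name to the list of its values."""
    prefix = "{" + DC_NS + "}"
    fields = {}
    for element in ET.fromstring(record_xml):
        # Keep only direct children in the Dublin Core namespace.
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


if __name__ == "__main__":
    for name, values in dublin_core_fields(SAMPLE_RECORD).items():
        print(name, values)
```

A Service Provider that also accepts richer formats would need a separate mapping for each of them; this sketch covers only the common Dublin Core baseline.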
Approximately half of the Data Providers offer full text or extracts of documents. If the openness of the interface must be reduced, there are two access-limitation strategies: access control, such as checking IP addresses or licensing, and limiting the data output.
Service Provider
Service Providers offer local or community-specific services to search and browse for information. Some provide a workspace for managing documents and metadata, and for collaboration within groups of users. A number of survey answers referred to research projects. For Service Providers, paging functions are an urgent need. They search one or several sources through a single search interface. Other services offered include cross-linking and annotation. Strategies for processing data harvested from Data Providers include using no provenance information, or filtering harvester output and loading it into the local database. If Service Providers include information about Data Providers in data output, they do so in three ways:
Experiences and expectations
Data Provider
For Data Providers, the importance of the OAI technical framework is that it makes it possible to provide services in addition to existing ones, to replace existing services through the OAI interface, and to offer better retrieval. The advantages of OAI are the ability to share scientific knowledge and to harvest other knowledge databases. OAI also enables the importation of metadata into library software and wide dissemination of the results of research. The OAI implementation is simple, cheap and easy to adapt for internal project usage. Last but not least, in comparison to more complex protocols, it is a simple-to-implement facility for exchanging metadata.
Service Provider
Concerning the experiences of Service Providers, some survey respondents indicated that standardization presented a problem: the heterogeneity of metadata record content requires Service Providers to expend a lot of effort normalizing the data to make it usable. The Service Providers believe that metadata normalization can be done less expensively by Data Providers. A possible solution to this problem might be the development of middleware tools that Service Providers could use for data normalization. In addition to listing those problems, the Service Providers who responded to the survey stated that they have future plans to do the following:
One library is creating a single catalogue of all its library catalogues: library OPAC, archives database, image database and Internet gateways.
Information sources
Another of the survey questions focused on the quality of information sources. Many of the respondents who are not yet OAI implementers say that it takes too much effort to find good information about metadata, and that it is especially difficult to find technical support. Some asked for an easy introduction to OAI-PMH. Other participants recommended the following ways to find good information:
Help us to create an Information Resource
Information Resource Database
The Open Archives Forum project has set up a European authority registry for open archives (not restricted to OAI-compatible archives) that provides additional information about the content of those archives. The registry supports collaboration and dissemination by making available information about OAI-compliant Data and Service Providers, both EC-funded initiatives and other national initiatives in Europe. Users of the registry will populate the database themselves. For each provider, the database includes details regarding scope of project, content, collection development policy, metadata formats, version of OAI used or other protocol implemented, contact names, and tools in use. The database uses a self-registering interface with a priority on sustainability and automated input techniques. The database provides information on:
Contributions are sought
We welcome those with experience and knowledge of open archive implementation to contribute to the growing database for the benefit of the larger community of implementers. Please register at the Information Resource Database: <http://www.oaforum.org/oaf_db/index.php> and respond to the Technical Validation Questionnaire: <http://www.oaforum.org/resources/tecvalq2.php>.
Notes and References
[1] Open Archives Initiative: <http://www.openarchives.org>.
[2] Open Archives Initiative: <http://www.openarchives.org/service/listproviders.html>,
[3] Extended services: CYCLADES: <http://www.ercim.org/cyclades>
[4] Overview of European Activities on OAI - Slides presented at the 2nd OA-Forum workshop in Lisbon: <http://www.oaforum.org/otherfiles/lisb_overview.ppt>.
[5] Subject Gateways: Behavioral and Brain Science Prints Interactive Archive: <http://www.bbsonline.org>
[6] University library document server: DuetT (Univ. Duisburg): <http://ub-www2.uni-duisburg.de>
[7] Media server of universities: timms (University Tübingen): <http://timms.uni-tuebingen.de>.
[8] University library catalogue: Univ. Library Oldenburg: <http://bis-oai.bis.uni-oldenburg.de>.
[9] Library service institution for the region: BSZ-BW (Bibliotheksservice-Zentrum Baden-Württemberg): <http://www.bsz-bw.de>.
[10] Spreading initiative: DINI - German Initiative for Networked Information: <http://www.dini.de>.
[11] 1st Technical Validation Questionnaire: <http://www.oaforum.org/resources/tecvalquest1.php>.
[12] Summary of some first results of the 2nd Technical Validation Questionnaire - Slides presented at the 2nd OA-Forum workshop in Lisbon: <http://www.oaforum.org/otherfiles/lisb_tvq.ppt>.
[13] Web information sources: <http://www.openarchives.org>, <http://www.ndltd.org>, <http://www.cimi.org>, <http://www.eprints.org/>, <http://www.rlg.org/>, <http://www.oaforum.org/>, <http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/faq/oai>.
[14] Online journals: Ariadne: <http://www.ariadne.ac.uk/> / D-Lib Magazine: <doi:10.1045/dlib.magazine>.
[15] Test programme: <http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai>.
Copyright © Susanne Dobratz and Birgit Matthaei
DOI: 10.1045/january2003-dobratz