D-Lib Magazine
March 2003

Volume 9 Number 3

ISSN 1082-9873

Open Archives and UK Institutions

An Overview

 

Stephen Pinfield
Information Services
University of Nottingham, United Kingdom
<Stephen.Pinfield@Nottingham.ac.uk>

Introduction

This is a pivotal stage in the development of institutional open archives. We are moving from a position where awareness of the issues surrounding self-archiving was restricted to a relatively small number of enthusiasts to a position where it is entering the consciousness of a large number of practice-based information professionals and some faculty. The next two-to-three years may be crucial in determining long-term take-up. This paper provides a brief overview of current activity in the development of open archives (particularly e-print repositories) within UK universities and similar institutions and discusses some of the issues the open archives activity is raising.

One initial issue that has arisen as the idea of self-archiving has begun to take root is the problem of ambiguous terminology. There is considerable confusion about the vocabulary, particularly amongst practitioners who are new to the field. This is not helped by the fact that ambiguities surface in discussion lists and even in the literature. For example, the term "open archives" (used in the title of this article) is ambiguous. "Open" can be used in at least two different ways. First, it can be used to convey the idea of unrestricted availability, "open access" (as in the "Budapest Open Access Initiative"). Secondly, it can be used in a more technical way to convey the idea of systemic interoperability (as in the "Open Archives Initiative"). Whilst many proponents of OAI are also advocates of open access, it should be recognized that the two do not necessarily have to go together. It is possible to use the OAI protocol in a closed-access environment and it is possible to have open access without the OAI.

The term "archive" in "open archive" is also ambiguous. To some, "archive" means little more than a database or repository of some kind. The term "self-archiving" is used in a very general sense to mean simply mounting a paper on the web. However, for many others, including most professional archivists, the term "archive" implies a considerable degree of curation and preservation. The "Open Archival Information System", for example, carries these ideas with it. Because of the potential ambiguity of the term "archive", it is noticeable that in the last year or so the more neutral term "repository" has begun to be preferred by many.

Within the terms discussed, this article concentrates on institutional open-access OAI-compliant e-print repositories in UK universities. Repositories are defined here as "institutional" where they are managed by a single institution and contain content created by the members of that institution (in the case of universities, staff and students). "E-prints" are defined in a broad way as electronic versions of research papers or other similar output. These might include pre-prints (pre-refereed papers), post-prints (post-refereed papers), conference papers, and so on.

Pre-2002 activity

Up until 2002, OAI activity in the UK was concentrated in a number of areas. Firstly, there were the high-profile centralized subject-based repositories, including CogPrints and RePEc [1]. An early mirror for arXiv.org was also set up in the UK. Secondly, there was some use of the OAI protocol in a "closed" environment. A good example of this is the use made of the OAI protocol by the Resource Discovery Network (RDN) [2]. The RDN Centre harvests metadata from RDN hubs (that produce descriptions of quality Internet resources for various subject areas) in order to create a public search facility of the contents of all of the hubs [3]. Thirdly, there was some pre-2002 work in the development of experimental institutional repositories. The universities of Glasgow and Nottingham registered small-scale pilot repositories with the OAI in 2001, and other institutions were carrying out useful work behind the scenes.

During 2001 and 2002, understanding grew amongst information managers that the ostensible case for institutional repositories was strong [4]. An increasing number of people acknowledged that institutional repositories had the potential to encourage the practice of self-archiving in a wide range of disciplines. It was being recognized that university institutions are places where repositories can be nurtured by subsidizing their start-up and providing robust infrastructures for their ongoing support. The case was strong, but it was recognized that these ideas needed testing in real-world situations.

Early lessons

By 2002, early experimentation in the area of institutional repositories had resulted in a number of lessons that are now providing pointers for further work.

Lesson 1: This is not a technical issue.

The OAI protocol itself has now (with version 2) reached a level of maturity and stability such that it can be implemented with confidence. OAI-compliant repository software has also been developed, making implementation a low-barrier activity. By 2002, the use of eprints.org software [5] within a number of UK institutions was raising the profile of OAI and its potential amongst practitioners. A community of implementers has since developed, and it is noticeable that the producers of the software at the University of Southampton have made efforts to become responsive to the needs of this wider community. More recently, other repository software has been released. This includes the DSpace code produced by MIT and the document server software written by CERN [6]. It will be interesting to see how easily these can be implemented by independent institutions.
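To give a sense of how low the technical barrier is, the following is a minimal sketch of the harvesting side of version 2 of the OAI protocol: a Service Provider issues a ListRecords request and reads unqualified Dublin Core from the response. The base URL, the record identifier, and the sample response are hypothetical and written by hand for illustration; they are not taken from any real repository, and a production harvester would also need error handling and real network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH version 2 and by Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a Data Provider's base URL."""
    if resumption_token is not None:
        # A resumptionToken is an exclusive argument: it accompanies the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iter("{%s}record" % OAI_NS):
        identifier = record.findtext("{%s}header/{%s}identifier" % (OAI_NS, OAI_NS))
        title = record.find(".//{%s}title" % DC_NS)
        pairs.append((identifier, title.text if title is not None else None))
    return pairs


# Hand-written sample response (an assumption, not real repository output).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:eprints.example.ac.uk:1</identifier>
        <datestamp>2003-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample e-print</dc:title>
          <dc:creator>Author, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical base URL, used only to show the shape of the request.
url = list_records_url("http://eprints.example.ac.uk/perl/oai2")
titles = parse_titles(SAMPLE_RESPONSE)
```

When a response ends with a resumption token, the same function can be called again with that token to request the next batch of records.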

Lesson 2: Some important collection management issues need investigating.

Early work in this area has highlighted a number of key questions facing institutional repository implementers:

  • Document type: should the institutional repositories include pre-prints, and/or post-prints, and other objects?
  • Document format: which formats should be accepted: HTML, PDF, postscript, or others?
  • Digital preservation policies: what should be preserved and how?
  • Submission procedures: how should files be formatted and then deposited, and by whom?
  • Intellectual Property Rights (IPR) policies: what are the rights of the author, institution and publisher, and how should these be identified and protected?
  • Metadata quality standards: who creates metadata and according to what standards and quality thresholds?

All of these questions need to be addressed by institutional implementers. They are not always easy, but at least we are now beginning to get a clearer idea of what the questions are.

Lesson 3: The economics of the institutional repositories require further exploration.

Whilst it has been suggested that institutions may be able to subsidize archive start-up, the economics of maintaining repositories in the medium and long term require further consideration. This is, of course, part of a bigger question of the economics of open access, an area where more work certainly needs to be done [7]. One of the key issues here is the relationship with the commercial publishers.

Lesson 4: The biggest challenge is getting content.

Setting up an institutional repository and designing collection management policies are relatively straightforward; populating the repository is not. The content of institutional repositories needs to come largely from researchers within the institution, and persuading them to submit this content is a major challenge. Self-archiving requires a cultural change amongst researchers that can only be achieved through significant advocacy activity, and even then it will probably happen only gradually. Any advocacy activity needs to try a number of different approaches and needs to be sensitive to discipline-specific issues. The incentives for researchers to self-archive need to be clearly identified [8]. Also, a keen awareness of the major barriers (or perceived barriers) for researchers needs to be developed by those who are advocating open archives. Only then can these barriers be effectively addressed. In addition, advocates need to ensure that their attempts to persuade colleagues of the advantages of open archives are accompanied by new services that enable those colleagues to self-archive more easily. Examples of such enabling services might be assistance for researchers with copyright issues, and self-archiving by proxy.

Although this is the most challenging aspect of setting up an institutional repository, it is also the most important. There is no point, for example, in having a collection development policy if you have no collections! This means that most effort should now be put into actually populating repositories.

JISC FAIR programme: introduction

It is within this context that the Joint Information Systems Committee (a national agency providing IT infrastructure and information services for the UK higher education and further education community) set up the FAIR programme (Focus on Access to Institutional Resources) [9]. FAIR is a development programme that aims to "support access to and sharing of institutional content within Higher Education (HE) and Further Education (FE), and to allow intelligence to be gathered about the technical, organizational and cultural challenges of these processes" [10]. Information resources produced within these organizations are regarded as key "institutional assets" that, in many cases, are currently hidden from most potential users, particularly those outside the institution where they originated. FAIR is designed "to support the disclosure of institutional assets". The JISC document that introduced the programme and called for project proposals clearly stated, "this programme is inspired by the vision of the Open Archives Initiative (OAI), that digital resources can be shared between organizations based on a simple mechanism allowing metadata about these resources to be harvested into services."

The programme consists of 14 projects that have been grouped into five "clusters": museums and images, e-prints, e-theses, IPR, and institutional portals (see Appendix). Projects began in the second half of 2002 and will last between one and three years. The total cost is about $4.7 million [11].

As the cluster names imply, FAIR is not just about e-prints. Institutional content of various kinds (e-prints, e-theses, e-learning materials, etc.) is being taken on board by a number of projects. For example, HaIRST, led by the University of Strathclyde, and DAEDALUS, led by the University of Glasgow, are looking at setting up repositories to house a wide variety of institutional assets [12]. In addition, several projects are dealing with e-theses and other non-textual digital objects. These projects, like most of those in FAIR, have put populating the repository at the core of their objectives.

The TARDis project at the University of Southampton is dealing in particular with e-prints, and the project aims to develop a working model of a multi-disciplinary institutional e-print repository [13]. TARDis is tackling cultural issues head-on. It is investigating ways in which the cultural and academic (as well as technical) barriers to institutional repositories can be overcome.

Unique amongst FAIR projects is RoMEO [14]. This project based at Loughborough University is investigating rights issues associated with e-prints. It is developing guidelines for adding rights information to OAI-compliant metadata.
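As a rough illustration of what "rights information in OAI-compliant metadata" can mean at its simplest, the sketch below adds a `dc:rights` element (one of the standard unqualified Dublin Core elements) to a record. This is only an assumption about one possible approach; it is not the RoMEO guidelines, which are still being developed, and the rights statement shown is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Register the conventional prefixes so serialized output uses dc: and oai_dc:.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def add_rights_statement(record, statement):
    """Append a dc:rights element carrying a rights statement to a record."""
    rights = ET.SubElement(record, "{%s}rights" % DC_NS)
    rights.text = statement
    return record


# A hand-built record; the title and the statement are hypothetical.
record = ET.Element("{%s}dc" % OAI_DC_NS)
ET.SubElement(record, "{%s}title" % DC_NS).text = "A sample e-print"
add_rights_statement(record, "Hypothetical statement: may be copied for non-commercial use.")

rights_values = [el.text for el in record.findall("{%s}rights" % DC_NS)]
serialized = ET.tostring(record)
```

A free-text statement of this kind is readable by people but not easily by machines, which is one reason that agreed guidelines for expressing rights are needed.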

SHERPA and ePrints UK

The University of Nottingham is involved in two FAIR projects, SHERPA and ePrints UK [15]. SHERPA is an OAI Data Provider project and ePrints UK a Service Provider project [16]. It may be useful to discuss these in a little more detail to illustrate further some of the issues being tackled within FAIR.

SHERPA (Securing a Hybrid Environment for Research Preservation and Access) has been initiated and is in part funded by the Consortium of University Research Libraries (CURL) [17]. The project is led by the University of Nottingham and includes other major research universities (Edinburgh, Glasgow, Leeds, Oxford, Sheffield, and York) plus the British Library and the Arts and Humanities Data Service (AHDS) as its partners. It is a three-year project running from November 2002 that aims to set up a series of institutional OAI-compliant e-print repositories (using eprints.org software) in the partner institutions. The project will investigate key issues in populating and maintaining e-print collections. It is concentrating on advocacy to the research community in order to get content. Like all FAIR projects, it has undertaken to disseminate learning outcomes from this activity as widely as possible. Unlike other FAIR projects, it also aims to investigate the digital preservation of e-prints. In partnership with AHDS, SHERPA is investigating OAIS-compliant preservation of content [18].

SHERPA, like most of the other FAIR projects, will be working with ePrints UK, which is the major JISC-funded Service Provider project. ePrints UK is led by the Resource Discovery Network Centre at King's College London and UKOLN (UK Office for Library and Information Networking) at the University of Bath. Its other partners include OCLC and several universities: Southampton, Leeds, Bristol, Heriot Watt, Birmingham, Manchester Metropolitan, Oxford, Nottingham, and the University of Manchester Institute of Science and Technology. It is a two-year project that began in the summer of 2002.

The ePrints UK project aims to establish itself as a Service Provider, gathering metadata from institutional, disciplinary, and personal Data Providers. It will provide search interfaces to the data using the existing service infrastructure of the RDN hubs. In addition, it aims to enhance metadata harvested from hubs using web services. This enhancement includes automatic subject classification and authority headings (both using OCLC software) and also includes citation analysis resulting in OpenURL citations (using University of Southampton software). The architecture of the project is illustrated in Figure 1.

[Figure: chart showing ePrints UK architecture]

Figure 1. ePrints UK high-level architecture (reproduced with permission).

This idea of enhanced metadata at the Service Provider level is particularly interesting. It has often been assumed that metadata quality needs to be the province of OAI Data Providers. Data Providers, where possible, need to use controlled vocabulary and put in place rigorous metadata quality standards in order that their data can be properly used by Service Providers. However, ePrints UK is investigating the possibility of the Service Provider creating additional normalized metadata not originally present in the record. The possibility of returning enhanced metadata to Data Providers is also being investigated.
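The idea of Service Provider enhancement can be sketched in a few lines. The example below takes a harvested record, leaves the original fields untouched, and adds a separate field of normalized creator headings looked up in a small authority table. Everything here is hypothetical: the record layout, the `creator_normalized` field name, and the authority table are assumptions made for illustration, and the sketch does not represent the OCLC or University of Southampton software used by ePrints UK.

```python
import copy

# Hypothetical authority table mapping name variants to one preferred heading.
AUTHORITY_HEADINGS = {
    "smith, j.": "Smith, Jane",
    "jane smith": "Smith, Jane",
}


def enhance_record(record, authority=AUTHORITY_HEADINGS):
    """Return a copy of a harvested record with normalized creator headings added."""
    enhanced = copy.deepcopy(record)  # the Data Provider's original values are kept
    normalized = []
    for name in record.get("creator", []):
        # Lower-case and collapse whitespace before looking up the variant.
        key = " ".join(name.lower().split())
        heading = authority.get(key, name)  # unknown names pass through unchanged
        if heading not in normalized:
            normalized.append(heading)
    enhanced["creator_normalized"] = normalized
    return enhanced


# A hypothetical harvested record with two variants of the same name.
harvested = {
    "identifier": "oai:eprints.example.ac.uk:1",
    "creator": ["Smith, J.", "Jane  Smith"],
}
enhanced = enhance_record(harvested)
```

Keeping the enhanced values in a separate field means the additional metadata could, in principle, be returned to the Data Provider without overwriting what was originally supplied.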

Conclusions

The JISC FAIR programme in the UK includes some interesting and important projects aiming, at least in part, to address some of the key issues that have emerged from early work on open archives. But FAIR is, of course, only one current programme in this area. The Mellon Foundation OAI programme in the USA and the DARE project in the Netherlands are examples of similar activities in other countries [19]. Individual institutions are also doing interesting work without external funding. As all of these programmes are now underway, perhaps more could be done to ensure that the projects communicate with each other so their activities are complementary and mutually supportive. It can be hoped that between them these kinds of projects can take on board the lessons of the early open archive work and begin to encourage the implementation of OAI-compliant repositories on a scale not previously achieved. An increasing number of information professionals and researchers are beginning to recognize that the various OAI programmes and activities have the potential to make a real difference to the scholarly communication process and therefore bring enormous benefits to the scholarly community.

Acknowledgements

This paper is based on a conference presentation given at the OAI Conference held at CERN, Geneva in October 2002. Thanks to the organizers for giving me the opportunity to contribute to the conference. Thanks also to the Open Society Institute for funding the travel expenses of speakers.

Thanks to Ruth Martin and Andy Powell for their permission to use the ePrints UK architecture graphic. Thanks to John MacColl for assistance in compiling the Appendix.

Appendix: FAIR projects

Museums and Images Cluster (4 projects)

  • Accessing the Virtual Museum: Petrie Museum, University College London
  • BioBank: ILRT, University of Bristol; University of Cambridge
  • Harvesting the Fitzwilliam: Fitzwilliam Museum, University of Cambridge; Archaeology Data Service, University of York
  • Partial Deposit: AHDS Executive, King's College London; Theatre Museum, V&A; Courtauld Institute of Art, University of London; Visual Arts Data Service, University of Surrey; Performing Arts Data Service, University of Glasgow

E-Prints Cluster (4 projects)

  • ePrints UK: RDN, King's College London; University of Southampton; UKOLN, University of Bath; UMIST; University of Bath; University of Strathclyde; University of Leeds; ILRT, University of Bristol; Heriot Watt University; University of Birmingham; Manchester Metropolitan University; University of Oxford; University of Nottingham; OCLC
  • HaIRST (Harvesting Institutional Resources in Scotland Testbed): University of Strathclyde; University of St. Andrews; Napier University; Glasgow Colleges Group
  • SHERPA (Securing a Hybrid Environment for Research Preservation and Access): CURL; University of Nottingham; University of Edinburgh; University of Glasgow; Universities of Leeds, Sheffield and York ("White Rose" partnership); University of Oxford; British Library; AHDS
  • TARDis (Targeting Academic Research for Deposit and disclosure): University of Southampton

E-Theses Cluster (3 projects)

  • DAEDALUS (Data providers for Academic E-content and the Disclosure of Assets for Learning, Understanding and Scholarship): University of Glasgow
  • Electronic Theses: Robert Gordon University; University of Aberdeen; Cranfield University; University of London; British Library
  • Theses Alive!: University of Edinburgh

Intellectual Property Rights Cluster (1 project)

  • RoMEO (Rights MEtadata for Open archiving): Loughborough University; Birkbeck College, University of London; University of Greenwich; University of Southampton

Institutional Portals Cluster (2 projects)

  • FAIR Enough: Norton Radstock College, Bristol; City of Bath College; City of Bristol College; Filton College, Bristol; Weston College, Weston-super-Mare; Western College Consortium, Bristol
  • PORTAL (Presenting natiOnal Resources To Audiences Locally): University of Hull; RDN, King's College London; UKOLN, University of Bath

Notes

[1] See <http://cogprints.ecs.soton.ac.uk/>, and <http://repec.org/>.

[2] See <http://www.rdn.ac.uk/>.

[3] Described in Pete Cliff "Building ResourceFinder". Ariadne, 30, December 2001. Available at: <http://www.ariadne.ac.uk/issue30/rdn-oai/>.

[4] This understanding is still of course growing. Peter Suber's recent paper directed at library managers is a significant recent contribution in this area. Peter Suber "Removing the barriers to research: an introduction to open access for librarians". College & Research Libraries News, 64 (February 2003) pp. 92-94. E-print available at: <http://www.earlham.edu/~peters/writing/acrl.htm>. Perhaps the best introduction to self-archiving is still Stevan Harnad, "For whom the gate tolls? How and why to free the refereed research literature online through author/institution self-archiving, now". 2001. Available at: <http://www.cogsci.soton.ac.uk/~harnad/Tp/resolution.htm>.

[5] See <http://www.eprints.org>.

[6] See <http://www.dspace.org/> and <http://cdsware.cern.ch/>.

[7] OSI and SPARC have recently published guides to creating a business case for open access journals, available at: <http://www.soros.org/openaccess/oajguides/index.shtml>.

[8] See, for example, Richard K. Johnson "Institutional repositories: partnering with faculty to enhance scholarly communication". D-Lib Magazine 8, 11, November 2002. Available at: <http://www.dlib.org/dlib/november02/johnson/11johnson.html>.

[9] See <http://www.jisc.ac.uk/index.cfm?name=programme_fair>.

[10] JISC Circular 1/02: Focus on Access to Institutional Resources Programme (FAIR). Available at: <http://www.jisc.ac.uk/index.cfm?name=circular_1_02>.

[11] JISC does not pay overheads and so the real value of these projects is probably about $6 million.

[12] See <http://cdlr.strath.ac.uk/projects/projects-hairst.html>, and <http://www.lib.gla.ac.uk/daedalus/>.

[13] See <http://tardis.eprints.org/>.

[14] See <http://www.lboro.ac.uk/departments/ls/disresearch/romeo/index.html>.

[15] See <http://www.sherpa.ac.uk/>, and <http://www.rdn.ac.uk/projects/eprints-uk/>.

[16] For definitions, see <http://www.oaforum.org/resources/glossary.php>.

[17] For further details, see John MacColl and Stephen Pinfield "Climbing the Scholarly Publishing Mountain with SHERPA". Ariadne, 33, September-October 2002. Available at <http://www.ariadne.ac.uk/issue33/sherpa/>.

[18] See Consultative Committee for Space Data Systems Reference model for an open archival information system (OAIS), 1999. Available at: <http://www.ccds.org/documents/p2/CCSDS-650.0-R-1.pdf>.

[19] See <http://www.arl.org/newsltr/217/waters.html>, and <http://www.surf.nl/en/actueel/index2.php?oid=7>.

Copyright © Stephen Pinfield

DOI: 10.1045/march2003-pinfield