Issues in Science and Technology Librarianship | Winter 2001
The e-print arXiv at the Los Alamos National Laboratory acts as a repository for electronic versions of papers in physics and mathematics, providing a rapid and convenient way for scientists to share their results with colleagues. Recently the arXiv was transferred to the Research Library, as part of its Library Without Walls. This article traces the development of the arXiv and examines some of the implications for libraries. Opportunities and challenges exist to integrate new forms of scholarly communication with newly developed digital library services offered by leading-edge libraries.
In June of 2000, administrative responsibility for daily operations of the Los Alamos arXiv was transferred from the Theoretical Physics Division to the Research Library. The arXiv was added as a complement to the Library Without Walls, and the move was intended to create and support synergies between new e-print advances and rapidly developing state-of-the-art digital library services. The move was significant on two levels. First, it represented a formalized commitment by the Laboratory regarding funding support of the arXiv. Second, it provided the Research Library with a unique opportunity to bridge the informal author self-archiving world with the formal scholarly communication systems in the digital library. Since the Research Library partners with research institutions around the world to enhance the advancement of information technology to support collaboration among researchers, developing a symbiotic relationship with the arXiv was a logical, and perhaps overdue, development.
An "e-print" denotes self-archiving by the author. The American Physical Society notes that an e-print includes any electronic work circulated by the author outside of the traditional publishing environment (APS News 1998). E-prints may be unpublished works, preprints, or published works. Unlike the familiar paper preprints, the e-prints can be, and often are, updated by the author at any time, including after the peer-review process. In some subjects, where rapid transmission of knowledge is critical, electronic dissemination of preprints is an absolute necessity, with subsequent traditional publication becoming almost a formality (Langer 2000).
The archive is often contrasted with journals, but it was never intended to be a journal, and it was not intended to perform peer review. It is a repository for electronic versions of papers, providing a convenient way for scientists to share their results with colleagues. Since newly posted papers have not been peer reviewed, readers must use their own judgment regarding the trustworthiness of anything downloaded (The Economist 2000).
Today the arXiv application code consists of roughly 30,000 lines of Perl, running under Linux with numerous other programs (TeX, ghostscript, tar, gnuzip). A small team of roughly four FTE supports the current activities of maintaining and rewriting the Perl code in modular fashion, supporting the mirror servers, and correcting author submissions. The application has been optimized to permit virtually any researcher, even one with a low-end network connection, to access and download papers of interest. This is a conscious design choice: it provides a level playing field for researchers at different academic levels and in different geographic locations.
An analysis of the domain composition as a percentage of the total of all submissions (Aug '91 through Dec '00) follows:
High energy | Cond. matter | Astro-phys | math | Gen. relativity & quantum cosmology | Nuclear | Quant - phys | Non-Linear | Phys-Other | CS |
37% | 18% | 17% | 9% | 5% | 4% | 3.5% | 2.5% | 2% | 1% |
The distribution of submissions by e-mail domain of the submitting author, for the 101,792 submissions received during the four-year period 1 Jan 97 through 31 Dec 00, can be found at http://arXiv.org/Stats/au_all.html.
The table below shows the share of papers submitted via the web, electronic mail and FTP, from late '96 through calendar year '00:
Uploads | Late '96 | '97 | '98 | '99 | '00 |
Web | 13% | 21% | 49% | 60% | 68% |
E-mail | 77% | 67% | 43% | 34% | 27% |
FTP | 10% | 12% | 8% | 6% | 5% |
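The arithmetic behind the table can be checked with a short Python sketch. It is only an illustration: the figures are copied from the table, the row label "E-mail" is taken from the three modes named in the text (web, electronic mail, FTP), and the names `SHARES`, `column_totals` and `change_in_points` are made up for this example.

```python
# Shares of submissions by upload mode, in percent, copied from the table.
# Assumption: the three rows correspond to the modes named in the text
# (web, electronic mail, FTP).
PERIODS = ["Late '96", "'97", "'98", "'99", "'00"]
SHARES = {
    "Web": [13, 21, 49, 60, 68],
    "E-mail": [77, 67, 43, 34, 27],
    "FTP": [10, 12, 8, 6, 5],
}


def column_totals(shares):
    """Sum the shares of all modes for each period."""
    return [sum(column) for column in zip(*shares.values())]


def change_in_points(shares, mode):
    """Change in one mode's share from the first to the last period,
    in percentage points."""
    series = shares[mode]
    return series[-1] - series[0]


if __name__ == "__main__":
    for period, total in zip(PERIODS, column_totals(SHARES)):
        print(f"{period}: modes sum to {total}%")
    for mode in SHARES:
        print(f"{mode}: {change_in_points(SHARES, mode):+d} points")
```

Each period's shares sum to 100%, and over the period shown the web share rises by 55 percentage points while e-mail falls by 50 and FTP by 5.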
Authors can submit, replace and withdraw papers, add journal references, etc. (See: http://arxiv.org/help/). Manifestly bad papers in the archive are simply ignored. Those that are interesting and controversial attract instant attention. For better or worse, there is no time lag.
On Friday, May 21, 1999, we received an article in the General Relativity and Quantum Cosmology section of arXiv titled "A 'warp drive' with more reasonable total energy requirements" from a Belgian physicist. Despite a title that sounds like pure science fiction (and timing that coincided with the release of "the" new Star Trek episode), this is a serious paper. It was therefore released and announced in the next mailing, the following Sunday night/Monday morning. (See http://arXiv.org/abs/gr-qc/9905084.) Four days later, on Thursday, May 27, at 2:14 PM EDT, Slashdot posted a news story about it at http://slashdot.org/articles/99/05/27/1215204.shtml, and almost instantly accesses to our server skyrocketed, from noon to about 4 PM that day, with a peak of over 16,000 hits in an hour. (See http://arXiv.org/Stats/9905/990527.html.) Interestingly, a great many readers visited only the abstract and did not go on to download the full text, and a sizable fraction of downloads originated from physics departments all over the world. Since news on Slashdot is cycled away and moved to its archives within the same day, activity on the following day was back to almost normal levels. Our referrer logs show a few "late-comers," people catching up with yesterday's news, but there was no significant impact on the server. Perhaps just being exposed to the abstract is valuable for many casual readers. Nonetheless, the Slashdot effect illustrates a problem with suggestions that simply counting downloaded papers in this new environment can be used to gauge relative importance or impact: a single external link can inflate the count for one day without saying anything about the paper's scholarly influence.
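The distinction between abstract views and full-text downloads can be made concrete with a small Python sketch. It is not the arXiv's own log analysis; it assumes, purely for illustration, that each log record has been reduced to a request path, that abstract pages live under `/abs/` (as in the URL above), and that full text is served under `/ps/` or `/pdf/`. The function names are hypothetical.

```python
from collections import Counter


def classify(path):
    """Label a request path as an abstract view, a full-text download,
    or something else. The path prefixes are assumptions for this sketch."""
    if path.startswith("/abs/"):
        return "abstract"
    if path.startswith(("/ps/", "/pdf/")):
        return "fulltext"
    return "other"


def summarize(requests):
    """Count abstract views and full-text downloads per paper.

    `requests` is an iterable of paths such as "/abs/gr-qc/9905084".
    Returns a dict mapping paper identifier to a Counter with the keys
    "abstract" and "fulltext".
    """
    summary = {}
    for path in requests:
        kind = classify(path)
        if kind == "other":
            continue
        # "/abs/gr-qc/9905084" -> ["", "abs", "gr-qc/9905084"]
        paper_id = path.split("/", 2)[2]
        summary.setdefault(paper_id, Counter())[kind] += 1
    return summary


def fulltext_ratio(counts):
    """Fraction of a paper's requests that fetched the full text."""
    total = counts["abstract"] + counts["fulltext"]
    return counts["fulltext"] / total if total else 0.0


if __name__ == "__main__":
    # Invented sample data: four abstract views, one download, one other hit.
    sample = ["/abs/gr-qc/9905084"] * 4 + ["/ps/gr-qc/9905084", "/Stats/"]
    for paper_id, counts in summarize(sample).items():
        print(paper_id, dict(counts), fulltext_ratio(counts))
```

A raw hit count would treat all five requests for the paper in the sample alike. Separating the two kinds of request shows how much of a traffic spike consists of readers who stopped at the abstract.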
One of the myths surrounding the arXiv is that the system is unstable as an archive because the e-print literature can come and go on an author's whim. Once a paper has been posted on the arXiv, it is given a date stamp and processed. Making changes results in a new version and the archive provides public access to previous versions of submitted papers. Therefore, even though the current version of a paper may be marked as withdrawn, previous versions can still be retrieved.
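The versioning behavior described above amounts to an append-only history. The Python sketch below illustrates that idea under stated assumptions; it is not the arXiv's Perl implementation, and the class names, fields and the sample identifier are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Version:
    """One date-stamped version of a paper (immutable once created)."""
    number: int
    stamped: date
    body: str
    withdrawn: bool = False


@dataclass
class Paper:
    """A paper whose history only grows: versions are appended, never edited."""
    paper_id: str
    versions: list = field(default_factory=list)

    def submit(self, body, stamped, withdrawn=False):
        """Append a new version; earlier versions are left untouched."""
        version = Version(len(self.versions) + 1, stamped, body, withdrawn)
        self.versions.append(version)
        return version

    def withdraw(self, stamped, note="withdrawn by author"):
        """Record a withdrawal as a new version rather than a deletion."""
        return self.submit(note, stamped, withdrawn=True)

    def current(self):
        """Return the most recent version."""
        return self.versions[-1]

    def get(self, number):
        """Return a specific version by its 1-based number."""
        if not 1 <= number <= len(self.versions):
            raise KeyError(f"no version {number} of {self.paper_id}")
        return self.versions[number - 1]


if __name__ == "__main__":
    paper = Paper("example/0001")  # hypothetical identifier
    paper.submit("first draft", date(2000, 1, 3))
    paper.withdraw(date(2000, 2, 1))
    print(paper.current().withdrawn)  # the current version is a withdrawal
    print(paper.get(1).body)          # the first version is still retrievable
```

Because a withdrawal is stored as one more version, marking the current version withdrawn does not affect what an earlier version number returns.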
One of the problems that the e-print movement has had to confront is who should be responsible for the e-prints and how. Many supporters of e-prints want to see the traditional publishers removed from the role of caretakers of this scholarly communication genre, substituting either the professional societies or universities. However, this raises the major question of archiving. Historically, responsibility for archiving the scholarly literature has been the domain of libraries. Plagued with issues of cost, related to both content acquisition and the associated physical storage of print volumes, archiving print volumes has created a crisis for most research libraries.
Now with electronic archives, we replace those issues with the challenge of constantly migrating data formats and the associated enabling IT infrastructure for storage and access systems. Again, there are clearly costs in constantly migrating technology and content formats. The archiving challenge becomes far more complex, however, when consideration is given to the question of preserving the robust environment that today's technology now supports. Increasingly, merely preserving the article itself cannot capture the value of an electronic article. Rather, the value is in the associated contextual links, associated graphics, multimedia and connecting databases that have become intrinsic parts of modern scientific literature. Given this fact, in the very near term, the print versions of journals will not be the true archives.
One of the strategies of archiving is to distribute assets physically, to minimize the risk that a disaster in one geographic location causes the irreplaceable loss of unique archive contents. In the e-print world, mirroring can mitigate this risk because a mirror image of both the data and the retrieval system is located at each of several separate geographic sites. The Los Alamos arXiv is mirrored around the world in: Australia; Brazil; China; France; Germany; Israel; India; Italy; Japan; Russia; South Africa; South Korea; Spain; Taiwan; U.K.; U.S. (new APS mirror). Ironically, in this archiving dimension the Los Alamos arXiv is significantly better off than a large fraction of formal publishers, whose electronic content is located at one site and housed on one system.
The e-print archives are entirely scientist driven, and flexible enough either to co-exist with the pre-existing publication system, or to help it evolve into something better able to meet researcher needs (Ginsparg 2000).
These are simply early steps in the transformation of the use of the scientific literature. User adoption of these integrated capabilities has been high, and the corresponding user feedback has been very positive. The tighter integration of formal systems and the newly rising informal e-print systems represents an enormous opportunity for libraries and information providers, all to the benefit of the researcher.
The arXiv is based upon activities supported by the U.S. National Science Foundation under Agreement No. 9413208 (1 Mar 1995 thru 30 Sep 2000) with Los Alamos National Laboratory, and by the U.S. Department of Energy. Mirror sites are funded locally. Opinions, findings, and conclusions expressed in this article are those of the author and do not necessarily reflect the views of the U.S. National Science Foundation, the U.S. Department of Energy, or Los Alamos National Laboratory.
Boyce, Peter B. 2000. For better or worse: preprint servers are here to stay. College and Research Libraries News 61(5): 404-407, 414.
First online graduate physics textbook hits the web. 2000. APS News Online (March 2000). [Online] Available: http://positron.aps.org/apsnews/0300/030011.html [February 21, 2001].
Ginsparg, Paul. 2000. Creating a Global Knowledge Network: Don't Just Clone the Paper Methodology. In: Freedom of Information Conference (6-7 July 2000: New York Academy of Medicine). [Online] Available: http://www.biomedcentral.com/info/ginsparg-ed.asp [February 21, 2001].
Holtkamp, Irma S. and Berg, Donna A. 2001. The Impact of Paul Ginsparg's ePrint arXiv (Formerly Known as xxx.lanl.gov) at Los Alamos National Laboratory on Scholarly Communications and Publishing: A Selected Bibliography. [Online] Available: http://lib-www.lanl.gov/libinfo/preprintsbib.htm [February 21, 2001].
Langer, James. 2000. Physicists in the new era of electronic publishing. Physics Today Online. [Online] Available: http://www.aip.org/pt/vol-53/iss-8/p35.html [February 21, 2001].
Pack, Thomas and Pemberton, Jeff. 1999. A harbinger of change: the cutting edge library at the Los Alamos National Laboratory. Online Magazine. 23(2): 34-42. [Online]. Available: http://www.onlineinc.com/articles/onlinemag/pack993.html [February 21, 2001].
Van de Sompel, Herbert and Hochstenbach, Patrick. 1999. Reference linking in a hybrid library environment part 3: generalizing the SFX solution in the "SFX@Ghent & SFX@LANL" experiment. D-Lib Magazine. 5(10). [Online] Available: http://www.dlib.org/dlib/october99/van_de_sompel/10van_de_sompel.html [February 21, 2001].
Will Journal Publishers Perish? 2000. The Economist 355(8170): 81-82.