The Magazine of Digital Library Research

D-Lib Magazine

May/June 2017
Volume 23, Number 5/6

 

Scaling Up Perma.cc: Ensuring the Integrity of the Digital Scholarly Record

Kim Dulin and Adam Ziegler
Harvard Library Innovation Lab, Harvard Law Library
{kdulin, aziegler} [at] law.harvard.edu

 

https://doi.org/10.1045/may2017-dulin

 

Abstract

IMLS awarded the Harvard Library Innovation Lab a National Digital Platform grant to further develop the Lab's Perma.cc web archiving service. The funds will be used to provide technical enhancements to support an expanded user base, aid in outreach efforts to implement Perma.cc in the nation's academic libraries, and develop a commercial model for the service that will sustain the free service for the academic community. Perma.cc is a web archiving tool that puts the ability to archive a source in the hands of the author who is citing it. Once a source is saved, Perma.cc assigns it a new URL, which can be added alongside the original URL cited in the author's work, so that if the original link rots or is changed, the Perma.cc URL will still lead to the original source. Perma.cc is being used widely in the legal community with great success; the IMLS grant will make the tool available to other areas of scholarship where link rot occurs and will provide a solution for those in the commercial arena who do not currently have one.

Keywords: Perma.cc, Harvard Library Innovation Lab, Digital Scholarly Record

 

1 The Link Rot Problem

The Harvard Library Innovation Lab received a grant from the Institute of Museum and Library Services (IMLS) to expand the Perma.cc web archiving service beyond its current user base and to implement a commercial model for the service. Perma.cc allows authors to deposit copies of the web pages they cite in the Perma.cc database and provides them with an additional URL that leads to the archived copy should the original page change or disappear. The IMLS grant award, distributed in June of 2016, provides $782,649 to the Library Innovation Lab over a two-year period.

Citation to persistent sources is fundamental to all academic scholarship. In the print era, libraries ensured that these sources were available for reference by preserving them in our collections. Citations today, however, increasingly refer to web pages, in addition to print sources. Because web pages frequently change their content or even disappear entirely, citations to them are ineffective at best and, at times, misleading. This problem, known as "link rot" or "reference rot," means that much of our citation-dependent scholarship is subject to change or has already disappeared (Kille, 2015).

A study we conducted in 2013 revealed that, in the legal field, 50% of United States Supreme Court cases and 70% of a selected ten-year span of law review articles with web citations suffered from link rot (Zittrain, Albert and Lessig, 2014). A striking example comes from a United States Supreme Court concurring opinion, in which Justice Alito cited to a page that was later altered (Brown v. Entertainment Merchants Assoc., 2011). The altered webpage appears below.

[Figure 1: The altered webpage cited in Justice Alito's concurring opinion]

More recently, a detailed study outside of the legal field confirmed the same: a roughly 70% level of link rot in a broad corpus of science, technology, and medicine articles published between 1997 and 2012 (Klein et al., 2014).

 

2 Why This Tool?

The Harvard Library Innovation Lab developed Perma.cc as one solution to the link rot problem. Perma.cc is a web archiving tool that allows authors of a work to make a copy of the web pages they cite. The page affiliated with the original URL is deposited in the Perma.cc database, where it is assigned a new URL that can be added to the original URL cited in the work. If the original URL disappears or changes, the reader will be able to follow the Perma.cc URL to find the web page affiliated with the original citation. In this way, Perma.cc puts the power to preserve the citation in the hands of the author.

An illustration of a citation with a Perma.cc reference follows, taken from President Barack Obama's recent article in the Harvard Law Review (Obama, 2017, p. 813):

See Neil Eggleston, President Obama Grants Another 98 Commutations in the Month of October, WHITE HOUSE: BLOG (Oct. 27, 2016, 4:00 PM), https://www.whitehouse.gov/blog/2016/10/27/president-obama-grants-another-98-commutations-month-october [https://perma.cc/2LBA-YTQX].
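The citation pattern above, in which the original URL is followed by the Perma.cc URL in square brackets, can be sketched in a few lines of Python. This is only an illustration of the pattern in the example: the function name `cite_with_perma` and its parameters are hypothetical and are not part of the Perma.cc service or of any citation manual.

```python
def cite_with_perma(citation_text: str, original_url: str, perma_url: str) -> str:
    """Append the original URL and its bracketed Perma.cc URL to a citation.

    Hypothetical helper: it only mirrors the layout of the example citation.
    """
    return f"{citation_text}, {original_url} [{perma_url}]."


example = cite_with_perma(
    "See Neil Eggleston, President Obama Grants Another 98 Commutations "
    "in the Month of October, WHITE HOUSE: BLOG (Oct. 27, 2016, 4:00 PM)",
    "https://www.whitehouse.gov/blog/2016/10/27/"
    "president-obama-grants-another-98-commutations-month-october",
    "https://perma.cc/2LBA-YTQX",
)
print(example)
```

Keeping both URLs in the citation means a reader can try the original source first and fall back to the archived copy only if the original has changed or disappeared.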

Libraries are integral to the Perma.cc process. Just as libraries have played a key role in the analog world as trusted repositories of the scholarly record, so can they in the digital world. Libraries serve as Perma.cc "registrars". They verify members of their respective communities (authors, researchers, editors, students) and provide them a method for signing up for the Perma.cc service. They answer simple questions about access to the service and educate their communities on the problem of link rot. Some library partners will also participate in hosting the Perma.cc archives to provide stability and redundancy.

Perma.cc aligns with the goals of the IMLS National Digital Platform by offering a scalable tool that allows users to act to preserve their citations at the time of writing. Its simple interface and user-friendly features make it easy for anyone to use. It distributes the power to archive widely, to writers of all types who are motivated by the desire to preserve the underpinnings of their work. It is a tool that truly gives power to the people.

Perma.cc also provides an opportunity for contributions from others who wish to improve or modify the tool. Its code is open source and is deposited on GitHub, where users can submit, and have submitted, changes. It is interoperable: it is based on Memento, LOCKSS, the Web ARChive file format, and other open standards and protocols, and it publishes its own powerful API. This aligns with the National Digital Platform theme to Enhance and Build Interoperable Tools and Services (National Digital Platform, 2015, p. 12).
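To give a sense of what building on an open protocol such as Memento involves, the sketch below prepares the `Accept-Datetime` request header that the Memento protocol (RFC 7089) uses to ask for a copy of a page as it existed near a given time. The function name `memento_request_headers` is hypothetical, no request is actually sent, and nothing here describes how Perma.cc itself implements Memento.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_request_headers(when: datetime) -> dict:
    """Build the header asking a Memento TimeGate for a copy near `when`.

    Hypothetical helper; the TimeGate URL a client would send this to
    is not specified in the article and is left out on purpose.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    # Accept-Datetime carries an HTTP-date expressed in GMT.
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


headers = memento_request_headers(
    datetime(2016, 10, 27, 16, 0, 0, tzinfo=timezone.utc)
)
print(headers)
```

Because the header is defined by the protocol rather than by any one archive, the same client code could in principle be pointed at any Memento-compliant service.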

 

3 IMLS Funded Development

The project funded by the IMLS provides support for expanding Perma.cc's use beyond the legal arena. Although Perma.cc was initially developed as a solution for those in the legal community, the link rot problem extends far beyond academic legal scholarship and court opinions. The problem affects all of today's written work: scholarship from other academic disciplines and the output of university press offices, commercial publishers, professional archivists, research centers and think tanks, government offices, law firms, news organizations, and individual authors. Soon after Perma.cc was launched, we began to receive requests for access from all of the above groups, as well as inquiries from academic libraries outside the legal field about how they could join.

Perma.cc needed to develop its infrastructure to be able to serve these groups, and create a sustainable model to ensure that the service would remain free for the academic communities. We proposed developing a paid service in which private and commercial usage would subsidize the free-of-charge service made available to public and academic users.

We set a target of attracting 5-10% of the 3,700 existing academic libraries, including 25% of major research libraries. This would add 185-370 libraries as Perma.cc registrars, which is three to six times the number of libraries in our current network.
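The arithmetic behind this target can be checked with a short sketch. The size of the current registrar network is not stated in the text; the figure of roughly 62 libraries computed below is only an inference from the "three to six times" comparison, and the names used are illustrative.

```python
TOTAL_ACADEMIC_LIBRARIES = 3_700  # figure given in the text


def target_range(total: int, low_pct: int, high_pct: int) -> tuple:
    """Return the (low, high) number of libraries for a percentage range."""
    return total * low_pct // 100, total * high_pct // 100


low, high = target_range(TOTAL_ACADEMIC_LIBRARIES, 5, 10)

# Inference, not a stated fact: if `low` is three times and `high` is six
# times the current network, both ends imply the same current size.
implied_current = (round(low / 3), round(high / 6))

print(low, high, implied_current)
```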

For the paid service, we proposed to conduct private paid beta tests with limited numbers of users from each group. The tests will evaluate assumptions about the market, gather feedback about the Perma.cc product to inform the technology roadmap needed to meet these new use cases, and reveal which marketing messages and tactics will be most effective in promoting the paid service. At the end of the test period, we plan to develop and refine detailed business, pricing and cost models to inform forecasts about sustainability. Near the end of the grant period, based on the insight gathered, we will publicly launch a paid service on terms that will put Perma.cc on sound, sustainable footing for the future.

 

4 Work Completed to Date

IMLS funding provided support for a full-time developer devoted to making the technical improvements Perma.cc needs in order to scale. Much of this work has been completed in the first year of the grant. The Perma.cc infrastructure has been moved to scalable cloud solutions. Perma.cc archives were designed to be backed up to the Internet Archive; that integration is now complete. Improvements in the web capture area include preliminary support for audio/video captures and for the multiple resolutions used by different devices; the ability to capture very complex, or partially broken, websites more quickly and reliably; more robust processing of websites' instructions regarding their preferred archiving policy (via meta tags, robots.txt, and the X-Robots-Tag header); and improved stats and monitoring for analysis and remediation. Back-end administrative tools have been made more robust, including the admin-side registrar approval process. In preparation for expanding Perma.cc's network of partners, we have experimented with backing up Perma.cc to two local LOCKSS boxes.
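As a rough illustration of what processing a website's archiving instructions can involve, the sketch below checks the three signals named above: robots.txt rules, the X-Robots-Tag header, and the robots meta tag. It is a simplified assumption of how such a check might look, not Perma.cc's actual logic: the function names and the `example-archiver` user agent are hypothetical, and per-crawler prefixes inside X-Robots-Tag values are not handled.

```python
from urllib.robotparser import RobotFileParser


def robots_txt_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a robots.txt body (already fetched) against a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def requests_noarchive(directives: str) -> bool:
    """True if a comma-separated robots directive list contains 'noarchive'."""
    return "noarchive" in {token.strip().lower() for token in directives.split(",")}


def site_objects_to_archiving(url, user_agent, robots_txt="",
                              x_robots_tag="", meta_robots=""):
    """Combine the three signals; any single objection is enough."""
    return (
        not robots_txt_allows(robots_txt, user_agent, url)
        or requests_noarchive(x_robots_tag)
        or requests_noarchive(meta_robots)
    )


rules = "User-agent: *\nDisallow: /private/"
print(site_objects_to_archiving("https://example.com/private/page",
                                "example-archiver", robots_txt=rules))
```

What an archiving service does once a site objects (for example, declining the capture or restricting access to it) is a policy decision that this sketch deliberately leaves open.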

Much of the technical work has focused on making Perma.cc more user-friendly. We have developed new browser extensions to make linking more streamlined. Users can now upload their own PDF or image to supplement a capture. Users can more easily keep track of their links, have improved ability to label and organize them, and can transition seamlessly from one permission level to another. If the capture doesn't look right, users are now able to replace the capture image within the first 24 hours of uploading.

Improvements have also been made for the library registrar users. Registrars can now easily resend activation emails to new users and have the ability to receive support requests from their own users.

The majority of work on the development of the commercial model will take place in year two of the IMLS grant. Work this year has been focused on developing the financial infrastructure inside Perma.cc to allow for authentication and payment. Members of the Perma.cc team have been working actively with members of the law school's financial team to ensure compliance with accepted business practices.

Work on outreach to the larger academic library community has been ongoing since the receipt of the grant. An Advisory Board was created to work with members of the Perma.cc team on promotion and outreach. The use of Perma.cc has now been fully implemented in the Harvard Libraries, and a number of Harvard librarians have been advising the Perma.cc team on best methods for implementation and outreach in other academic libraries. Perma.cc team members regularly attend academic conferences where they talk about the Perma.cc service and sign up new members. In the past year these conferences have included the International Federation of Library Associations and Institutions (IFLA), the International Internet Preservation Consortium (IIPC), and the Digital Social Memory Conference. Looking ahead, members of our team will be attending the annual meetings of the Coalition for Networked Information (CNI), the Association of College and Research Libraries (ACRL), and the American Library Association (ALA), among others.

 

5 Future Work

In the upcoming year we expect to complete the beta testing on our commercial version of Perma.cc. We plan to conduct customer and market development research in targeted areas. The tests will be designed to evaluate assumptions about market, costs and pricing, and to gather feedback about the Perma.cc product. By the end of the grant period we plan to launch the paid service.

Outreach to academic libraries and the scholarly community will continue, and we hope to attract new library partners by the end of the grant period. We hope to sponsor a conference at Harvard focused on link rot and the Perma.cc tool. We also expect to have the LOCKSS network in place so that we can begin to distribute responsibility for storing the archive among our library partners.

Perma.cc is an important weapon in the battle against link rot. In the few years it has been operational, Perma.cc has saved nearly half a million links. With funding from the IMLS we can greatly increase that number. We look forward to working with our library and archival colleagues to raise the general public's awareness of link rot and to enlist their support in helping eradicate it.

 

References

[1] Kille, Leighton Walter. "The Growing Problem of Internet 'Link Rot' and Best Practices for Media and Online Publishers." Journalist's Resource, last updated 9 Oct. 2015. Harvard Kennedy School. Archived at https://perma.cc/C4LE-VKSE.
[2] Brown v. Entertainment Merchants Assoc., 564 U.S. 786, 818 (2011) citing http://ssnat.com. Archived at https://perma.cc/PH9K-HV7K.
[3] Zittrain, Jonathan, Kendra Albert and Lawrence Lessig. "Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations." Harvard Law Review Forum, 127 (2014): 176-199. Archived at http://perma.cc/NE2Z-TMFK.
[4] Klein, Martin, Herbert Van de Sompel, Robert Sanderson, Harihar Shankar, Lyudmila Balakireva, Ke Zhou, and Richard Tobin. "Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot." PLoS ONE 9(12): e115253 (2014). https://doi.org/10.1371/journal.pone.0115253. Archived at http://perma.cc/P4HF-K6FS.
[5] Obama, Barack. "The President's Role in Advancing Criminal Justice Reform." Harvard Law Review, 130 (2017): 813. Archived at https://perma.cc/D42M-M2QD.
[6] United States. Institute of Museum and Library Services. IMLS Focus: The National Digital Platform for Libraries, Archives and Museums. Comp. Ricky Erway, Chrystie Hill, Sharon Streams and Steph Harmon. Washington: IMLS, 2015. Archived at http://perma.cc/QM83-TDSM.
 

About the Authors

Kim Dulin is the Director of the Harvard Library Innovation Lab and the Associate Director for Collection Development and Digital Initiatives at the Harvard Law Library. In addition to her experience as an academic law librarian, Kim has served as a practicing attorney and an adjunct professor of law. Kim has a JD from the University of Iowa College of Law, an MS from the University of Illinois Graduate School of Library and Information Science, and a BA from the University of Iowa.

 

Adam Ziegler is the Managing Director of the Harvard Library Innovation Lab at the Harvard Law School Library. Adam joined Harvard from a legal technology startup he founded and led. Before that, Adam practiced as a litigation attorney in Washington, D.C. and Boston for 10 years and was a partner in one of Boston's top litigation boutiques. Following law school, Adam served as a judicial clerk to Judge James Ryan of the U.S. Court of Appeals for the Sixth Circuit in Detroit. Adam has a JD from the University of Michigan Law School and a BA from Davidson College.