
Preservation Risk Management for Web Resources

Virtual Remote Control in Cornell's Project Prism

Anne R. Kenney, Nancy Y. McGovern, Peter Botticelli
Richard Entlich, Carl Lagoze, and Sandra Payette
Cornell University



Notes and References

  1. D Greenstein, S Thorin, D Mckinney, "Draft report of a meeting held on 10 April in Washington DC to discuss preliminary results of a survey issued by the DLF to its members, 23 April 2001," <http://www.diglib.org/roles/prelim.htm>. Back to the text of the article.

  2. Using the Wayback Machine, we noted that the four institutions studied (Cornell, Michigan, University of Pennsylvania, and Stanford) had sizeable increases in the number of licensed e-journals over the past four years. The most dramatic growth was at Penn (494 e-journals in February 1998, 5,258 in December 2001) and Stanford (684 in May 1999, 2,946 in December 2001). At the same time, the percentage of open access e-journals declined from 72% to 16% of the total at Penn and from 25% to 6% at Stanford. However, access to open databases included in the Penn gateway increased from 8% of the total external database links in February 1998 to 16% in December 2001. The number of open databases in the Stanford gateway more than doubled from May 1999 to December 2001, increasing from 31% of the total to 35%. Back to the text of the article.

  3. In contrast, only 6% of Michigan's electronic resources are open-access materials. Back to the text of the article.

  4. CPU Info Center, <http://bwrc.eecs.berkeley.edu/CIC/>. Back to the text of the article.

  5. OCLC Web Characterization web site, <http://wcp.oclc.org/>. Back to the text of the article.

  6. Philip Davis studied the effect of the Web on undergraduate citation behavior in 2000, and discovered that within six months 13% of the citations were found at a different URL and 16% of the citations could not be found at all. He also noted a growing tendency to rely on Web resources: from 1996 to 2000, Web citations increased from 9% to 22%. Philip Davis, "The Effect of the Web on Undergraduate Citation Behavior - a year 2000 update", forthcoming in College and Research Libraries (January 2002). Back to the text of the article.

  7. "Is your company's website vulnerable?" Computertimes, December 5, 2001, <http://computertimes.asiaone.com.sg/v20011205/issu02.shtml>. Symantec hosts a security response page that tracks the latest viruses and worms: <http://securityresponse.symantec.com>. Statistics on security incidents from 1988 on are reported by Carnegie Mellon's CERT® Coordination Center at <http://www.cert.org/stats/cert_stats.html>. Back to the text of the article.

  8. Elsevier Science "Information on Electronic Back Files, Access and Archiving," <http://www.elsevier.com/inca/publications/misc/ni2164.pdf>. Back to the text of the article.

  9. Dale Flecker, "Preserving Scholarly E-Journals," D-Lib Magazine, September 2001 Volume 7 Number 9, <http://www.dlib.org/dlib/september01/flecker/09flecker.html>. Back to the text of the article.

  10. W3C guidelines, <http://www.w3.org/TR/WCAG10/>. See the National Cancer Institute's "Web Design and Usability Guidelines" <http://cancernet.nci.nih.gov/usability/>, the Canadian Treasury Board's "Common Look and Feel for the Internet" <http://www.cio-dpi.gc.ca/clf-upe/> (standards that all Canadian departments and agencies must fully implement by December 31, 2002), and US Department of Justice, A Guide to Disability Rights Laws, August 2001, <http://www.usdoj.gov/crt/ada/cguide.htm>. Back to the text of the article.

  11. Paperwork Reduction Act, PL 104-13, <http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=104_cong_public_laws&docid=f:publ13.104.pdf>. Back to the text of the article.

  12. Guidelines For Electronic Records Management On State And Federal Agency Websites, an NHPRC-funded research project conducted in 1997 by Charles R. McClure and J. Timothy Sprehe; McClure, Sprehe and Kristen Eschenfelder, Performance Measures for Federal Agency Websites, <http://www.defenselink.mil/webmasters/measures/>. The authors include a listing of relevant laws in Chapter 2, "Federal Policies Affecting Website Development and Management." Back to the text of the article.

  13. Performance Measures for Federal Agency Web Sites Final Report, <http://www.defenselink.mil/webmasters/measures/>. Back to the text of the article.

  14. A Comprehensive Assessment of Public Information Dissemination, <http://www.nclis.gov/govt/assess/assess.vol1.pdf>. Back to the text of the article.

  15. See for instance the Smithsonian Institution consultant report "Archival Preservation of Smithsonian Web Resources: Strategies, Principles, and Best Practices" <http://www.si.edu/archives/archives/dollar%20report.html>, in which the author argued that Web resources represent important records documenting the Institution's corporate memory. The National Archives of Australia has also developed a policy and guidelines <http://www.naa.gov.au/recordkeeping/er/web_records/intro.html> for keeping records of Web-based activity in the Commonwealth Government in compliance with the National Office for the Information Economy "Guide to Minimum Website Standards" <http://www.govonline.gov.au/projects/strategy/Guide_to_minimum_website_Standards_v2.pdf>. See also Information Management Forum Internet and Intranet Working Group (Government of Canada), "An Approach to Managing Internet and Intranet Information for Long Term Access and Accountability," <http://www.imforumgi.gc.ca/iapproach_e.html>. Back to the text of the article.

  16. Description of NARA records management requirements, <http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=1994_uscode_suppl_3&docid=44usc3101>. Back to the text of the article.

  17. NARA Bulletin 98-02 and NARA Bulletin 99-05, [Disposition of Electronic Records] <http://www.nara.gov/records/policy/b9802.html>, <http://www.nara.gov/records/policy/b9905.html>; NARA Bulletin 2000-03 [Protecting Federal records and other documentary materials from unauthorized removal] <http://www.nara.gov/records/policy/b2000-03.html>. Back to the text of the article.

  18. National Archives and Records Administration <http://www.nara.gov/>. Back to the text of the article.

  19. Preserving Presidential Library Websites, San Diego Supercomputer Center <http://www.sdsc.edu/TR/TR-2001-03.pdf>. Back to the text of the article.

  20. Safeguarding Australia’s web resources: guidelines for creators and publishers, <http://www.nla.gov.au/guidelines/2000/webresources.html>. Back to the text of the article.

  21. Safekeeping, <http://www.nla.gov.au/padi/safekeeping/safekeeping.html>. Back to the text of the article.

  22. "Safekeeping Strategies", <http://www.nla.gov.au/padi/safekeeping/safekeeping.html#ss>. Back to the text of the article.

  23. E-mail, Susan Thomas, PADI Administrator, to Anne R. Kenney, December 17, 2001. A few of the items designated as safe-kept do not fully comply with the nine requirements, and notes on the arrangements for their safekeeping have been recorded in the PADI database. Back to the text of the article.

  24. The Internet Archive, <http://www.archive.org/>. Back to the text of the article.

  25. PANDORA Archive, <http://pandora.nla.gov.au/index.html>. Back to the text of the article.

  26. The Kulturarw3 Project - The Royal Swedish Web Archiw3e - An example of "complete" collection of web pages, <http://www.ifla.org/IV/ifla66/papers/154-157e.htm>. Back to the text of the article.

  27. The NEDLIB Harvester Project <http://www.csc.fi/sovellus/nedlib/>, supported by the National Library of Finland; the Nordic Web Archive <http://nwa.nb.no/>, a collaborative project of the Nordic national libraries, which began in September 2000; the Library of Congress Minerva Project <http://www.rlg.org/preserv/diginews/diginews5-2.html#feature>, a prototype system to collect and preserve materials from the Web, launched in 2000; and OCLC's Web Document Digital Archive Project <http://www.oclc.org/oclc/press/20010717b.shtm>, an effort announced in July 2001 to create a sustainable service to provide long-term access to Web documents, including those of the Government Printing Office. Back to the text of the article.

  28. See Internet Archive FAQs 11-13 at <http://web.archive.org/collections/web/faqs.html>; William Y. Arms, Roger Adkins, Cassy Ammen, and Allene Hayes, "Collecting and Preserving the Web: The Minerva Prototype," RLG DigiNews, April 15, 2001 <http://www.rlg.org/preserv/diginews/diginews5-2.html#feature1>; Juha Hakala, "Collecting and Preserving the Web: Developing and Testing the NEDLIB Harvester," RLG DigiNews, April 15, 2001, <http://www.rlg.org/preserv/diginews/diginews5-2.html#feature2>. Back to the text of the article.

  29. The Nedlib harvester schedules the download of inline images for Web pages as quickly as possible after the source page has been downloaded, to minimize the risk of losing important content. Back to the text of the article.

  30. One author (who admittedly works for a company whose products are designed to manage deep Web resources) estimates that subsurface public information is approximately 500 times the size of what's found at the surface level. See Michael Bergman, "The Deep Web: Surfacing Hidden Value," The Journal of Electronic Publishing, August, 2001 <http://www.press.umich.edu/jep/07-01/bergman.html>. Back to the text of the article.

  31. Attributes of a Trusted Digital Repository: Meeting the Needs of Research Resources <http://www.rlg.org/longterm/attributes01.pdf>. Back to the text of the article.

  32. William Blundon, "Security is in the eye of the beholder" <http://www.javaworld.com/javaworld/jw-09-1997/jw-09-blundon.html>. Back to the text of the article.

  33. Marge Kenney coined the term "worry radius" in the 1960s when her children first learned to drive. Back to the text of the article.

  34. There are interesting discussions around the evolution of insurance policies for digital assets, including Web-based materials. To date there is no definitive case law on whether computer data can be categorized as property that can be subject to physical loss or damage. Content owners are pushing to include language in policies that equates "disruption, corruption, deletion, theft, ..." to damage and loss, but insurers are excluding losses by virus. Success in defining loss and damage to digital materials as acceptable for tax purposes varies by state. Catherine L. Rivard and Michael A. Rossi, "Is Computer Data 'Tangible Property' or Subject to 'Physical Loss or Damage'?" -- Part 1 and Part 2. Insurance Law Group, Inc., August 2001 and November 2001, <http://www.irmi.com/expert/articles/rossi008.asp> and <http://www.irmi.com/expert/articles/rossi009.asp>. Back to the text of the article.

  35. "Executive Order: Critical Infrastructure Protection in the Information Age," was issued on October 16, 2001 at <http://www.ciao.gov/News/EOonCriticalInfrastrutureProtection101601.html>. One article entitled "ERM and September 11" discusses exposure management and extreme event risk planning as part of Enterprise Risk Management <http://www.irmi.com/expert/articles/miccolis005.asp>. The field is so recently competitive that the Risk Management Internet Services (RMIS) <http://www.rmis.com> proclaims 'on the Net since 1996'. Back to the text of the article.

  36. The International Risk Management Benchmarking Association (IRMBA) <http://www.irmba.com> and the Risk Management Reports <http://www.riskreports.com> are just two examples of risk management information providers. Back to the text of the article.

  37. For example, see: the Global Risk Management Network (GRMN) <http://www.grmn.com/pages/default.asp>; the International Risk Management Institute (IRMI) <http://www.irmi.com>; the Risk Management Foundation: Harvard Medical Institutions <http://www.rmf.harvard.edu/>; the American Society for Healthcare Risk Management (ASHRM) of the American Hospital Association <http://www.ashrm.org/asp/home/home.asp>; the Public Risk Management Association (PRIMA): Nonprofit Risk Management Center and Public Entity Risk Institute <http://www.primacentral.org/index.html>; the Nonprofit Risk Organization <http://www.nonprofitrisk.org>; the National Risk Management Research Laboratory (NRMRL) <http://www.epa.gov/ORD/NRMRL>; the Global Association of Risk Professionals (GARP) <http://www.garp.com/index-b.htm>; and The Risk Management Association (RMA) <http://www.rmahq.org>. Back to the text of the article.

  38. The Wharton Risk Management and Decision Process Center, <http://grace.wharton.upenn.edu/risk/>. Back to the text of the article.

  39. Risk Management of Digital Information: A File Format Investigation, <http://www.clir.org/pubs/abstract/pub93abst.html>. Back to the text of the article.

  40. In "Information Risk Management: Why Now?" Christian Byrnes identifies a series of steps involved in establishing a security program. The steps are as follows: develop a security policy, design security domains to address the needs of categories of assets, assess the 'security posture' of the organization, evaluate risks, and select and implement a security strategy. See <http://www.trusecure.com/html/tspub/whitepapers/irm.pdf>. Back to the text of the article.

  41. The stages as presented are based most closely on Paul R. Kleindorfer's "Industrial Ecology and Risk Analysis" in L. Ayres and R. Ayres (eds.), Handbook of Industrial Ecology, forthcoming, Elsevier, 2001 <http://grace.wharton.upenn.edu/risk/downloads/01-23-PK.pdf>, but other resources provide good discussions of all or some of the stages, such as Jean C. Miller, "Risk Management for Your Web Site", in Anita Schoenfeld, ed. International Risk Management Institute, September 2000, <http://www.irmi.com/expert/articles/schoenfeld003.asp>; Howard Kunreuther, et al., "Using Cost-Benefit Analysis to Evaluate Mitigation-Measures for Lifelines", April 2001, <http://grace.wharton.upenn.edu/risk/downloads/01-14-HK.pdf>; David McNamee, "Assessing Risk Assessment," <http://www.mc2consulting.com/riskart2.htm>; Angus Wood, "Integrating Risk Assessment into the Enterprise Information Management Strategy" presented at the 6th International Pipeline Reliability Conference, November 19-22, 1996, Houston, Texas, <http://www.itpapers.com/cgi/PSummaryIT.pl?paperid=8433&scid=88>; and Howard Kunreuther and Patricia Grossi, "The Role of Uncertainty on Alternative Disaster Management Strategies", April 2001, <http://grace.wharton.upenn.edu/risk/downloads/01-15-HK.pdf>. Back to the text of the article.

  42. Kleindorfer, for example, identifies tools that could be applied at each stage. Back to the text of the article.

  43. Home page of the Mercator Web Crawler, <http://www.research.compaq.com/SRC/mercator/>. Back to the text of the article.

  44. Risk management authors tend to put "classification" either as the last step in risk identification or the first step in risk assessment. Project Prism combines Steps 1 and 2. The lines between the steps of the process are not always firm but the categories can be useful for discussion and analysis. Back to the text of the article.

  45. The full classification is at <http://www1.oecd.org/EHS/CARAT/v3.0/htm/default.htm>. H. Felix Kloman has an interesting diagram that distinguishes global and organizational risks in "The Risk Spectrum," Risk Management Reports, 2001 <http://www.riskreports.com/spectrum.html>. Back to the text of the article.

  46. A program for the ongoing identification of new or evolving risks, an analysis of a more comprehensive set of Web resource risks, and the definition of appropriate responses to identified risk scenarios will be part of follow-on research work and coordinated implementation projects. Back to the text of the article.

  47. The Wharton Risk Management and Decision Processes Center refers to the descriptive (what can happen) and prescriptive (how to avoid) approach <http://grace.wharton.upenn.edu/risk/>. Back to the text of the article.

  48. Ira Rosenthal, Al Ignatowski, C. Kirchsteiger, "A Generic Standard for the Risk Assessment Process: Discussion on a proposal made by the program committee of JC-J, RC Workshop on 'Promotion of Technical Harmonization of Risk-Based Decision Making'". Back to the text of the article.

  49. Protecting Your Nonprofit and the Board, from the Nonprofit Risk Organization, suggests considering risk financing to ensure that recovery costs are covered <http://www.nonprofitrisk.org/nwsltr/archive/nl199_1.htm>. Back to the text of the article.

  50. The National Risk Management Research Laboratories <http://www.epa.gov/ordntrnt/ORD/NRMRL/> identify prevention and control, remediation, and restoration as the potential responses to risk. Back to the text of the article.

  51. Howard Kunreuther, Patricia Grossi, Nano Seeber and Andrew Smyth, "A Framework for Evaluating the Cost-Effectiveness of Mitigation Measures", paper presented at the Bogazici University /Columbia University Workshop <http://grace.wharton.upenn.edu/risk/downloads/01-18-HK.pdf>. Back to the text of the article.

  52. Jerry Miccolis provides lessons learned and stresses the need to consider interdependencies of infrastructures and strategies in "ERM and September 11" <http://www.irmi.com/expert/articles/miccolis005.asp>. Back to the text of the article.

  53. "Making Net Gains: Staying Safe While Making a Name for Your Nonprofit on the Internet" from the Nonprofit Risk Organization presents useful steps and examples <http://www.nonprofitrisk.org/nwsltr/current/nl901_3.htm>. The Global Association of Risk Professionals (GARP), op. cit., suggests techniques to support risk management: risk modeling, risk quantification, risk metrics, value-at-risk assessment, a risk measurement framework, an eRisk program, and a risk management portfolio. Angus Wood in "Integrating Risk Assessment into the Enterprise Information Management Strategy," op. cit., discusses the different kinds of management that are needed: risk-based, event, stakeholder, litigation, and management. Paul R. Kleindorfer notes in "Industrial Ecology and Risk Analysis," op. cit., that the informational regulation (IR) trend urges an open policy on risk management programs and the need for shared systematic knowledge. David McNamee in "Assessing Risk Assessment," op. cit., discusses risk-based auditing. Back to the text of the article.

  54. Full Speed Ahead: Managing Technology Risk in the Nonprofit World, from the Nonprofit Risk Organization, suggests some policy requirements, including a Web site disclaimer and notice of proprietary information, Web links policy and disclaimer, Web security statement, and a Website privacy statement <http://www.nonprofitrisk.org/pubs/full_spd.htm>. Jean C. Miller in "Risk Management for Your Web Site," op. cit., identifies quantitative (monetary values) and qualitative (scenario-oriented, potential threats, evaluate countermeasures and safeguards) measures. Costs should include acquisition, development, and maintenance of the information and the value to owners, custodians, users, as well as the current value in the world. Miller also recommends keeping an inventory of countermeasures and defines the characteristics of countermeasures as cost effective, requiring minimal human intervention, complete and consistent, sustainable, auditable, accountable (initiated by an authoritative entity), and recoverable (introduces no additional damage). The Nonprofit Risk Organization recommends a risk management team and crisis response team. RMIS promotes an anti-virus response team, and Risk Management Reports suggests having a chief risk officer. The Risk Management Association offers a range of tools that support project management, data collection, the development of manuals, and training. RMIS provides risk management alerts and advisories; risk assessments; anti-virus detection and response tools and teams; and Internet growth, performance, and weather reports. Back to the text of the article.

  55. World Wide Web Consortium, XHTML 1.0: The Extensible HyperText Markup Language (Second Edition), 2001 <http://www.w3.org/TR/2001/WD-xhtml1-20011004/>. Back to the text of the article.

  56. D. Raggett, Clean up your Web pages with HTML TIDY, 2000 <http://www.w3.org/People/Raggett/tidy/>. Back to the text of the article.

  57. Joint Photographic Experts Group, JPEG home page, <http://www.jpeg.org/public/jpeghomepage.htm>. Back to the text of the article.

  58. The Portable Network Graphics (PNG) home page, <http://www.libpng.org/pub/png/>. Back to the text of the article.

  59. S. Lawrence, K. Bollacker, and C. L. Giles, "Digital Libraries and Autonomous Citation Indexing," IEEE Computer, 32 (6), pp. 67-71, 1999. Back to the text of the article.

  60. Dublin Core Metadata Initiative, <http://dublincore.org>. Back to the text of the article.

  61. R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1," The Internet Society, RFC 2616, June 1999. <http://www.ietf.org/rfc/rfc2616.txt>. Back to the text of the article.

  62. N. Shivakumar and H. Garcia-Molina, "Finding near-replicas of documents on the web," presented at WebDB'98, 1998; A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig, "Syntactic Clustering on the Web," DEC Systems Research Center, Palo Alto, Technical Note 1997-015, 1997. <http://gatekeeper.dec.com/pub/DEC/SRC/technical-notes/SRC-1997-015-html>. Back to the text of the article.

  63. T. A. Phelps and R. Wilensky, "Robust Hyperlinks: Cheap, Everywhere, Now," presented at Digital Documents and Electronic Publishing (DDEP00), Munich, 2000. Back to the text of the article.

  64. See S. Brin and L. Page, "Anatomy of a large-scale hypertextual Web search engine," Computer Networks and ISDN Systems, 30 (1-7), pp. 107-117, 1998; and J. M. Kleinberg, "Authoritative sources in a hyperlinked environment," Journal of the ACM, 46 (5), pp. 604-632, 1999. Back to the text of the article.

  65. J. M. Kleinberg, "Authoritative sources in a hyperlinked environment," Journal of the ACM, 46 (5), pp. 604-632, 1999. Back to the text of the article.

  66. Statistical data should be correlated with perceptions as well. For instance, Joann D'Esposito and Rachel Gardner reported that students rate Internet sites from government, educational institutions, and reputable businesses and corporations as the most reliable and valuable for their research papers, in "University Students' Perceptions of the Internet: An Exploratory Study," Journal of Academic Librarianship 25, no. 6 (1999), 456-461. Philip Davis's study, op. cit., showed that the Web resources most heavily cited by students were .com sites (44%), followed by .org (27%), then .gov (20%), with .edu trailing at less than 7%. Back to the text of the article.

  67. D. Brickley, RDF sitemaps and Dublin Core site summaries, 1999 <http://rudolf.opensource.ac.uk/about/specs/sitemap.html>. Back to the text of the article.

  68. McClure and Sprehe, 1997 report. Back to the text of the article.

  69. A. V. Aho, J. E. Hopcroft, and J. D. Ullman, Data structures and algorithms, Repr. with corrections. ed. Reading, Mass.: Addison-Wesley, 1987; S. R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. S. Tomkins, and E. Upfal, "The web as a graph," presented at Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Dallas, 2000. Back to the text of the article.

  70. K. Zhang, J. T. L. Wang, and D. Shasha, "On the Editing Distance between Undirected Acyclic Graphs and Related Problems," presented at CPM Combinatorial Pattern Matching, 1995; H. Bunke, "A graph distance metric based on the maximal common subgraph," Pattern Recognition Letters, 19, pp. 255-259, 1998. Back to the text of the article.

Copyright © 2002 Anne R. Kenney, Nancy Y. McGovern, Peter Botticelli, Richard Entlich, Carl Lagoze, and Sandra Payette

Back to the Article