D-Lib Magazine
Stewart Granger
Introduction

Many libraries and archives are in the process of ‘going digital’. The advantages of digital technology are well known, and its adoption by libraries and archives seems inevitable, inexorable and well-motivated. Yet several key issues concerning the long-term preservation of digital technologies remain unsolved. Two key problems are the fragility of digital media (whose ‘shelf life’, compared with, say, non-acidic paper, is extremely short) and, perhaps even more intractable, the rate at which computer hardware and software become obsolete. Many cases have been cited in which valuable data has already been lost because of obsolescence. Moreover, as of today no one knows how to ensure the long-term preservation of multimedia documents, nor how to ensure the integrity of documents that may have many links to other documents that may be anywhere in the world. For a brief overview of some digital preservation issues see [1] and [2].

These problems have, of course, been exercising the library and archive communities for some time, but as yet no one solution or set of solutions has been reached. Solutions need to be found urgently if we are not to sink in what Rothenberg [4] calls ‘technological quicksand’.

Emulation [i]

Given the problems just outlined, the dominant approach to digital preservation has been that of migration. Migration is the process of transferring data from a platform that is in danger of becoming obsolete to a current platform. This process has both dangers and costs. The notable danger is that of data loss or, in some cases, the loss of original functionality or of the ‘look and feel’ of the original platform. For these reasons, some have seen emulation as an alternative and superior strategy. The essential idea behind emulation is to access or run the original data and software on a current platform by running, on that platform, software that emulates the original platform.
The staunchest advocate of emulation has been Jeff Rothenberg, whose views I will examine in some detail. What are the issues with respect to emulation as a digital preservation strategy? I suggest the picture appears as in Figure 1. I discuss briefly each of the major issues listed.

Technical Approaches

If emulation is to be adopted, the first question is what is to be emulated? Three options are:
I will provide some discussion of these options when discussing Rothenberg.
Figure 1: Issues for emulation

Content

It is here that emulation has its strongest attraction. One of the issues for digital preservation is what needs to be preserved in addition to the pure intellectual content of a digital object. In some cases the ‘look and feel’ of a digital object may be important, as may its interactivity. The fear is that migration to new platforms may lose these aspects, and that emulation may be the only strategy for preserving them. This problem is exacerbated, of course, by the increasing use of more complex digital objects (multimedia and hypermedia). Some analysis is needed to provide a framework for deciding when these aspects of digital objects are important, so that a cost-benefit analysis can justify (or not) the likely additional costs of preserving them.

Legal/Organizational

Intellectual Property Rights (IPR) issues may well be involved in emulating either operating systems or applications. It seems almost certain that the costs of emulation will mean that emulation will not (if unsupported) be an option for every user. This will be one part of a case for wanting trusted organisations that can undertake the work and make it available for others to use. Of course the nature and cost of these organisations are major issues, but they are not issues that affect only emulation.

Other Issues

Also likely to be important for emulation as a preservation strategy are the issues of standards and open specifications. An ‘open environment’ and the adoption of standards are likely to make emulation more feasible and cost-effective. It is highly probable that metadata will play a key role in any emulation strategy -- defining precisely what this metadata should be and in what format is likely to be a major task. See [3], [5] and [7].
Rothenberg’s Views [ii]

As a general comment, if Rothenberg’s scepticism with respect to approaches other than emulation is justified and, on the other hand, his optimism with respect to emulation is not justified (an open issue in my view), then the whole digital project is in serious trouble. Therefore I am concerned to examine Rothenberg’s attitudes as well as his main technical arguments. I do not attempt to provide a full account of Rothenberg’s approach, but rather to raise some issues for discussion. First I examine his dismissal of other approaches.

Rothenberg’s Views of Other Approaches

Reliance on hard copy

Reliance on standards

Reliance on computer museums
Rothenberg does see two possible minor roles for computer museums in digital preservation: testing emulators and helping data recovery.

Reliance on migration
This is a formidable list and in my view there is some justice in all of the points made, but I think two things need to be borne in mind:
The Ideal Solution

Rothenberg’s idea of the ideal solution is certainly demanding:
Even supposing that this ‘ideal approach’ is feasible, doubts would remain, I think, about whether it would be required in all cases. Would the ‘ideal approach’ be the most cost-effective solution for any and all requirements?

How Should Emulation Be Achieved?

The major parts of Rothenberg’s approach to emulation are:
This list implies a very ambitious plan of work and would seem to imply very large overheads. Rothenberg provides a diagram (see Figure 2) which shows how much needs to be encapsulated:
Figure 2. Encapsulation

First, the information that has to be encapsulated comprises the document and its software environment. Central to the encapsulation is the digital document itself, consisting of one or more files representing the original bit stream of the document as it was stored and accessed by its original software. In addition, the encapsulation contains the original software for the document, itself stored as one or more files representing the original executable bit stream of the application program that created or displayed the document. A third set of files represents the bit streams of the operating system and any other software or data files comprising the software environment in which the document’s original application software ran.

The second kind of information is an emulator specification, which is supposed to:

"…specify all attributes of the original hardware platform that are deemed relevant to recreating the behavior of the original document when its original software is run under emulation." [4]

This will include:

"…interaction modes, speed (of execution, display, access, and so forth), display attributes (pixel size and shape, color, dimensionality, and so forth), time and calendar representations, device and peripheral characteristics, distribution and networking features, multi-user aspects, version and configuration information, and other attributes." [4]

The third kind of information to be encapsulated consists of explanatory material, labeling information, annotations, metadata about the document and its history, and documentation for the software and (emulated) hardware included in the encapsulation.

Admittedly, some of this information can be represented by pointers to information stored in a centralised repository, but it is nevertheless a daunting list, especially since no one currently knows how to produce some of the information (e.g., how does one produce an emulator specification?).
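To make the size of this list concrete, the three kinds of encapsulated information can be written down as a simple data structure. This is only an illustrative sketch: the class name `Encapsulation`, its field names, the `missing_parts` helper and the sample file names are all hypothetical and are not taken from Rothenberg's report, which does not prescribe any particular format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Encapsulation:
    """Hypothetical container for the three kinds of encapsulated information."""

    # 1. Original bit streams: the document, the application that created or
    #    displayed it, and the operating system / other environment files.
    document_files: Dict[str, bytes]
    application_files: Dict[str, bytes]
    environment_files: Dict[str, bytes]
    # 2. Emulator specification: attributes of the original hardware platform
    #    (attribute name -> description). How to produce this is an open question.
    emulator_spec: Dict[str, str]
    # 3. Explanatory material: label -> inline text, or a pointer into a
    #    centralised repository.
    explanatory_material: Dict[str, str]

    def missing_parts(self) -> List[str]:
        """Return the names of the parts that are still empty."""
        parts = {
            "document_files": self.document_files,
            "application_files": self.application_files,
            "environment_files": self.environment_files,
            "emulator_spec": self.emulator_spec,
            "explanatory_material": self.explanatory_material,
        }
        return [name for name, value in parts.items() if not value]


# Example with made-up contents: only the document and its application are present.
capsule = Encapsulation(
    document_files={"report.doc": b"\xd0\xcf\x11\xe0"},
    application_files={"winword.exe": b"MZ"},
    environment_files={},
    emulator_spec={},
    explanatory_material={},
)
print(capsule.missing_parts())
```

Even in this toy form, an archive that has saved the document and its application still reports three of the five parts as missing, which is the overhead the list above implies.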
Rothenberg himself acknowledges that there could well be IPR issues with respect to operating systems, for example.

What should be emulated?

Discussion of Rothenberg

Rothenberg’s approach is highly theoretical and one could almost say ‘absolutist’; it seems that nothing short of a complete, once-and-for-all solution will satisfy him. His approach does not seem to concern itself with the particularities of the immediate context. In some ways this is admirable, but to my mind it raises great doubts about the feasibility of his approach. Aside from questions about the technical feasibility of the approach, there are equally large doubts about its practical cost-effectiveness.

The Task Force report [1] recognised the need for ‘trusted organisations’ on a number of grounds. It seems to me that were such organisations to be created, one of their roles could be to provide support for emulation -- under certain well-defined conditions. I believe that this would be essential to make Rothenberg’s approach work, since it seems highly probable that only if such institutions were created could the overheads implied by the approach be met. But, of course, the creation of such institutions will necessarily be a political process -- which argues against the kind of one-off complete theoretical solution that Rothenberg demands. Equally important, Rothenberg’s approach will also come up against political issues when it comes to IPR.

The theoretical nature of Rothenberg’s approach does not seem to recognise that the situation we are currently in could be described as a ‘digital jungle’. The extreme pace of technological change is exacerbating the challenge of digital preservation. Digital preservation problems were created when the world moved much more slowly -- from now on it is going to get much tougher.

Bearman on Rothenberg

In an opinion piece in D-Lib Magazine, April 1999 [8], David Bearman provided some fairly trenchant criticism of Rothenberg’s views.
At the core of these criticisms was Bearman’s view that,

"…Rothenberg is fundamentally trying to preserve the wrong thing by preserving information systems functionality rather than records. As a consequence, the emulation solution would not preserve electronic records as evidence even if it could be made to work and is serious overkill for most electronic documents where preserving evidence is not a requirement."

It is important to understand that what Bearman means by a record is something much stronger than what many of us non-archivists understand by the term (in our naivety probably thinking of a database record as a RECORD). In another of his papers [9] Bearman says,

"…most information created and managed in information systems is not a record because it lacks the properties of evidence. Information captured in the process of communication will only be evidence if the content, structure and context information required to satisfy the functional requirements for record keeping is captured, maintained and usable."

I have no reason or authority to question this claim, and it is indeed an important one, but I do think it is perhaps a little hard on Rothenberg to criticise him for not anticipating a particularly specialised set of requirements.

Another criticism Bearman makes [8] is as follows,

"It is worth noting that systems migration literature suggests that a significant contributor to high cost and low reliability migrations is the absence of unambiguous specifications of the source and target environments. The problem, quite often is that either (or both) source and target are protected by proprietary interests of the software producers. If this is a hurdle in systems migration, it is an absolute barrier to the viability of the emulation which will try to replicate a system environment years, decades, or perhaps centuries, after it became obsolete.
Rothenberg acknowledges that "saving proprietary software, hardware specifications and documentation, as required by this emulation strategy, raises potential intellectual property issues". He correctly identifies this as "required" for the solution to function, yet refers to it as an "ancillary issue" in the title he gives the two paragraphs (section 8.3) in which it is discussed. Nowhere does Rothenberg suggest the dimensions of the Herculean social task that would be involved in creating a trusted agency for acquiring such proprietary information and administering it over time."

These are important points; Rothenberg’s almost casual hand-waving references to ‘emulator specifications’ surely gloss over a potential minefield of difficulties. Writing in 1997 about the difficulties of digital preservation, Bennett [11] makes some telling points pertinent to the view that the idea of a complete emulator specification may well be naïve,
Consider also the emulation experiment reported by Ross and Gow [10]. They experimented with emulating Sinclair Spectrum tapes on a PC.

"To run an emulation you need to access the information on the tapes. We used a standard radio cassette recorder to "play" the tapes. One of the advantages of having a Sinclair expert is the pleasure of watching him listen to the tapes to discern the difference between the Spectrum loader or earlier Sinclair systems, such as the ZX80 or ZX81. The different loaders sound different when played through a normal cassette recorder. This sort of knowledge only comes with experience. It is this experience that can assist in data recovery. Faced with unmarked cassettes, it is possible to identify a Sinclair Spectrum or indeed any system by listening to it. When the various clicks and beeps were explained, it is easy to make the distinction. But the expertise comes from many years of working with the tapes."

If the above example seems outlandish, consider the fact that many of us have experienced hours of frustration when installing new hardware or software on a PC when, in principle, there should have been no difficulty at all.

Three Positions on Emulation

There would appear to be two extreme positions on emulation and one middle-ground position. One extreme position is what Bearman aptly describes as the ‘magic bullet’ position: "…a simple, universally applicable, one-time fix." Approximately, at least, that is Rothenberg’s position [ii], and it is subject to the difficulties discussed above. The other extreme position is that emulation has no role to play in digital preservation. Bearman does not exactly adopt this view but comes quite close to it. Neither of the extreme positions seems to me to be tenable.

What is the middle ground? A very modest view would be that emulation, sometimes at least, may play a role in rescue operations. (Indeed, it already has.)
I think it would be hard to deny that emulation may play such a role, but is there a more general, strategic role it could play? Bennett says the following,

"In an archive, it may be necessary to handle some emulations, but this can only be tenable in the short term, while both the emulated and the host emulator are current in technology terms. Obsolescence for the host environment will bring double jeopardy for the emulated environment. Archiving of an emulation and its dependants should be considered only for the near term, and in the advent of destructive forces."

This is indeed in the middle ground as I have defined it. But I’m inclined to think that this position is too conservative. In particular I question whether it is essential that the emulated environment be current.

Consider the following scenario. An institution has a large archive of documents in Word 6 format. These documents are important, but it is rare for any given document to be accessed, so the institution wants to retain the ability to retrieve any particular document when the need arises. In this case I want to suggest that in principle there could be a strategic choice between migration and emulation. The institution could choose to migrate all the documents to a new format. Alternatively, it could choose to retain the documents in their original format (refreshing the media as required), provided it ensures that it retains the ability to read Word 6 documents. That ability, at the current time, would hardly constitute emulation at all, but surely current platforms could change in such a way that emulation would be required. (It could be that Windows emulation becomes required.) If that were the case, would not the archive be justified in making the strategic decision to opt for emulation even though the original platform had become obsolete?
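The choice in this scenario can be sketched as a small decision procedure. Everything here is an assumption made for illustration: the `Platform` class, the `access_route` function, the platform names and the format labels are hypothetical, and a real decision would also weigh cost, IPR and the ‘look and feel’ requirements discussed earlier.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Platform:
    """Hypothetical description of a computing platform."""
    name: str
    readable_formats: FrozenSet[str]                   # formats it opens natively
    emulated_platforms: FrozenSet[str] = frozenset()   # platforms it can emulate


def access_route(fmt: str, current: Platform, old_platforms: List[Platform]) -> str:
    """Say how documents kept in their original format could still be read."""
    # The current platform still reads the format: no emulation is needed.
    if fmt in current.readable_formats:
        return "native"
    # Otherwise look for an obsolete platform that read the format and that
    # the current platform is able to emulate.
    for old in old_platforms:
        if old.name in current.emulated_platforms and fmt in old.readable_formats:
            return f"emulate {old.name}"
    # No route to the original remains: the documents would have to be
    # migrated (or an emulator would first have to be built).
    return "migrate"


windows = Platform("windows", frozenset({"word6"}))
today = Platform("today", frozenset({"word6"}))
later = Platform("later", frozenset({"newformat"}), frozenset({"windows"}))
stranded = Platform("stranded", frozenset({"newformat"}))

print(access_route("word6", today, [windows]))
print(access_route("word6", later, [windows]))
print(access_route("word6", stranded, [windows]))
```

The three calls correspond to the three situations in the scenario: the format is still readable, it is readable only through emulation of an obsolete platform, and it is not readable at all unless something further is done.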
Emulation as a Strategic Digital Preservation Strategy - Some Imaginary Scenarios

I want to consider the example just mooted in more detail with the hope of showing that, over a long period of time, different preservation strategies may need to be employed. In this view the best strategy at a given time could be contingent on particular factors. By definition I take a platform to be obsolescent when any part of that platform becomes obsolete. Consider the following situations:

Time T0

Time T1

Time T2

Time T3

Time T4

I should add that, in agreement with many others, I would always favour preserving the original bit streams.

Emulation and the CAMiLEON Project

CAMiLEON (Creative Archiving at Michigan and Leeds) is a joint NSF/JISC funded project whose aim is to assess emulation as a digital preservation strategy. To achieve this aim the project will:
Conclusion

There is a case, I believe, for the view that emulation may have a strategic role to play in digital preservation. My view is more optimistic than that of most commentators but not as optimistic as Rothenberg’s. I do not believe that emulation can resolve all digital preservation issues.

References

[1] Preserving Digital Information: Report of the Task Force on Archiving of Digital Information, Commission on Preservation and Access and The Research Libraries Group, Inc., May 1, 1996.

[2] Digital Collections: A Strategic Policy Framework for creating and preserving digital resources, Arts and Humanities Data Service, N. Beagrie & D. Greenstein, 1998. Available at:

[3] Metadata to Support Data Quality and Longevity, Jeff Rothenberg, IEEE, 1996.

[4] Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation, Jeff Rothenberg, 1999. Available at:

[5] "Who will create the Metadata for the Internet?" Charles F. & Linda S. Griffin, First Monday Vol. 3 No. 12, December 7, 1998. Available at:

[6] The CAMiLEON Project.

[7] "Metadata and Digital Preservation: a plea for cross-interest co-operation," Stewart Granger, VINE, 1999, Theme Issue on Metadata, Issue 117, Part 2, ISSN 0305 5728, pp. 24-29, London University. Also available at:

[8] "Reality and Chimeras in the Preservation of Electronic Records", David Bearman, D-Lib Magazine, April 1999. Available at:

[9] "Toward a Reference Model for Business Acceptable Communication", David Bearman, 1994. Available at:

[10] "Digital Archaeology: Rescuing Neglected and Damaged Data Resources", S. Ross and A. Gow, February 1999. Available at:

[11] "A Framework of Data Types and Formats, and Issues Affecting the Long Term Preservation of Digital Materials", J.C. Bennett, 1997. Available at:

[12] "The Problems of Software Conservation", Doron Swade, Computer Conservation Society. Available at:

Notes

[i] Issues about emulation as a digital preservation strategy are being studied in a joint JISC/NSF Project: CAMiLEON: Creative Archiving at Michigan and Leeds: Emulating the Old on the New. See [6].

[ii] Perhaps the caveat that would have to be made is that Rothenberg does not claim that the solution is simple, but he does claim that it would be universal and one-off (if we can ever achieve it!).

Copyright © 2000 Stewart Granger
DOI: 10.1045/october2000-granger