
D-Lib Magazine
July/August 2003

Volume 9 Number 7/8

ISSN 1082-9873

What Do We Mean by Authentic?

What's the Real McCoy?

 

H.M. Gladney
HMG Consulting
20044 Glen Brae Drive
Saratoga, CA 95070 <hgladney@pacbell.net>

J.L. Bennett
Independent Consultant
3643 Justine Drive
San Jose, CA 95124


Abstract

Authenticity is among digital document security properties needing attention. Literature focused on preservation reveals uncertainty — even confusion — about what we might mean by authentic. The current article provides a definition that spans vernacular usage of "authentic", ranging from digital documents through material artifacts to natural objects.

We accomplish this by modeling entity transmission through time and space by signal sequences and object representations at way stations, and by carefully distinguishing objective facts from subjective values and opinions. Our model can be used to clarify other words that denote information quality, such as "evidence", "essential", and "useful".

Everyday language is a part of the human organism and is no less complicated than it ....The tacit conventions on which the understanding of everyday language depends are enormously complicated. Ludwig Wittgenstein, Tractatus Logico-Philosophicus 4.002

1.0 Introduction

Digital documents are becoming important in most kinds of human activity. Whenever we buy something valuable, agree to a contract, design and build a machine, or provide a service, we should understand exactly what we intend and be ready to describe this as accurately as the occasion demands. This makes worthwhile whatever care is needed to devise definitions that are sufficiently precise and distinct from each other to explain what we are doing and to minimize community confusion.

When we set out, some months ago, to describe answers to the open technical challenges of digital preservation, we took for granted the existence of a broad, unambiguous definition for authentic. That this assumption was naively optimistic is suggested by accounts of two recent conferences, [CLIR 92] and [Steemson] [2], and other publications.

Document authenticity is of fundamental importance not only for scholarly work, but also for practical affairs, including legal matters, regulatory requirements, military and other governmental information [3], and financial transactions. Trust, and evidence for deciding what can be trusted as authentic, are considered in many works about digital preservation, such as [Bearman], [Cullen], [Lynch], [Lynch 01], [MacNeil], [Ross], and [Thibodeau]. These topics are broad, deep, and subtle, raising many questions.

Among these, the current work addresses a single question, "What is a useful meaning of authentic or of authenticity for digital documents — a meaning that is not itself a source of confusion?" Progress in managing digital information would be hampered without a clear answer that is sufficiently objective to guide the evaluation of communication and computing technology [4].

Our approach to constructing an answer to this question is to break each object transmission into pieces whose treatment we can describe explicitly and with attention to potential imperfections.

2.0 What We Mean by Authentic

The current article is stimulated by our needing precise and objective language to describe a method for digital preservation [Gladney 1], a method that we illustrate with the assistance of Figure 1. Specifically, we want an operational definition of authentic for every situation we encounter in object communication. We manage by limiting ourselves to objective facts, decomposing possible transmissions into steps whose input and output entities we can describe, and making comparisons among these entities.

Figure 1 helps us express distinctions about the nature and purpose of a message and each of its potential transmission channels. The transmission 0 to 1 involves semantics (expression of meaning with symbols), and transmission 9 to 10 includes interpretation (trying to understand what received symbols mean) [5]. All the other steps are (or can be made to be) purely mechanical and syntactic (what the signals (signs) look like and how they are transformed in each transmission step). We can say nothing objectively certain about the relationship of a conceptual input 0 or conceptual output 10 to any intermediate form in the transmission channel. We therefore restrict our statements about authenticity to relationships among the intermediate forms 1 through 9.

Figure 1 suggests that authenticity can be articulated in terms of the relationship of each potential output instance to some specific input step in its transmission history. The digital document may have been generated from text, analog sound, an image, or a video. For both signal and artifact transmissions, we can describe (at least in principle) the transformations that occurred in each part of the transmission channel that was used for the case at hand. Authenticity is typically discussed by comparing a 9 instance with a 1 instance. Instances denoted by 2 through 8 might be useful to discuss the origins and specific forms of signal degradation. If we do not know the transmission channel, we can still speak objectively in terms of probabilities instead of certainties. Figure 2 suggests additional communications aspects.

 


Figure 1: Digital object delivery pathways [6], including potentially long delays in archival repositories; the numbers denote exemplary input and output instances — ephemeral objects that the producer intends should represent the conceptual object 0.

 


 


Figure 2: Different ways by which we might deliver information, suggesting copying and transformations that might occur. The solid arrows depict real transmissions; the dotted arrow 4 to 5 depicts a virtual transmission that reflects how human beings perceive and discuss their communication. The numbers that duplicate Figure 1 numbers have the same meanings as in Figure 1.


 

Without computed transformations, human users cannot easily interpret a digital representation such as the ' ... 011100010100 ... ' bitstring in Figure 2. The 3-4-5-3' rectangle reminds us of machine support that is so ubiquitous that we rarely discuss it. For instance, suppose that an information consumer talked to its producer about a photograph of an alabaster bust of King Tutankhamen that the producer had previously sent. A careful explanation of what actually happened is that these people caused a binary image, 3, in the producer's PC to be copied into an identical binary image, 3', in the consumer's PC. Later they conversed about the screen images, 4 and 5; 4 is a derivative object of 3, and 5 is a derivative of 3'. If the two digital to analog conversions depicted as the Figure 2 rectangle sides are not identical, the consumer will be shown something different from what the producer was shown. Compare the passage:

"The ideal preservation system would be a neutral communications channel for transmitting information to the future. This channel should not corrupt or change the messages transmitted in any way .... In abstract terms, we would like to be able to assert that, if X_t0 was an object [7] put into the box at time, t0, and X_tn is the same object retrieved from the box at a later time, tn, then X_tn = X_t0.
"However, the analysis of the previous sections shows that this cannot be the case for digital objects .... So, the black box for digital preservation ... includes a process for ingesting objects into storage and a process for retrieving ... and delivering them to customers. These processes, for digital objects, inevitably involve transformations; therefore, the equation, then X_tn = X_t0 cannot be true for digital objects.
" ... Nonetheless, given that a digital information object is not something ... preserved as an inscription on a physical medium, but something that can only be ... reconstructed by using software to process stored inscriptions, it is necessary to have an explicit model or standard that is independent of the stored object and that provides a criterion ... for assessing the authenticity of the reconstructed object." [Thibodeau]

The language in such descriptions (others occur in the literature) suggests that a single information object occurs at each space/time location, rather than two or more different objects that we might need to describe what is happening, such as 3 and 4 in the producer's personal computer. In Thibodeau's notation, X_5 = X_4 would be very difficult to achieve and even if achieved would be almost impossible to demonstrate. In contrast, X_3' = X_3 is easily achieved and demonstrated. Thus, Figure 2 suggests how careful we must be in identifying derivative objects and explicating their interrelationships.
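
The observation that the consumer's binary image 3' can easily be shown identical to the producer's binary image 3 can be illustrated with a direct comparison of bitstrings, or with a comparison of cryptographic digests when the two copies reside on different machines. The following is a minimal Python sketch and not part of the method the article describes; the function names `same_bits` and `digest` and the sample bitstrings are hypothetical.

```python
import hashlib


def digest(bits: bytes) -> str:
    """Return the SHA-256 digest of a bitstring as hexadecimal text."""
    return hashlib.sha256(bits).hexdigest()


def same_bits(a: bytes, b: bytes) -> bool:
    """True when two bitstrings have equal length and equal bit pattern."""
    return a == b


# Hypothetical stand-ins for object 3 (producer's PC) and object 3' (consumer's PC).
producer_image = bytes([0b01110001, 0b01000000])
consumer_image = bytes(producer_image)  # a bit-for-bit copy

identical = same_bits(producer_image, consumer_image)
digests_match = digest(producer_image) == digest(consumer_image)
```

Equal digests make identity overwhelmingly likely rather than logically certain, which is usually adequate in practice. No comparable check is available for the screen images 4 and 5, since they are analog renderings produced by possibly different display machinery.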

2.1 Everyday Language and Careful Definitions

"The concepts of reliability and authenticity as expressed in archival discourse are posited on a direct connection between the world and are rooted ... in observational principles. A reliable record is one that is capable of standing for the facts ... to which it attests ....An authentic record is one that is what it claims to be and that has not been corrupted or otherwise falsified since its creation." [MacNeil]

Precise language for object properties is critical when we negotiate agreements with service and technology users. We are free to choose or devise our language symbols — its words and pictures — any way we find helpful. However, whenever a choice coincides with a widely used locution, but differs from how people use it in informal discourse [8], we risk misunderstandings and tedious explanations of how and why what we intend differs from what other people expect [9]. The choices in this article never clash with dictionary definitions. However, a software engineer charged with providing authenticity management in a digital repository system is afforded little help by definitions such as:

authentic (Oxford English Dictionary): trustworthy; of undisputed origin, genuine.

authentic (Random House Dictionary): entitled to acceptance or belief because of agreement with known facts or experience; reliable; trustworthy: an authentic portrayal of the past. 2. not false or copied; genuine; real: an authentic antique. 3. having the origin supported by unquestionable evidence; authenticated; verified: an authentic document of the Middle Ages; an authentic work of the old master. 4. Law. executed with all due formalities: an authentic deed.

authenticity (Random House Dictionary): quality of being authentic; genuineness.

fact (Black's Law Dictionary): a thing done; an action performed or an incident happening; an event or circumstance; an actual occurrence; an actual happening in a time or space or an event mental or physical; that which has taken place. A fact is either a state of things, that is, an existence, or a motion, that is, an event.

integrity (Concise OED): wholeness; soundness; uprightness, honesty

integrity (Random House Dictionary): 2. the state of being whole, entire, or undiminished; to preserve the integrity of the empire; 3. a sound, unimpaired, or perfect condition; the integrity of the text, ... of a ship's hull.

provenance (Concise OED): (place of) origin, as vases of doubtful provenance.

Loosely speaking, any "authenticity decision" is a comparison of an instance at hand with some specific prior instance associated with a specific historical event, a comparison based on (approximate) integrity and true provenance. To be authentic a digital document must be whole and undisturbed; that's integrity [10]. It must have originated with the purported author as part of the purported event; that's true provenance [11].

Authentic is simplest for digital transmissions, more difficult for analog transmissions, even more difficult for transmissions of artifacts (a manuscript on paper, a Louis XV chair), and problematic indeed when we speak of living things — so problematic that we do not usually use "authentic" to describe them, but choose other words that have similar meanings. We discuss careful technical meanings in §2.4.

2.2 Questions and Criteria

The manager of a digital library or archive encounters many questions. Before attempting answers, it is prudent to separate questions that are distinct [12], such as:

  • What do we mean by "authentic"?
  • What do we mean by "evidence for authenticity" [13]?
  • What kinds of authenticity evidence might be available for something at hand [4]?
  • How can information producers create such evidence that will be useful long in the future [4]?
  • How can such evidence be preserved until it is wanted [4]?
  • To what extent should a consumer trust a document received and supporting evidence [14]?
  • Is the authenticity evidence sufficient for the application in question [15]?
  • How might evidence be different for different information genres (e.g., performances, reviews, written music) and different information representations (e.g., on musty paper, photocopy, digital representation, printed copy generated from a digital representation ...)?
  • For a particular genre, what are the criteria of authenticity, and how can we test them?

What makes the literature about authenticity confusing is that it often fails to declare which among these and further questions it is addressing at each point, and that it makes unannounced shifts from one question to another. We want the answers to each question to be independent of the answers to other questions — even closely related questions.

We further find it helpful to be explicit about criteria for workable definitions of authentic. We may need similarly careful definitions of the words for other quality measures — words that might include useful, essential, secure, legal, and so on [16]. With this in mind, we choose the following criteria:

  • We want to distinguish as clearly as possible between objective facts and subjective judgments [17].
  • Within the work represented by [Gladney 1], we require that any word denoting a quality allows for objective evaluation of technical solutions relative to explicit requirements statements.
  • Authentic should be binary — either true or false (for any entity compared to some prior entity) [18].
  • The meaning of authentic should depend as little as possible on the kind of entity in question.
  • The definition for digital objects should exhibit minimal discontinuity with existing tradition [19].
  • Whether an entity instance is or is not authentic should not depend on the intention of any human being — not its producer, not any custodian, and not its eventual users [20].
  • The meanings of words used within a single conversation about qualities should not intersect.

A problem with the relationships that Figure 3(a) suggests is that, if you say, "X is authentic", I might infer that X was also useful for whatever we were talking about. In contrast, Figure 3(b) suggests that authenticity is made up of integrity and provenance qualities, that a useful object might not be authentic, and that an authentic object might not be useful.


Figure 3: Possible relationships of meanings; (a) is unsatisfactory, and (b) is useful.

2.3 The Fundamental Communication Limitation

Paradoxically, no producer can ever show a consumer the object of greatest mutual interest, his conceptual object 0 [5]. He can share only one or more pictures 1 representing it. Figure 4 depicts the same transmission as Figure 2 except that the people are close enough so that no transmission machinery is needed or used. In this case alone, 1 and 9 denote one and the same object: 1 ≡ 9. The picture 1 might be deficient, omitting portions of 0. Even if it is chosen well enough to include the entire 0 pattern alluded to in §2.4, it will contain accidental information that the consumer has no certain way of distinguishing from what the producer considers essential [21]. Thus, the consumer neither has, nor can be given, a sure prescription for constructing his own conceptual object 10 to be identical to 0. Communication is always imperfect in this way, whether it is mediated by machinery or not.

We call these inevitable and troublesome transformations, 0 → 1 and 9 → 10, subjective.

In vernacular usage, "authentic" is used to describe quite different entity classes in many different social circumstances. This suggests that people share an underlying notional base. That the examples in §2.4.5 (physical artifacts) and §2.4.6 (natural objects) conform to the definition devised for different kinds of signal further suggests that the current article expresses a useful definition. The definition, whose pieces follow in §2.4.1 through §2.4.6, and are collected in §3.0, is independent of the transmission method and of any machinery.


Figure 4: Communication without machine assistance (cf. Figure 2, whose intermediate objects 1 and 9 are now the same object).

Nothing in this article is intended to say anything scientific (facts about the world) or prescriptive (how anyone should behave). Instead, we engage in a language game [22] to learn and show a useful way of describing abstractions such as "evidence", "essential", and "secure". The language game is based on analyzing object transmissions into pieces whose potential imperfections we can describe.

2.4 Authenticity for Various Entity Classes

Any question about authenticity compares some here-and-now entity with something that existed earlier and possibly elsewhere. For digital objects and analog signals — information transmissions — the question is always about the authenticity of a copy. For physical (material) objects, the question is usually about comparison with some prior state of that object [7]. Books and performance recordings are somehow intermediate; for them we often distinguish between the information written onto some substrate and the material substrate itself.

The prior instance must be identified in a provenance assertion. This often is understood from a shared social context, but depending on tacit understanding may cause errors with undesirable consequences [23].

For digital objects, perfect copies are possible and often wanted. For every other kind of transmission, imperfections are inevitable. In everyday usage of authentic, social conventions govern what is considered acceptable imperfection (a subjective opinion) for each object class. This understanding is often tacit. However, we should be explicit about at least part of the deviation from perfection whenever misunderstandings might cause serious error, damage, or strife.

In any careful discussion of any kind of transmission, we encounter abstractions of what is conveyed through space and time. Copyright law shows a way of thinking about such abstractions by asking, "What is it that a copyright protects?" We find no better answer than that of [Nimmer], who analyzes a moot case about a work whose last copy is destroyed by fire. If a work has once existed in tangible form, copyright protects the abstract pattern that appeared in that fixed copy. In copyright law, what is essential about a work is a pattern inherent in its reproductive instances [24].

This and the difficulties inherent in conveying concepts [25] (see [Wittgenstein] and the §2.0 and §2.3 comments about 0 and 10) make discussing abstractions necessary. Consider a mission of national archives — preserving records of legislation. What is of most interest about the record of a 1950 law is the expression of that law that was inscribed on the particular copy delivered to the repository in 1950. The 1950 physical carrier (paper) is valuable today only if the procedures and records of the repository make it unlikely that the inscription was tampered with, and is usually limited to the evidentiary role of that paper. The paper and the archive exist in order to provide evidence for the conceptual abstraction [25] — the pattern inherent in a reproductive instance. That this notion is very old we see from:

" ... for Socrates (as for Plato) written forms are pale shadows of their human counterparts. They may speak, but they are incapable of dialogue, ...This is true enough. But it fails to get at what is most extraordinary about written forms. For it is exactly in their ability to ensure the repeatability of their speak that they are most powerful. The brilliance of writing — of creating communicative symbols — is ... a way to make things speak, coupled with the ability to ensure the repeatability of that speak." [Levy 99]

2.4.1 Authentic for Digital Objects [26]

Given a derivation statement R, "V is a copy of Y ( V=C(Y) )",
a provenance statement S, "X said or created Y as part of event Z", and
a copy function, "C is the identity function I",
we say that V has integrity compared to Y if V is identical (bit-by-bit equal) to Y, i.e., V=I(Y).
We say that "by X as part of event Z" is a true provenance of V if R and S are true.
We say that V is an authentic copy of Y if it has such integrity and true provenance.

We might use the same definition for other genres than digital, but if we did so, we would find that few copies are authentic because the transformation C is usually not the identity function I.

"Essential" does not occur in the definition because what's essential is a human subjective opinion [20] — either that of each document's producer or that of some later critic. "Useful" does not occur because what's useful depends on what application is intended and on the human consumer's judgment.

We use mathematical language above because doing so helps us to communicate the attributes of authenticity succinctly, precisely, and unambiguously. However, for readers who prefer natural language, the next paragraphs repeat the mathematical form in ordinary English.

The derivation statement R says that some object V is a copy of some object Y. It further states that V is related to Y by some "copy" function C(y) in "V=C(Y)", but does not in itself say what we mean by C(y).

The provenance statement S says that there was some person X who created some object Y in some historical event Z. The practical effect of this statement is that we are not willing to accept that the object V is authentic without explicitly identifying the past existence Y of V, the identity of the creator of Y, and a specific historical event Z within which this creation occurred [27].

We next complete the derivation statement R by saying that (for the case of digital objects) the copy function C(y) is the identity function I(y). What is conventionally meant by an identity function is a transformation whose output is in every respect indistinguishable from its input. Given our digital object V, what this means is that its length and bit pattern are equal to those of the past digital object Y.

These three statements are nothing more than definitions of what we mean, in the current conversation, by V, Y, C, X, and Z. The rest of the mathematical language similarly defines what we mean by "integrity", "true provenance", and finally "authentic" for a digital object, viz.,

  • What we mean by having "integrity" is that V is identical to Y.
  • What we mean by having "true provenance" is that the object V is accompanied by a provenance statement S, and that this S is true.
  • What we mean when we say "V is authentic" is that these two prior statements are satisfied.

All this language — both the mathematical form and the ordinary English form — merely says what we mean by "integrity", "true provenance", and "authentic". We re-emphasize that the current article does not address how to test a digital object's authenticity. A way of doing that is proposed in [Gladney 3].
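
For readers who think in code, the §2.4.1 definition can also be restated as a small predicate. This is only a sketch of what the words mean, under the assumption that the truth of the derivation statement R and of the provenance statement S is supplied from outside; it is not a way of testing authenticity, which this article does not address. The names `Provenance`, `has_integrity`, and `is_authentic_copy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Provenance statement S: 'X said or created Y as part of event Z'."""
    creator: str   # X
    event: str     # Z
    is_true: bool  # truth of S, assumed to be established elsewhere


def has_integrity(v: bytes, y: bytes) -> bool:
    """V = I(Y): same length and same bit pattern as Y."""
    return v == y


def is_authentic_copy(v: bytes, y: bytes, derivation_is_true: bool, s: Provenance) -> bool:
    """Authentic means integrity together with true provenance (R and S both true)."""
    true_provenance = derivation_is_true and s.is_true
    return has_integrity(v, y) and true_provenance
```

The result is binary, and nothing in the predicate refers to what anyone considers essential or useful, in keeping with the criteria of §2.2.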

2.4.2 Authentic for Digital Derivatives and for Transformed Text [26]

When a very large digital object is to be transmitted and when channel capacity is expensive, the object can be compressed either faithfully or with information loss if that is thought not to distort important information (so-called lossy compression) [28]. Referring to Figure 2, what makes a compression 4 of 3 faithful is that the transformation 3 → 4 has an inverse 4 → 3* such that 3 ≡ 3*, so that we can provide the consumer with a bitstring identical to the producer's input. Lossy compression is discussed in §2.4.3.
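
Faithful compression in this sense can be illustrated with a round trip through a lossless compressor. The sketch below uses Python's standard `zlib` module as one example of such a compressor; the sample bitstring is hypothetical and the choice of `zlib` is an assumption made only for illustration.

```python
import zlib

# A hypothetical, highly repetitive bitstring standing in for the producer's object.
original = b"0111000101" * 1000

compressed = zlib.compress(original)    # the forward transformation
restored = zlib.decompress(compressed)  # its inverse

round_trip_ok = restored == original    # the restored bitstring is identical
smaller = len(compressed) < len(original)
```

Because the inverse exists, the consumer can be given a bitstring identical to the producer's input even though a different bitstring travelled through the channel.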

Derivative objects are very often encountered, e.g., in the practice of the law. When an attorney consults case law, his source is usually a reference periodical quoting decisions, rather than an archival copy of decision paperwork. Of course, the reproduction is accompanied by a provenance statement alluding to the original from which it was derived. Furthermore, any account of trial details is itself a derived work — a court reporter's quotation of what was said at the trial. The reporter's account is itself an excerpt constrained by customary rules. The attorney probably trusts [29] what he reads because the reference work has a history of faithful reproductions and because its publisher and the court reporter each have little motivation to mislead, and much to lose if their reputations for veracity and accuracy are impeached.

These examples suggest the utility of a broader notion of provenance than that within the §2.4.1 definition of authentic. For records derived according to precise, objective rules that preserve information, we would replace the copy function in §2.4.1 and write a revised definition as follows [26].

Given a derivation statement R, "V is a copy of Y ( V=C(Y) )",
a provenance statement S, "X said or created Y as part of event Z", and
a copy function C: C(y) = T_n( ... (T_1(y))), with every T_k having an inverse,
we say that V has integrity compared to Y if V is related to Y by V = C(Y) = T_n( ... (T_1(Y))).
We say that "by X as part of event Z" is a true provenance of V if R and S are true.
We say that V is an authentic copy of Y if it has such integrity and true provenance.

It would be a service to consumers who might wish to restore the original representation Y, or any intermediate representation, to include descriptions of the transformations T_k within metadata accompanying the digital object. See §3.0.

The prior paragraph reminds us that, although we call C(y) a "copy function", it might describe a more complicated transformation than people usually understand "copy" to mean. In fact, the definitions above and in §3.0 do not restrict what transformations are allowed [30]. These are users' subjective choices. Anyone may do as he pleases, at the risk that other people do not accept his output object as an authentic derivative of the input if his choice goes beyond social conventions for the circumstances.
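
The revised definition, together with the suggestion to record the transformations in accompanying metadata, can be sketched as a chain of invertible steps that is undone in reverse order. This is a minimal illustration under stated assumptions: the two transformations (`zlib` compression and Base64 encoding), the `Transform` record, and the helper names `apply_chain` and `restore` are all hypothetical choices, not something the article prescribes.

```python
import base64
import zlib
from typing import Callable, Dict, List, NamedTuple, Tuple


class Transform(NamedTuple):
    """One invertible transformation T_k, with a name usable as metadata."""
    name: str
    forward: Callable[[bytes], bytes]
    inverse: Callable[[bytes], bytes]


CHAIN: List[Transform] = [
    Transform("zlib", zlib.compress, zlib.decompress),
    Transform("base64", base64.b64encode, base64.b64decode),
]
REGISTRY: Dict[str, Transform] = {t.name: t for t in CHAIN}


def apply_chain(y: bytes, chain: List[Transform]) -> Tuple[bytes, List[str]]:
    """Compute V = T_n(...(T_1(Y))) and list the names of the steps applied."""
    v = y
    applied: List[str] = []
    for t in chain:
        v = t.forward(v)
        applied.append(t.name)
    return v, applied


def restore(v: bytes, applied: List[str], registry: Dict[str, Transform]) -> bytes:
    """Recover Y by applying the recorded inverses in reverse order."""
    for name in reversed(applied):
        v = registry[name].inverse(v)
    return v


y = b"Hypothetical text of a record deposited in 1950."
v, metadata = apply_chain(y, CHAIN)
recovered = restore(v, metadata, REGISTRY)
```

Stopping the loop in `restore` early would yield an intermediate representation, which is why the order of the recorded steps matters as much as their names.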

2.4.3 Authentic for Analog Signals

For a digital derivative created with lossy compression, and for an analog signal that suffered accidental changes during transmission [31], the notion of integrity in §2.4.2 is inapplicable.

In everyday usage, we say such signals have integrity if their distortions and noise are too small for human perception, or if we think we can distinguish the intended signals from the distortion (as we might do with a 78-rpm wax recording of Enrico Caruso). Such usage depends on opinion and tacit understanding, risking inevitable misunderstandings, as briefly discussed in §2.3.

Whenever such risks are not acceptable, we can describe the imperfections of the received message (9 in Figure 2) relative to what was sent (1 in Figure 2) probabilistically [32]. Our careful definition of authentic would in this case weaken the §2.4.2 copy function, dropping the constraint to reversible changes [26].

Given a derivation statement R, "V is a copy of Y ( V=C(Y) )",
a provenance statement S, "X said or created Y as part of event Z", and
a copy function C: C(y) = D_n( ... (D_1(y))), in which some D_k lose information,
we say that V has sufficient integrity relative to Y if V = C(Y) = D_n( ... (D_1(Y))), and if C
      conforms to social convention for the genre and circumstances at hand.
We say that "by X as part of event Z" is a true provenance of V if R and S are true.
We say that V is an authentic copy of Y if it has such integrity and true provenance.

Since the transformations D_k potentially add to, take away from, and alter the original Y, the metadata that is part of each transmitted object should contain descriptions of all the transformations performed, including identifying who is responsible for each transformation. See §3.0.
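
A lossy transformation and its accompanying metadata might be sketched as follows. Everything specific here is an assumption made for illustration: the quantization step stands in for an arbitrary information-losing transformation, the numeric tolerance stands in for a social convention that is in reality a subjective matter, and the sample values, the names, and the "hypothetical custodian A" are invented.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TransformRecord:
    """Metadata entry describing one transformation and who performed it."""
    description: str
    responsible: str


def quantize(samples: Sequence[float], step: float) -> List[float]:
    """A lossy transformation: round each sample to the nearest multiple of step."""
    return [round(s / step) * step for s in samples]


def max_deviation(sent: Sequence[float], received: Sequence[float]) -> float:
    """Largest absolute difference between corresponding samples."""
    return max(abs(a - b) for a, b in zip(sent, received))


# Hypothetical signal samples and a hypothetical tolerance standing in for
# whatever the convention for the genre and circumstances would accept.
sent = [0.12, 0.57, 0.91, 0.33]
TOLERANCE = 0.06

received = quantize(sent, step=0.1)
metadata = [TransformRecord("quantized to multiples of 0.1", "hypothetical custodian A")]

deviation = max_deviation(sent, received)
within_convention = deviation <= TOLERANCE
```

Two different inputs can quantize to the same output, so no inverse exists and the original cannot be restored from the received signal alone; the recorded description tells a later consumer what was lost and by whose action.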

2.4.4 Authentic for Performances and for Dynamic Objects

To be eligible for copyright or for archiving, information must be fixed. This is conventionally done as:

  1. performance prescriptions, variously called (musical) scores, plays or librettos, mathematical proofs, (computer) programs, (business) procedures, ...;
  2. performances proper, such as audio and video recordings, (computer) logs, (transaction) recovery logs, (business and historical) chronologies, ...; and
  3. snapshots, such as video freeze-frames, machine state snapshots for program debugging, database snapshots, business financial statements, ....

These can be fixed as digital representations, as analog recordings, or by printing on paper. Authentic as defined in §2.4.1 through §2.4.3 is sufficient, illustrating that this definition conforms to long practice. Raymond Leppard, the noted British conductor, uses a brief history of Gluck's Orfeo ed Euridice to illustrate the subjective elements inherent in choosing which version of a work is considered "the authoritative original" [Leppard].

This treatment solves published uncertainty about dynamic objects, such as that attributed to Duranti [33]. Over roughly a century, archives have developed opinions about what constitutes "sufficient similarity" to originals for sound and video recordings [34]. For digital objects, we can use without change these opinions about the relationships among the above three rendition types for artistic and other performances.

2.4.5 Authentic for Artifacts

In everyday language, a reproduction of a famous painting, or of a Louis XV chair, is not considered authentic in the sense we have used for signals. Instead, people require that most of the molecules, and their spatial layout, be the same as those of the original many years before [35].

However, people do talk of "an authentic Gucci bag" even though the object at hand is not authentic in the sense they require of a famous painting. Instead, what they require is that it conforms to a certain reproductive pattern (integrity) and that it was manufactured by Gucci employees in a Gucci factory (provenance).

Authentic illustrated by these examples can be seen to have almost the same meaning as that in §2.4.3 if we recognize that we can do no better than comparing today's artifact with another object. The latter could be either a conceptual object that is what we imagine the artifact to have been on some past date, or an information object that represents what the artifact was at that date [7]. With such language, today's Louis XV chair is a "copy" of the Louis XV chair experienced 300 years ago. Nimmer's notion of "the pattern inherent in a reproductive instance" — a conceptual or virtual object [24] — is the right idea.

This is easily seen in cases such as that of the Gucci bag, but can also be understood for the Louis XV chair. For instance, if an auctioneer described damage such as a deep scratch, he might have had in mind comparison with a particular conceptual object — the chair before the damage occurred sometime many years earlier. In 2002, no Louis XV chair was identically the chair that existed in the 18th century, but an auctioneer would not have risked a charge of fraud by calling one "an authentic Louis XV chair".

People who judge whether or not an object is authentic make their own choices of what to use as the comparison standard for the object at hand. For instance, when we are offered a bottle of "salsa autentica" in a California supermarket, we might think of an original Mexican recipe — a conceptual object.

These examples illustrate that the above language is attractive because we can use it to describe objectively how the object at hand differs either from what it might have been earlier, or from a conceptual model of what we might wish it to be. The examples help us understand why people comfortably use "authentic" for very different entity classes. With Nimmer's help, we can articulate (in §3.0) what different notions of authentic have in common [36].

2.4.6 Authentic for Natural Entities

We also often use "authentic" for natural objects that include dinosaur bones, fossils, and gemstones. We are comfortable with this for reasons similar to those just mentioned. A dinosaur bone might contain some of the original molecules. However, fossils contain few original molecules; instead, they embed "the pattern inherent in reproductive instances" [Nimmer]. Just how far we seem to be willing to stretch the concept of authenticity is illustrated by the case of a gemstone, for which it is difficult to describe the creation event; instead, we often talk of the event of its being mined.

How much a natural entity can differ from its prior existence without losing the "authentic" cachet is again a matter of social convention. When a speaker strays beyond what his listener is accustomed to, he might be asked to explain why his use of "authentic" is appropriate. His answer is likely to include some description of the object's deterioration compared to a perfect conceptual object [25] and might add words about its estimated history. We also sometimes talk of authentic pedigree animals [37].

We even use "authentic" in regard to human beings, as in, "He's an authentic Roman." We limit such usage to cases that we believe will not offend and that the listener will understand to be comparisons to a conceptual model, or to examples to which (s)he can relate. In other cases in which we might have used "authentic", we choose different words to express closely related notions. For instance, in a criminal proceeding, a witness might be asked, "Is this the man that assaulted you in March 1998?" and correctly say "yes", even though the man might be much changed in the intervening years. In the same trial, the priestly confessor of the defendant might truthfully say, "This is not the same man as five years ago. He's seen the error of his ways, and reformed!"

Using a popular idiom, we might inquire, "Is this the real McCoy?" [1]

3.0 Our Complete Definition for "Authentic"

For signals and for material entities, we choose the following definition for authentic [38].

Given a derivation statement R, "V is a copy of Y ( V=C(Y) )",
a provenance statement S, "X said or created Y as part of event Z", [27] and
a copy function, "C(y) = Tn( ... (T2(T1(y) ))),"
we say that V is a derivative of Y if V is related to Y according to R.
We say that "by X as part of event Z" is a true provenance of V if R and S are true.
We say that V is sufficiently faithful to Y if C conforms to social conventions for the genre and
      for the circumstances at hand.
We say that V is an authentic copy of Y if it is a sufficiently faithful derivative with true
      provenance.

Here "copy" means either "later instance in a timeline" or "conforming to a specific conceptual object" [25]. Each transformation Tk potentially adds, removes, or alters the information carried by its input signal. To preserve authenticity, the metadata accompanying the input in each transmission step (Figure 1 and Figure 2) should be embedded in the corresponding output by including a description of the transformation in Tk [39]. This is strictly necessary only for steps that alter the information content in a meaningful way [40].

These metadata should identify who is responsible for each Tk choice and all other circumstances important to judgments of authenticity [41]. Suitable metadata schemas are being discussed widely, e.g., in the METS initiative. Gladney describes trustworthy packaging for objects and metadata [Gladney 3]. An object's accumulated metadata are the digital equivalent of a traditional audit trail for a physical archival holding.
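The definition and its audit trail can be illustrated with a minimal sketch. It is not a method taken from this article or from [Gladney 3]; the names Step, Signal, copy_function, and is_authentic_copy are hypothetical, signals are simplified to text strings, and the two subjective judgments (whether the provenance is true, and whether the copy is sufficiently faithful for the genre) are passed in by the caller rather than computed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    """One transformation Tk together with the metadata that documents it."""
    description: str               # what the transformation does
    responsible: str               # who chose this Tk
    apply: Callable[[str], str]    # the transformation itself
    alters_content: bool           # True if it changes the information meaningfully


@dataclass
class Signal:
    """A signal plus its provenance statement S and accumulated audit trail."""
    content: str
    provenance: str
    trail: List[Tuple[str, str, bool]] = field(default_factory=list)


def copy_function(y: Signal, steps: List[Step]) -> Signal:
    """C(y) = Tn(...(T2(T1(y)))): apply T1 first, recording every step."""
    content = y.content
    trail = list(y.trail)  # copy, so the input's trail is not mutated
    for step in steps:
        content = step.apply(content)
        trail.append((step.description, step.responsible, step.alters_content))
    return Signal(content, y.provenance, trail)


def is_authentic_copy(
    y: Signal,
    v: Signal,
    provenance_is_true: bool,
    sufficiently_faithful: Callable[[Signal, Signal], bool],
) -> bool:
    """V is authentic if it is a sufficiently faithful derivative with true provenance.

    Both judgments are supplied by the caller because they are subjective.
    """
    return provenance_is_true and sufficiently_faithful(y, v)


# Illustrative use with invented data.
original = Signal("Hello,  World", "X wrote Y as part of event Z")
steps = [
    Step("collapse repeated spaces", "archivist A", lambda s: " ".join(s.split()), True),
    Step("re-encode without change", "system B", lambda s: s, False),
]
v = copy_function(original, steps)

# Assumed convention for this genre: whitespace differences do not matter.
same_words = lambda a, b: a.content.split() == b.content.split()
print(v.content)
print(v.trail)
print(is_authentic_copy(original, v, True, same_words))
```

The sketch records every step for simplicity, although only steps that alter the information content strictly need a record. Keeping the faithfulness test as a caller-supplied function mirrors the separation of objective transmission facts from socially conditioned judgment.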

4.0 Discussion

Authenticity is one among several security properties that need attention. Today we use digital documents mostly for entertainment and efficiency. Many of these documents are copies that we discard, often because they represent hard copy versions. However, in business and other critical applications, we now occasionally depend on a digital document that may be the only copy readily found, and whose misuse might cause damage. Such applications are growing because they are faster, cheaper, and often more reliable than their paper-based counterparts [3].

4.1 Summary

This work was prompted by our need for unambiguous language to describe a practical method for long-term digital preservation. Prior published attempts to define authentic for digital objects start with traditional language use, but are unsuccessful. We therefore started by thinking about investigations into the fundamentals of mathematics. Wittgenstein's assertion that language signs are symbols for otherwise hidden meanings [Wittgenstein] led us to regard Nimmer's pattern inherent in reproductive instances [Nimmer] as a sound basis for every authenticity definition.

Authenticity has grown over many years into a tricky topic, made so by imperfections of transmitted signals and artifacts, and complicated by tacit conventions. Prior authors have suggested that digital object authenticity is a difficult topic [42], often after trying to adapt notions from old and familiar circumstances to digital objects, an unfamiliar case. Unfortunately, this approach also required reasoning from relatively difficult and imprecise cases toward simpler cases and better known circumstances. That the issues for digital objects are simpler than for most other genres is perhaps not usually recognized today, because we have many centuries of experience with artifacts and manuscripts but little experience with virtual objects. We therefore dealt first with digital authenticity. This provides a base from which extension to traditional definitions is possible.

4.2 About Prior Difficulties

We also wondered why articles we cite, and literature that these articles cite, display uncertainty and confusion. Perhaps this is because the language they use often presumes a single object when it is helpful and correct to talk in terms of multiple objects, with each object being a transformed version of some conceptual object — an abstract pattern [25]. Information is often concurrently represented by several ephemeral physical states — such as patterns of magnetic bits, electronic circuit states, electromagnetic fields in space, and sound waves [43]. Some of these patterns perfectly represent the original information, and are therefore called authentic. Others are imperfect representations, for which we can describe the deviations from perfection. Depending on what kind of object is under discussion, and on human social conventions, we might still call such imperfect manifestations authentic.

Some readers might object to what seems insensitive mechanization of deep cultural distinctions. To do so would be to misunderstand both what we intend and also what we have accomplished. By handling precisely and objectively as much as can be handled objectively, we free ourselves to think carefully about the subjective values and opinions that might motivate such objections.

4.3 Culturally Conditioned Reactions

This article benefited from discussion with two dozen friends and colleagues about what they understood by the word "authentic". We noticed that members of practical professions (engineering, finance, law, and science) appreciated, and were accustomed to, the process of extracting from a definition that is subject to confusion, such as that of "authentic", the constituent parts that can be treated by objective analysis.

In contrast, people with a liberal arts or social sciences education seemed hesitant to manipulate a socially constructed concept; perhaps they saw doing so as improper tampering. It is, however, useful to divide "authentic" into objective and subjective parts. Perhaps public conversation, possibly a debate, will result in a recognition that nothing is lost by addressing in operational terms those parts that are objective [44]. Inclusion of interpretations that depend on subjective values and opinions can restore the whole in a way that is both technically sound and socially appropriate.

4.4 Evidence in the Provenance and the Copy Functions C(y)

Whether or not the consumer accepts a transmission as authentic will be his/her subjective decision based on weighing the evidence inherent in and accompanying the object — evidence that often extends to context provided by other objects [13]. The provenance definitions above and in §3.0 convey minimal requirements. The producers of provenance information might include more information, such as identification of or links to documents providing evidentiary context. Doing so is often prudent or customary.

In particular, the choice of the copy function C(y) is a subjective decision (§2.4.2). Particularly for an object that cannot be transmitted perfectly, the producer who hopes that the eventual consumer will judge what he receives to be authentic should consider including evidence in C(y). For an object whose history includes several transmission steps, as in §2.4.3, this might be done in each transformation Dk(y).

For a signal that cannot be interpreted without representation information, particularly a digital object, this extra information might further include information that the producer creates to enable consumers to interpret the signal correctly [45] [Gladney 2].

4.5 Prospects

As we consider authenticity of information carried by different media and different representations, we can be optimistic about the digital future, notwithstanding widespread concerns illustrated by:

"In their seminal report of 1996, Waters and Garrett lucidly defined the long-term challenges of preserving digital content. Ironically, the questions and issues that swirl about this new and perplexing activity can be relatively simply characterized as a problem of integrity of the artifact. Unlike paper artifacts such as printed scholarly journals, which are inherently immutable, digital objects such as electronic journals are not only mutable but can also be modified or transformed without generating any evidence of change. It is the mutable nature of digital information objects that represents one of the principal obstacles to the creation of archives for their long-term storage and preservation" [Okerson].

Perfect integrity is feasible for digital objects — for the transmissions from 3 through 6' in Figure 1. We can preserve forever the patterns they express, doing so in language that will never become obsolete. It is feasible to construct objects that carry evidence of their own authenticity [Gladney 1]. These facts give digital records a promising role in archives of the future.
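One way to check that a digital transmission was perfect is to compare a cryptographic digest computed before transmission with one computed afterwards. The sketch below is an illustration only, not the construction described in [Gladney 1]; the function names and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a bit string as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


def package(data: bytes) -> dict:
    """Bundle a payload with the digest computed by its producer."""
    return {"payload": data, "sha256": digest(data)}


def integrity_intact(pkg: dict) -> bool:
    """True if the payload still matches the digest recorded for it."""
    return digest(pkg["payload"]) == pkg["sha256"]


# Illustrative use with invented data.
pkg = package(b"the pattern inherent in reproductive instances")
print(integrity_intact(pkg))

altered = dict(pkg, payload=b"the pattern inherent in reproductive instance")
print(integrity_intact(altered))
```

A bare digest detects accidental change only. Anyone who alters the payload deliberately can also recompute the digest, so evidence of authenticity would additionally need something the consumer can trust independently, such as a digital signature by a known party.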

5.0 Conclusions

Smith ends the Council on Library and Information Resources conference report [CLIR 92] with, "We can begin by ... clarifying the terms we borrow from the physical world of analog materials to describe the new phenomena of virtual objects" [Smith]. The current article satisfies part of this call for action and, as far as we know, addresses the uncertainties expressed [42] in recent archives and digital libraries literature regarding a useful meaning for "digital object authenticity".

That "authentic" is used for many kinds of object suggests a single common meaning for authentic [36]. Digital object authenticity is, in fact, simpler than authenticity for other genres because only digital transmissions can be perfect. For every other kind of transmission, the object received is to some extent changed from its comparison standard. Authenticity for any kind of object — a derived work, an analog signal, a performance, an artifact, or even a natural object — is most easily defined in terms of imperfections relative to what is possible for digital objects.

Although the current article is limited to defining authentic, it provides a methodological framework and implicitly suggests (§2.3) a starting point for precise definition of other object qualities. Language that distinguishes intermediate objects in any information transmission (Figure 1 and Figure 2) is our framework.

The distinctions we make separate purely objective transmission steps from steps including information transformations that depend on subjective values and opinions. If the parameters of a subjectively chosen transformation are documented, the nature of the transformation becomes an objective fact that is part of the item's provenance. Producers necessarily exercise subjective judgment when they create transformation functions Dk(y) and have the opportunity for extending these with evidence (§4.4). Consumers must judge evidence for authenticity, exercising judgment that depends on social conventions for the object genre being considered. What we have written does not pretend to comment on subjective factors beyond pointing out their occurrences.

Discussions of subjective factors will surely occur for digital objects more or less as they have occurred traditionally for other kinds of object. However, by handling precisely and objectively what can be handled objectively, we free ourselves to think carefully about the subjective values affecting decisions about the authenticity and trustworthiness of artifacts and signals.

We have elsewhere described a practical method for demonstrably authentic digital preservation [Gladney 3]. The authors would be pleased to hear substantial disagreements and criticisms, and will attempt to answer any they receive.

Acknowledgements

We acknowledge with gratitude several score conversations with friends and professional colleagues.

As they have for our other writings, John Swinden and Peter Lucas read drafts carefully and provided insightful comments.

Bibliography

[Bearman] David Bearman and Jennifer Trant, "Authenticity of Digital Resources: Towards a Statement of Requirements in the Research Process," D-Lib Magazine, June 1998. Available <doi:10.1045/june98-bearman>.

[Carroll] Lewis Carroll with original illustrations by John Tenniel and introduction and notes by Martin Gardner, The annotated Alice: Alice's adventures in Wonderland & Through the Looking-glass, The definitive ed., New York, Norton, 2000. ISBN 0-393-04847-0.

[CLIR 92] C.T. Cullen et al., Authenticity in a Digital Environment, Council on Library and Information Resources Report pub92, 2000. ISBN 1-887334-77-7. Available <http://www.clir.org/pubs/reports/pub92/contents.html>.

[Levy]    "Where's Waldo? Reflections on Copies and Authenticity in a Digital Environment," by David M. Levy, <http://www.clir.org/pubs/reports/pub92/levy.html>.

[Lynch]    "Authenticity and Integrity in the Digital Environment: An Exploratory Analysis of the Central Role of Trust," by Clifford Lynch, <http://www.clir.org/pubs/reports/pub92/lynch.html>.

[Smith]    "Authenticity in Perspective," by Abby Smith, <http://www.clir.org/pubs/reports/pub92/smith.html>.

[Duranti] Luciana Duranti, Diplomatics: new uses for an old science, Scarecrow Press, Lanham, Md., 1998. ISBN 0-810-83528-2.

[Gladney 1] H.M. Gladney, Trustworthy 100-Year Digital Documents: Executive Summary of a Digital Preservation Proposal, to be available 3Q03.

[Gladney 2] H.M. Gladney, Trustworthy 100-Year Digital Documents: Durable Encoding for When It's Too Late to Ask, July 2003. Not yet published, but available by way of an e-mail request to the author.

[Gladney 3] H.M. Gladney, Trustworthy 100-Year Digital Documents: Evidence Even After Every Witness is Dead, February 2003. Not yet published, but available by way of an e-mail request to the author.

[Gladney 4] H.M. Gladney, Trustworthy 100-Year Digital Documents: Syntax and Semantics-Tension between Facts and Values, planned for August 2003.

[Leppard] Raymond Leppard, Authenticity in music, Faber Music, 1988. ISBN 0-571-10088-0.

[Levy 99] David M. Levy, "The Universe is Expanding: Reflections on the Social (and Cosmic) Significance of Documents in a Digital Age," Bull. Am. Society for Information Science 25(4), 17-20, 1999.

[Lynch 01] Clifford A. Lynch, "When Documents Deceive: Trust and Provenance as New Factors for Information Retrieval in a Tangled Web," J. Am. Soc. Info. Sci. 52(1), 12-17, 2001.

[MacNeil] Heather MacNeil, "Providing Grounds for Trust: Developing Conceptual Requirements for the Long-Term Preservation of Authentic Electronic Records," Archivaria 50, 53-76, Autumn 2000.

[Nimmer] David Nimmer, "Adams and Bits: of Jewish Kings and Copyrights," 71 S. Cal. L. Rev. 219-245, 1998. Also in the Copyright Society Journal 46(2), 1998.

[Okerson] A. Okerson et al., YEA: The Yale Electronic Archive, One Year of Progress, 2002.

[Ross] Seamus Ross, "Position Paper on integrity and authenticity of digital cultural heritage objects," DigiCULT Thematic Issue 1, August 2002. In <http://www.digicult.info/downloads/thematic_issue_1_final.pdf>, pp 6 - 8.

[Steemson] Michael Steemson, "Digital Experts Search for E-Archive Permanence: Summary of [a] Forum[:] Integrity and Authenticity of Digital Cultural Heritage Objects," DigiCULT Thematic Issue 1, 2002. In <http://www.digicult.info/downloads/thematic_issue_1_final.pdf>, p. 10.

[Thibodeau] Kenneth Thibodeau, "Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years," in Council on Library and Information Resources, The State of Digital preservation: An International Perspective, April 2002. ISBN 1-887334-92-0.

[Wittgenstein] Ludwig Wittgenstein, (trans. D.F. Pears and B.F. McGuinness), Tractatus Logico-Philosophicus, introduced by Bertrand Russell, Routledge, London, (1921 and 1997). ISBN 0-415-02825-6.

Also Cora Diamond, ed., Wittgenstein's Lectures on the Foundations of Mathematics, Cambridge, 1939, University of Chicago Press, 1976. ISBN 0-226-90426-1.

Notes

Copyright © H.M. Gladney and J.L. Bennett

DOI: 10.1045/july2003-gladney