
D-Lib Magazine
February 1998

ISSN 1082-9873

Implementing Policies for Access Management


William Yeo Arms
Corporation for National Research Initiatives
Reston, Virginia
warms@cnri.reston.va.us


1. The access management problem

Managing access to on-line information is a broad problem, which occurs in a wide range of different applications. Managers of on-line information wish to implement policies about who can access the information, under what terms and conditions. This paper describes a general approach to this problem and experience in applying it in digital libraries.

Examples of areas where access management is needed include the following:

Digital libraries
Libraries often need to restrict access to parts of their collections for various reasons, including restrictions imposed by donors, concerns about privacy or obscenity, licensing arrangements, and other agreements with copyright owners.

Electronic publications
The most common reason that publishers and other copyright owners wish to manage access is that they require payment for use of materials, but there are other reasons, such as preventing the spread of unapproved derivatives.

Security, classification, and trade secrets
Governments sometimes classify information in order to control access for security reasons. Commercial organizations use similar methods to protect confidential information and trade secrets.

Medical records
Medical information is usually kept confidential, except to people who can demonstrate that they have a need to know.

The approach that is described in this paper is equally relevant to these and other application areas, though implementation details differ. The various application areas use different terminology to describe the questions of controlling access to on-line information. In publishing, the term "rights management" is often used, but this term is very narrow in scope. The phrase "terms and conditions" is also widely used. The term "access management", which this paper uses, is a broad term that applies in all areas.

Independent of terminology, it is important to view access management from the perspective of the manager of the information. In a digital library, information is obtained from a variety of sources, for example by license from a publisher, or by a donation with restrictions on use. In drawing up its access management policies, the library will reflect agreements with publishers and other third parties. There may also be relevant laws that must be embodied in the policies. However, with digital materials as with physical items, the library is responsible for managing access to its collections.

Access management has been a topic of digital library research for several years. The general area was a theme of CNRI's work for the Computer Science Technical Reports project.[1] An updated version of these concepts is included in a recent paper from the Cross-Industry Working Team.[2] Another overview of the field is the report of the National Science Foundation workshop, in March 1996.[3]


2. The general model

This paper describes a conceptual model and a specific implementation, which is being tested in a number of prototype applications. One application, which is described in this paper, is CNRI's work with the Library of Congress's National Digital Library Program. Other aspects of this work were described in a paper in D-Lib Magazine, February 1997.[4] We are carrying out another study with the Defense Technical Information Center to confirm that the approach also fits their needs.[5] In December 1997, the concepts were introduced in London at a meeting of the STM Innovations Committee, a group of leading scientific publishers.

Figure 1 shows the general access model.

Figure 1. The access model

The central concept of the framework is that access is controlled by policies. Each policy relates some group of users with some set of digital material and permits or denies certain types of operations on the material. Policies require information about the users, which is provided in the form of roles, and about the digital material, which is provided in the form of attributes.

A key feature of this model is that the evaluation of roles, attributes, and policies takes place dynamically. The association of a policy with a user and specific material does not occur until the user wishes to access the material. Many attributes are expressed as metadata, which is stored with the material in the repository, but attributes may be obtained in real-time, perhaps by executing a program. User roles are often established by authentication, which can take place at any time.

Some other approaches to access management encode access policies with each item of material, so that each item is labeled with all the information that is required to manage access to it. Any policy change requires processing every item in the collection. In a large collection, this is a major task, prone to error. By keeping the policies separately, changes of policy can be reflected without alteration to the stored digital material.

Alternative approaches

A recent paper from IBM describes an approach to access management using secure containers of information.[6] In this approach, all information about some digital item is encapsulated in an encrypted container. Using the terminology of this paper, the container includes the digital material, its attributes, the policies, and the code necessary to enable the operations. The container can be stored in a repository, transmitted over a network, or even transferred to external media such as CD-ROM. These containers can be used to enforce the access policies, including control over subsequent use.

Mark Stefik of Xerox has described how the concept of trusted systems can be expanded into an architecture for access management that also emphasizes technical enforcement.[7] Stefik has designed a formal language for representing terms and conditions that can be used for interoperation between computer systems.


3. An illustration and a prototype example

An illustration

To illustrate the model, here is a simple example. A university library has the following loan policies:

  • The collections are divided into three categories: the general collection, a reserve collection, and reference materials.
  • Reference materials are for use in the library only.
  • The faculty loan periods are: general collection - 12 weeks, reserve - 2 weeks.
  • Student loan periods are: general collection - 2 weeks, reserve - use in library only.

These policies can be formalized as follows. A user can have one of two roles:

Role
Faculty
Student

The material in the library is divided into the following categories, which are marked with the corresponding attributes:

Attribute
General
Reserve
Reference

The operations that are allowed on the collections are:

Operation     Description
Loan-12       12 week loan period
Loan-2        2 week loan period
In-library    Use within library only

The library's policies can now be expressed as a policy table:

Policy Table
Role                  Attribute    Operation
Faculty               General      Loan-12
Faculty               Reserve      Loan-2
Student               General      Loan-2
Student               Reserve      In-library
Faculty or Student    Reference    In-library

Suppose, now, that the library decides to be egalitarian, by treating faculty and students equally. The loan period for the general collection becomes 12 weeks for everybody. Materials in the reserve and reference collections can be used only in the library.

The new policies can be formalized as follows. The lists of roles, attributes, and operations do not require any changes. The policy table becomes:

Policy Table
Role                  Attribute               Operation
Faculty or Student    General                 Loan-12
Faculty or Student    Reserve or Reference    In-library
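
The two policy tables above can be written down as data. The following is a minimal sketch in Python, offered only as an illustration and not as the implementation described in this paper. The names POLICIES_ORIGINAL, POLICIES_EGALITARIAN, permitted_operation, and the catalogue entries are all hypothetical.

```python
# Hypothetical sketch of the loan policy tables; every name here is illustrative.

# Each row is (roles, attributes, operation), mirroring a row of the policy table.
POLICIES_ORIGINAL = [
    ({"Faculty"}, {"General"}, "Loan-12"),
    ({"Faculty"}, {"Reserve"}, "Loan-2"),
    ({"Student"}, {"General"}, "Loan-2"),
    ({"Student"}, {"Reserve"}, "In-library"),
    ({"Faculty", "Student"}, {"Reference"}, "In-library"),
]

POLICIES_EGALITARIAN = [
    ({"Faculty", "Student"}, {"General"}, "Loan-12"),
    ({"Faculty", "Student"}, {"Reserve", "Reference"}, "In-library"),
]


def permitted_operation(policies, role, attribute):
    """Return the operation of the first row matching role and attribute, or None."""
    for roles, attributes, operation in policies:
        if role in roles and attribute in attributes:
            return operation
    return None


# The attributes stored with the items are the same under both policy tables.
catalogue = {"Intro to Algebra": "General", "Course Reader": "Reserve"}  # invented items

before = permitted_operation(POLICIES_ORIGINAL, "Student", catalogue["Intro to Algebra"])
after = permitted_operation(POLICIES_EGALITARIAN, "Student", catalogue["Intro to Algebra"])
print(before, after)
```

In this sketch the change of policy touches only the table that is passed in. The roles, the attributes, and the catalogue entries are left exactly as they were.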

A prototype example

The access model is currently being evaluated with a collection of historical materials from the Library of Congress. The collection, Prosperity and Thrift: The Coolidge Era and the Consumer Economy, 1920-1929, known as the Coolidge-Consumerism Collection, includes materials digitized from the library's general and special collections: digitized books and magazines, folders of documents from various manuscript collections, still photographs, motion pictures, and sound recordings. Some, but not all, are no longer protected by copyright. Some are published, some unpublished. The Library of Congress wishes to make its collections available as widely as possible, but only when this does not violate the wishes of copyright holders or donors, and the privacy of people mentioned in the materials.

An important attribute of this kind of library material is that the library may be unable to identify all the rights and restrictions that might apply to some items. Carrying out a full search for access restrictions that might possibly apply to every item is expensive and time consuming. Even after a thorough search, vital information may still be missing or undiscoverable. For example, there was no copyright time limit for unpublished items, like manuscripts or archival photographs, created during the period covered by the Coolidge-Consumerism Collection. Difficulties arise even for older published materials: Was the work registered for copyright? Who is the owner? Were rights transferred? Additionally, there may be other considerations that might lead to a need to restrict access: privacy rights, publicity rights, and restrictions placed on material at the time of gift or sale to the library.

To deal with this uncertainty, the Library of Congress has applied the concept of risk assessment. A set of materials is reviewed for general characteristics, such as date, content, and terms under which the material was acquired by the library. After this broad review some materials can be assigned to a "low risk" category, material that can be made openly available with little risk of copyright infringement or other legal concerns. Other materials are placed in more restrictive categories with procedures for appropriate use of each category.

In the access management system, the category is encoded as an attribute of the material, and the procedures for access to that category as entries in the policy table. The access management system has to provide for flexibility. From time to time, materials will be moved from one category to another, such as when copyrighted works enter the public domain; this is achieved by changing the attribute. On occasion the library may change its policies for access to a specific category of material; this requires changes to the policy table.


4. User interface issues

The next four sections expand the general access model introduced in Figure 1 and describe how the access management model has been used for the Coolidge-Consumerism Collection. First, however, here are a few comments about user interfaces for access management.

All methods of access management tend to be intrusive and interfere with the user's interaction with the library collections. The worst situation occurs when every interaction requires some sort of authentication or authorization for payment. Almost as bad are those situations where the user has different passwords and procedures for many services. User interface design is particularly tricky if payment is involved. There are two extremes, both undesirable: providing the user with no feedback about the rate of expenditure, or continually prompting for the expenditure of tiny sums of money. The least intrusive situation is when authentication is keyed to some hidden information, such as the IP address of the user's computer, or where the user logs in once and an authentication token is passed on to other computers, as in the campus-wide Kerberos systems that some universities have implemented.

These user interface problems are critical, although there is no single solution. Awkward user interfaces have been a significant reason why the field of access management has moved more slowly than might be hoped.


5. Users and roles

A user is a person using a computer system who wishes to access the digital material, or the computer system itself. Characteristics of users are encoded as roles, for example:

  • The user is a subscriber to all ACM publications.
  • The user is a high school student.
  • The user is physically located within the Library of Congress.

Authentication and payment are methods by which a user's role can be established. A user who has been formally authenticated has a different role from a user who has not been authenticated. In a similar way, if a user makes a payment, it establishes a role. This is shown in Figure 2.

Figure 2. Authentication and payment

One of the annoying problems in access management is the paucity of effective technology to authenticate users. An ideal system of authentication would be easy to administer, unobtrusive to the user, and robust against tampering. Most commonly used methods fail on several of these criteria. Two widely used methods are: (a) to authenticate using an ID and password, and (b) to use the IP address of the user's computer for identification. Beyond these two simple methods of authentication there are few generally available methods of authentication that are easy to administer and convenient for the user. This is a major gap that inhibits the development of good access management systems.

Roles for the Coolidge-Consumerism Collection

Policies that apply to the Coolidge-Consumerism Collection use the following roles, with the corresponding methods of authentication.

Role           Description                           Authentication
All            Any user                              none
Educational    Holder of educational site license    ID and password
In_LC          Within LC reading room                IP address
LC_staff       LC staff member                       ID and password

This table contains one entry which is still tentative. For some materials, donors or other rights-holders are unwilling to grant universal access, but happy to allow use for educational purposes. The Library of Congress is considering the use of an educational site-license. There would be no cost associated with the license, but the licensee would agree that certain materials would be used for educational purposes only.


6. Attributes of digital material

Digital material is information that users may wish to access subject to policies. Properties of the digital material that are important for access are called attributes, e.g.:

  • Unpublished work.
  • Published work, registered for copyright on 1/1/1996.
  • French government publication.
  • Letter from donor, dated 1/1/1893, states "I give my papers to the Library of Congress and hereby dedicate to the public all rights in my writings."

In general, roles and attributes can be provided in a variety of ways. Many attributes are encoded as metadata, but others can be computed when required. For example, a policy might apply only to digital objects that occupy less than 10 megabytes. The size could be stored as metadata or measured when required. Some attributes depend upon a date or range of dates. For example, in the United States, the copyright status of unpublished materials that existed in 1978 will change in 2003.
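
As an illustration of the difference between stored and computed attributes, here is a minimal Python sketch. It is not taken from the repository implementation. The class DigitalObject, the metadata keys size_bytes, attribute, attribute_after, and changes_on, and the date used in the example are all assumptions made for the sketch.

```python
from datetime import date


class DigitalObject:
    """Hypothetical stand-in for a stored digital object."""

    def __init__(self, content, metadata):
        self.content = content    # bytes of the digital material
        self.metadata = metadata  # attributes stored with the material


TEN_MEGABYTES = 10 * 1024 * 1024  # assumption: 1 megabyte = 1024 * 1024 bytes


def size_in_bytes(obj):
    """Prefer the size stored as metadata; otherwise measure the content."""
    stored = obj.metadata.get("size_bytes")
    return stored if stored is not None else len(obj.content)


def is_under_ten_megabytes(obj):
    """Attribute used by a policy that applies only to small objects."""
    return size_in_bytes(obj) < TEN_MEGABYTES


def attribute_on(obj, today):
    """Return the attribute in force on a given date, for date-dependent attributes."""
    changes_on = obj.metadata.get("changes_on")
    if changes_on is not None and today >= changes_on:
        return obj.metadata["attribute_after"]
    return obj.metadata["attribute"]


photo = DigitalObject(b"\x00" * 2048, {
    "attribute": "restricted",
    "attribute_after": "unrestricted",
    "changes_on": date(2003, 1, 1),  # illustrative date only
})
print(is_under_ten_megabytes(photo), attribute_on(photo, date(1998, 2, 1)))
```

Because the attribute is looked up when access is requested, a date-dependent change of status needs no reprocessing of the stored material.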

Attributes can be associated with individual digital objects or with sets of objects. Sometimes, within a digital object, different attributes may be associated with different elements of the object. A common example is that text and pictures may have different attributes, as shown in Figure 3.

Figure 3. Granularity

Attributes for the Coolidge-Consumerism Collection

For the Coolidge-Consumerism Collection, attribute metadata is being encoded for all digital objects, based on the following list of attributes:

Attribute code    Description
PE                Published or registered work for which copyright has Expired
DU                Deed of gift from donor permits Universal access
DR                Deed of gift from donor Restricts access
LU                Library of Congress review reveals no extant right, Universal access permitted
LL                Library of Congress review of item for which information is incomplete indicates Low risk profile
LH                Library of Congress review of item for which information is incomplete indicates High risk profile
CN                Copyright material -- permission Not sought
CD                Copyright material -- permission Denied by owner
CG                Copyright material -- permission Granted by owner

One question that frequently arises in access management is whether it is possible to agree on a standard set of attributes that will serve many different applications. This is a highly desirable objective for interoperability among different digital libraries, but this example shows that it is a difficult challenge. The choice of attributes works well for the Library of Congress's special collections, but it is not certain whether other libraries will manage their collections in the same manner. The general problem of encoding and parsing sets of attributes is complex and beyond the scope of this paper.


7. Operations

Operations are actions that a user may take to access digital material, e.g.:

  • Replicate from one computer to another.
  • Render an image on a screen.
  • Extract 2 minutes from a video program.
  • Create a derivative work.
  • Perform in public for profit.

Some operations can be precisely described technically, such as the operation of executing a specific computer program with the digital material as input. However, it is a characteristic of access management that many operations are difficult to define precisely, particularly those that specify the purpose for which an operation is carried out, such as "for profit".

Technical methods of enforcement

The term "enforcement" describes methods to ensure that the specified operations are the only actions carried out on digital material. As shown in Figure 4, some methods of enforcement are based on technical methods, such as encryption.

Figure 4. Enforcement by encryption

It is important to recognize that policies need not be enforced by technical means. Enforcement may be:

  • technical (e.g., encryption)
  • legal (e.g., damages for violation)
  • contractual (e.g., revocation of license)
  • institutional (e.g., loss of library privileges)

In the long term, perhaps the most effective form of enforcement is wide-spread user education, supported by institutional policies and the development of a set of well understood norms for reasonable behavior. For this reason, one useful operation is to display an access statement. As shown in Figure 5, an access statement is a statement of policies associated with specific digital material which is displayed prominently when the material is presented to a user. By itself an access statement does nothing to enforce any policies, but it educates users so that they know what is expected of them. It may also have some legal significance.

Figure 5. An access statement

Subsequent use

Access management policies frequently restrict the subsequent use that a user may make of digital material, e.g.:

  • No redistribution without attribution.
  • Display on screen, but do not print.
  • Use on a specified computer only.

Strict control of subsequent use by technical methods is rarely possible without great inconvenience. It is necessary to rely on education of users, which may be backed by threats of sanctions.

Operations for the Coolidge-Consumerism Collection

The prototype Coolidge-Consumerism Collection has a very simple set of operations. Several access statements have been specified, for example, "Warning. For copyright reasons, this material is available only for access by Library of Congress staff." The following table lists all the operations.

Operation      Description
General        Any operation
Statement-1    Display access statement 1
Statement-2    Display access statement 2
Statement-3    Display access statement 3

8. Policies

Policies are at the heart of this architecture for access management. A policy permits users, with certain roles, to carry out operations on digital material. The verbal statement of a policy may be quite informal, e.g.:

  • Access to subscribers only.
  • May be used for any non-commercial purposes.
  • Prints may be made at $1 per print.
  • For use only within the Library of Congress.

To formalize policies, they can be represented as logical expressions of the form:

    If (role) and (attribute) then (operation)

This simple expression hides a great deal of complexity. In general, the role and attribute can be the result of executing a computer program. They can both be arbitrarily complex Boolean expressions, including the special cases "all" or "none". The operation can be a set of operations, which may form logical expressions.
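
One way to picture the form "If (role) and (attribute) then (operation)" is to treat the role and attribute parts as Boolean tests over sets of names. The Python sketch below is an illustration under stated assumptions, not the representation used in the repository. The helper names (has, any_of, all_of, everyone, nobody, Policy, permitted) are hypothetical, and taking the union of the operations of every matching policy is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Set

Test = Callable[[Set[str]], bool]  # a Boolean expression over a set of names


def has(name: str) -> Test:
    return lambda names: name in names


def any_of(*tests: Test) -> Test:
    return lambda names: any(t(names) for t in tests)


def all_of(*tests: Test) -> Test:
    return lambda names: all(t(names) for t in tests)


def everyone(names: Set[str]) -> bool:  # the special case "all"
    return True


def nobody(names: Set[str]) -> bool:  # the special case "none"
    return False


@dataclass(frozen=True)
class Policy:
    role: Test                   # test applied to the user's roles
    attribute: Test              # test applied to the material's attributes
    operations: FrozenSet[str]   # operations permitted when both tests hold


def permitted(policies: Iterable[Policy], roles: Set[str], attributes: Set[str]) -> Set[str]:
    """Union of the operations of every policy whose two tests both hold (assumed rule)."""
    allowed: Set[str] = set()
    for policy in policies:
        if policy.role(roles) and policy.attribute(attributes):
            allowed |= policy.operations
    return allowed


policies = [
    Policy(everyone, any_of(has("PE"), has("DU")), frozenset({"General"})),
    Policy(has("LC_staff"), any_of(has("CN"), has("CD")),
           frozenset({"General", "Statement-2"})),
    Policy(nobody, has("CD"), frozenset({"Replicate"})),
]

print(sorted(permitted(policies, {"LC_staff"}, {"CD"})))
```

Because the tests are ordinary functions, a role or attribute test could equally be the result of running a program, as the text notes.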

In general, a single item of digital material may be subject to several policies. These policies may intersect in complex ways. For example, a user might have the choice of a restricted set of operations with no payment or a broader set with payment. Care must be taken to ensure that policies do not contradict each other. Reconciling such complexities creates a need for negotiation among policies. This leads to interesting questions of user interface design, since the user should be kept informed of decisions but should not be subject to a continuous barrage of messages and questions when the responses are trivial.

In simple cases, policies can be described as rows in a policy table. This is reasonably simple to implement and yet provides sufficient flexibility for many situations.

Policy table for the Coolidge-Consumerism Collection

The policy table for the Coolidge-Consumerism Collection is:

Policy Table
Role           Attribute               Operation
All            PE or DU or LU or LL    General
Educational    DR                      General
In_LC          LH                      General and Statement-1
LC_staff       CN or CD                General and Statement-2
All            CG                      General and Statement-3

This is a comparatively straightforward table, but has been simplified in one aspect. The attribute DR, for material with donor restrictions, actually has variants depending on the specific restrictions. The table above shows how a restriction to holders of educational licenses might be coded.


9. Implementation

Access management has been implemented for the repository system.[4] This repository provides a distributed object interface to stored digital objects. Whenever a client attempts an operation on a digital object in the repository, there is an explicit check to see if the policies permit that operation. This step is usually hidden from the user, but may stimulate a request for authentication, or trigger a special operation, such as the display of an access statement.

The access management part of the repository design includes the following features.

User interface support

To minimize the intrusion of access management on the user, several concepts have been implemented:

Authentication of users

Initially, the methods of user authentication supported are:

Authentication with user ID and password is implemented as a Java applet that is downloaded from the repository to the client. It prompts the user for password, encrypts it, and returns it to the repository. Future plans call for a wide variety of applets, corresponding to the standard methods of authentication, such as public/private key authentication, Kerberos tokens, etc.

Granularity

Attributes are attached to digital material at three levels of granularity:

Within these three levels, access management can be associated with any level of digital material, even down to the byte level, or to sets of digital objects.

Inheritance

Some roles and attributes may be refinements of other, more generic ones. For example, in the Coolidge-Consumerism Collection the "Educational" and "LC_staff" roles are narrower instances of "All". If some material is accessible to all users, then it is also accessible to educational users and to Library of Congress staff. By formalizing the inheritance of roles, policy tables can be much simplified, but the logic for interpreting policies becomes more complex.
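
The inheritance of roles can be sketched as follows. This Python fragment is illustrative only. The table ROLE_PARENT and the functions expand_roles and row_applies are hypothetical names, and the sketch records only the two refinements mentioned in the text above.

```python
# Hypothetical parent links: each role names the more generic role it refines.
ROLE_PARENT = {
    "Educational": "All",
    "LC_staff": "All",
    "All": None,
}


def expand_roles(roles):
    """Return the given roles plus every more generic role they inherit from."""
    expanded = set()
    for role in roles:
        current = role
        # Walk up the chain of parents; stop at the top or at a role already seen.
        while current is not None and current not in expanded:
            expanded.add(current)
            current = ROLE_PARENT.get(current)
    return expanded


def row_applies(row_role, user_roles):
    """A policy row written for a generic role also covers its refinements."""
    return row_role in expand_roles(user_roles)


print(sorted(expand_roles({"LC_staff"})))
```

With this expansion a row written for "All" does not need to be repeated for "Educational" or "LC_staff", at the cost of the extra lookup step when policies are interpreted.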

Operations

For the first stage, three basic operations have been implemented:

These basic operations will be extended in the next stage of the implementation. In particular, work on the operation "display access statement" is currently under way for use with the Coolidge-Consumerism Collection.

An example of the use of policies

Here is a hypothetical example, taken from the Coolidge-Consumerism Collection, that shows how these various components fit together.

A user wishes to access the digitized image of a photograph, stored in the repository. Initially, the temporary record that lists the user's roles is empty. The repository interface examines the digital object and finds, for example, that it has the attribute "DR". The next step is to look for rows in the policy table that apply to this material. There is one such row:

Role           Attribute    Operation
Educational    DR           General

The user does not have the role "Educational". Examination of the role table indicates that this role is established by authentication with a user ID and password. The authentication applet is sent to the client, the user types in an ID and password and returns them to the repository where they are checked. Assuming that the authentication succeeds, the role "Educational" is added to the temporary record. The policy now applies and any operation within the category "General" is permitted.
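
The sequence of steps in this example can be sketched in Python. The sketch is a simplification under stated assumptions and is not the repository's code. The function request_access is hypothetical, the authenticate callback merely stands in for the applet exchange, and treating a role whose method is "none" as established without any check is an assumption.

```python
# Role table: role -> method of authentication, as in the roles table of section 5.
ROLE_TABLE = {
    "All": "none",
    "Educational": "ID and password",
    "In_LC": "IP address",
    "LC_staff": "ID and password",
}

# Policy table rows: (role, attributes, operations), as in section 8.
POLICY_TABLE = [
    ("All", {"PE", "DU", "LU", "LL"}, {"General"}),
    ("Educational", {"DR"}, {"General"}),
    ("In_LC", {"LH"}, {"General", "Statement-1"}),
    ("LC_staff", {"CN", "CD"}, {"General", "Statement-2"}),
    ("All", {"CG"}, {"General", "Statement-3"}),
]


def request_access(attribute, session_roles, authenticate):
    """Return the permitted operations, establishing a missing role on demand.

    session_roles is the temporary record of roles and is updated in place.
    authenticate(role, method) -> bool is a hypothetical stand-in for the
    exchange with the client.
    """
    for role, attributes, operations in POLICY_TABLE:
        if attribute not in attributes:
            continue  # this row does not apply to the material
        if role not in session_roles:
            method = ROLE_TABLE[role]
            if method == "none" or authenticate(role, method):
                session_roles.add(role)
        if role in session_roles:
            return operations
    return set()


session = set()  # the temporary record starts empty
operations = request_access("DR", session, lambda role, method: True)  # stub: succeeds
print(operations, session)
```

If the stub is replaced by one that always fails, the role is never added to the temporary record and no operation is permitted on the "DR" material.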

Conclusion

This is only one possible implementation of the access management model shown in Figure 1. The basic concept of the model is to separate policies from the attributes of material and the roles of users. Our experience to date confirms that this separation is straightforward to implement, at least for simple sets of policies, and serves well the requirements of managing large digital libraries.


10. References

[1] Robert Kahn and Robert Wilensky. "A Framework for Distributed Digital Object Services". May 1995. (http://WWW.CNRI.Reston.VA.US/home/cstr/arch/k-w.html)

[2] Cross-Industry Working Team. "Managing Access to Digital Information: An Approach Based on Digital Objects and Stated Operations". May 1997. (http://www.xiwt.org/documents/ManagAccess/ManagAccessTOC.html)

[3] James R. Davis and Judith L. Klavans. "Workshop Report: The Technology of Terms and Conditions". D-Lib Magazine, June 1997. (http://www.dlib.org/dlib/june97/06davis.html)

[4] William Y. Arms, Christophe Blanchi and Edward A. Overly. "An Architecture for Information in Digital Libraries". D-Lib Magazine, February 1997. (http://www.dlib.org/dlib/february97/cnri/02arms1.html)

[5] Defense Virtual Library. (http://www.cnri.reston.va.us/dtic.html)

[6] H.M. Gladney and J.B. Lotspiech. "Safeguarding Digital Library Contents and Users". D-Lib Magazine, May 1997. (http://www.dlib.org/dlib/may97/ibm/05gladney.html)

[7] Mark Stefik. "Trusted Systems". Scientific American, March 1997. (http://www.sciam.com/0397issue/0397stefik.html)


11. Acknowledgements

Many of the ideas in this paper were developed through joint work with members of the Library of Congress, especially Carl Fleischhauer and Melissa Levine of the National Digital Library Program, and Mary Levering of the U.S. Copyright Office, all of whom provided extensive comments on an earlier draft of this paper. The design and implementation of the repository has been carried out by Christophe Blanchi and Ed Overly of CNRI.

This work has been supported in part by the Library of Congress and the Defense Advanced Research Projects Agency (DARPA), under DARPA grant MDA972-92-J-1029.

Copyright © 1998 Corporation for National Research Initiatives


hdl:cnri.dlib/february98-arms