D-Lib Magazine
Kostas Saidis and Alex Delis
Introduction

The fundamental element upon which modern Digital Libraries (DLs) are built is the digital object [19]. Digital objects can be conceived as compound artifacts that wrap digital material in terms of four elements: its content, its metadata, its relationships with other objects and its behavior. Material in DLs can often be classified into categories whose members present similar, if not identical, characteristics and behavior. If we consider a DL composed of "papers" and "photographs", two groups of digital objects could be created to cater for these two readily discernible categories of material. All objects in a single group share near-identical structure and behavior. For instance, all "paper" objects are expected to share a common metadata structure, contain streams/files of the same format, participate in common relationships, etc. On the other hand, both the content and the internal representation of digital objects can vary significantly across categories. In this respect, the components used to model "photograph" objects can be, and frequently are, very different from those constituting "papers".

From a digital object encoding perspective, various XML formats, including METS [12], MPEG-21 DIDL [1], FOXML [8] and RDF [22], help encode arbitrary digital object variations through the use of a common XML schema. However, these formats mainly address low-level storage and preservation needs. We argue that, to develop full-fledged, integrated DL systems, we should be able to handle variations in digital object parts at a higher level of abstraction. For instance, both "papers" and "photographs" are accompanied by metadata in our DL example.
Yet, issues such as which metadata elements are used in each object category, what constraints these elements should satisfy, and how these constraints should be handled do not depend on the underlying storage encoding. Similar issues apply to all elements of a digital object: which digital content types and formats each object category uses, which relationships the objects of each category are allowed to participate in, and how each set of objects behaves. These variations, which distinguish diverse sets of objects, are poorly handled by contemporary DL systems, forcing humans to manually process "papers" and "photographs" as such:
Types arise in any domain that calls for the categorization of objects according to their usage and behavior [5]: type-less objects naturally decompose into sets with uniform structure and behavior, which are then referred to as "types". A type determines the properties we can prove about a group of objects [14], while a class is a realization of a type, providing the functionality and representation shared by a group of objects [16]. In the DL field, we need to ascertain that all objects belonging to the same class contain similar structural attributes and that all objects of the same type expose uniform behavior. Digital Object Prototypes (DOPs) [20] provide a domain-specific realization of digital object types and classes in the context of DLs. We gather meta-information about categories or sets of digital objects in a digital object prototype definition and treat individual members of these sets as instances of their prototype, so that digital objects conform to their prototype specification without user intervention. The DOPs framework provides an abstraction over the structural and behavioral variations of diverse sets of digital objects; this abstraction ultimately yields the means for:
In this article, we provide an overview of the framework, highlight its type-conformance capabilities and show how heterogeneous digital material can be treated in a uniform manner without resorting to custom development.

Digital Object Prototypes (DOPs)

Figure 1 places DOPs in the context of a digital library architecture organized in three tiers; DOPs are located between the DL storage/repository layer and the application-logic/servicing layer.
A DOP is a detailed yet succinct definition of an object's constituent elements and behavior. As Figure 2 shows, the DOPs framework treats digital objects as instances of their respective prototypes, while the underlying stored digital objects are treated as serializations of digital object instances. A digital object instance is created using (a) the specifications of its prototype and (b) the data loaded from the stored digital object. Digital object instances expose the data found in the stored digital object through the DOPs framework's API, in a representation that abides by the specifications of the prototype. As a result, high-level DL services are relieved of dealing with structural and behavioral variations among digital objects, including which parts constitute each kind of object and how each part is handled. Such services now operate on digital object instances, which are "ready-made", type-consistent representations of stored digital objects. For example, high-level services can neither execute invalid operations on the object at hand nor request or manipulate digital object parts that have not been specified in the respective prototype.
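The instantiation step and the guarantee that services cannot reach unspecified parts can be illustrated with a short Python sketch. It is written under our own assumptions and is not the DOPs API (the public release is a Java class library [23]): `Prototype`, `DigitalObjectInstance` and `TypeConformanceError` are hypothetical names, a prototype is reduced to the set of attribute identifiers it allows, and a stored digital object is modeled as a plain dictionary.

```python
from dataclasses import dataclass


class TypeConformanceError(Exception):
    """Raised when a service touches a part its prototype does not specify."""


@dataclass(frozen=True)
class Prototype:
    # Simplification: a prototype reduced to a name and its allowed attribute ids.
    name: str
    attribute_ids: frozenset


class DigitalObjectInstance:
    """Typed view built from (a) the prototype and (b) the stored object's data."""

    def __init__(self, prototype, stored):
        self.prototype = prototype
        # Assumption: stored data the prototype does not specify is not exposed.
        self._values = {k: v for k, v in stored.items() if k in prototype.attribute_ids}

    def get(self, attr_id):
        self._check(attr_id)
        return self._values.get(attr_id)

    def set(self, attr_id, value):
        self._check(attr_id)
        self._values[attr_id] = value

    def _check(self, attr_id):
        # Every access goes through the prototype before reaching the data.
        if attr_id not in self.prototype.attribute_ids:
            raise TypeConformanceError(
                f"'{attr_id}' is not specified by prototype '{self.prototype.name}'")


photograph = Prototype("photograph", frozenset({"dc", "web-image"}))
stored = {"dc": {"title": "Harbour, 1920"}, "web-image": b"...", "stray": 1}
inst = DigitalObjectInstance(photograph, stored)
print(inst.get("dc"))  # {'title': 'Harbour, 1920'}
try:
    inst.get("fulltext")  # a "paper" part requested from a "photograph"
except TypeConformanceError as err:
    print(err)  # 'fulltext' is not specified by prototype 'photograph'
```

Whether stored data that the prototype does not mention should be hidden, as in this sketch, or reported as an inconsistency is a design choice left open here.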
In the DOPs framework, a digital object is composed of a set of attributes. Each attribute has an identifier that is unique within the digital object in which the attribute is defined. Each attribute also has multilingual labels and descriptions that assist its display to the user. For example, a DL designer may define a metadata attribute named "dc" and labelled "Dublin Core Metadata" in English, and a file attribute named "web-image" and labelled "Image for WWW display". The role of DOPs is to:
This is achieved through the concept of a digital object attribute specification: a detailed definition of an attribute's characteristics and usage patterns. A digital object prototype is a set of attribute specifications, as illustrated by the UML diagram of Figure 3.
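A prototype seen as a set of attribute specifications, each with an identifier that is unique within the prototype and multilingual labels, might be modeled as follows. This is a hypothetical Python sketch: `AttributeSpec`, `Prototype` and the label fallback rule are our assumptions, and the actual specifications described in Saidis et al. [20] carry considerably more detail.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeSpec:
    attr_id: str                                # unique within its prototype
    kind: str                                   # e.g. "metadata" or "file"
    labels: dict = field(default_factory=dict)  # language code -> label

    def label(self, lang, fallback="en"):
        # Prefer the requested language, then the fallback language, then the id.
        return self.labels.get(lang) or self.labels.get(fallback) or self.attr_id


class Prototype:
    """A prototype modeled as a set of attribute specifications."""

    def __init__(self, name, specs):
        ids = [s.attr_id for s in specs]
        if len(ids) != len(set(ids)):
            raise ValueError(f"duplicate attribute ids in prototype '{name}'")
        self.name = name
        self.specs = {s.attr_id: s for s in specs}

    def spec(self, attr_id):
        return self.specs[attr_id]


photograph = Prototype("photograph", [
    AttributeSpec("dc", "metadata", {"en": "Dublin Core Metadata"}),
    AttributeSpec("web-image", "file", {"en": "Image for WWW display"}),
])
# No Greek ("el") label is defined, so the English one is used.
print(photograph.spec("dc").label("el"))  # Dublin Core Metadata
```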
DL designers define prototypes by configuring and customizing the attribute specifications to address the requirements of their material. The specifications currently supported by DOPs cover four aspects of a digital object: digital content, metadata, relationships and behavior.
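For the behavior aspect, the sketch below assumes that a prototype maps operation names to functions flagged as public or private, and that each instance binds these functions when it is created. All names (`Prototype.define`, `Instance.call`, `BehaviorError`, and the `thumbnail` and `display` operations) are hypothetical and do not reflect the DOPs Java API. Binding at instantiation means that a behavior defined once in the prototype reaches every instance created from it, while private operations remain callable only from within the instance.

```python
class BehaviorError(Exception):
    """Raised when a service requests an unknown or private operation."""


class Prototype:
    def __init__(self, name):
        self.name = name
        self.behaviors = {}  # operation name -> (function, is_public)

    def define(self, op, func, public=True):
        # Behavior is defined once, here, for every instance of the prototype.
        self.behaviors[op] = (func, public)


class Instance:
    def __init__(self, prototype, data):
        self.prototype = prototype
        self.data = data
        # Bind every operation to this instance at instantiation time.
        self._ops = {op: (lambda f=f: f(self), public)
                     for op, (f, public) in prototype.behaviors.items()}

    def call(self, op):
        """Entry point for high-level services: only public behavior is reachable."""
        func, public = self._ops.get(op, (None, False))
        if func is None or not public:
            raise BehaviorError(f"'{op}' is not a public operation of '{self.prototype.name}'")
        return func()

    def _internal(self, op):
        # Private behavior stays available to the instance's own operations.
        return self._ops[op][0]()


photo = Prototype("photograph")
photo.define("thumbnail", lambda obj: f"thumb({obj.data['web-image']})", public=False)
photo.define("display", lambda obj: obj._internal("thumbnail") + " + " + obj.data["title"])

inst = Instance(photo, {"web-image": "harbour.jpg", "title": "Harbour"})
print(inst.call("display"))  # thumb(harbour.jpg) + Harbour
```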
A more comprehensive description of DOP usage, along with numerous "real-life" examples originating from Pergamos [11], the University of Athens DL, has recently appeared in Saidis et al. [20]. In practice, the "mechanics" of the DOPs framework work as follows. The DL designer defines DOPs in terms of XML. XML DOP definitions are loaded at the start-up of the DL system and transformed inside the DOPs API using the representation of the UML diagram depicted in Figure 3. High-level DL services then use the DOPs API to manipulate digital objects by creating digital object instances. The UML diagram shown in Figure 4 provides a simplified view of a digital object instance. Each instance is created via a prototype, and all of the instance's attributes (
Note also that the framework allows the creation of "empty" digital objects, that is, objects that do not correspond to an existing stored digital object, in order to retain the benefits of typing in object input scenarios. For example, input services such as cataloguing or batch ingestion can create empty "paper" or "photograph" digital object instances and then use them to insert information in a type-consistent manner. Eventually, high-level services store digital object instances back to the repository, which triggers the serialization of the digital object's attributes. If the instance at hand corresponds to an existing stored digital object, its modified attributes are updated in the underlying storage artifact. If the instance is "empty", a new storage artifact is created automatically and all of its attributes are stored therein.

Derived Benefits from the Use of DOPs

Technically, the DOPs framework constitutes a domain-specific "type system" for digital objects, tailored to the needs of digital object management and manipulation. Developing a type system specific to a particular domain is not uncommon; for example, Bidinger et al. [2] recently proposed a domain-specific type system for component-based message-oriented middleware. The essential features of a type system [10] are the following:
Although the above benefits stem from the use of type systems in programming languages, similar benefits derive from the use of the DOPs framework in digital object management and manipulation. The first feature, error prevention and detection, was discussed in the previous section: services cannot perform operations on digital object instances unless these operations are supported by the DOP assigned to the instance. Besides the automated runtime type-conformance checks offered by the DOPs framework, we have used DOP definitions to develop an off-line digital object validation process in our private cataloging installation of Pergamos. This process reports structural inconsistencies in stored digital objects caused by runtime errors such as hardware failures. In terms of the fourth feature, optimization, we have developed digital object instance caching capabilities that minimize the instantiation/serialization overhead in the publicly accessible, read-only installation of Pergamos. In the remainder of this section, we focus on the benefits offered by the DOPs framework in terms of digital material modeling/design and DL service implementation/maintenance. We also discuss the object input capabilities of the framework.

Digital Object Modeling

Each modeling approach depends on the requirements it is called to fulfill. These include end-user requirements and material characteristics, and also involve cataloging, documentation and development costs. For example, an over-detailed modeling approach for a collection developed from scratch may not be possible due to funding restrictions; likewise, born-digital and digitized material may require different treatment. Since there are many ways to model digital material, DLs should facilitate the representation of diverse sets of digital objects with the highest possible degree of flexibility.
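Because designers describe each category of material as data rather than code, supporting a new kind of object amounts to writing a new definition that is loaded at start-up. The sketch below parses prototype definitions from XML with Python's standard `xml.etree.ElementTree` module; the XML vocabulary (`<prototype>`, `<attribute>`, and the `kind`, `mandatory` and `mime` attributes) and the `load_prototypes` function are invented for illustration and are not the DOPs definition format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML vocabulary; the real DOPs definition format differs.
DEFINITIONS = """
<prototypes>
  <prototype id="paper">
    <attribute id="dc" kind="metadata" mandatory="true"/>
    <attribute id="fulltext" kind="file" mime="application/pdf"/>
  </prototype>
  <prototype id="photograph">
    <attribute id="dc" kind="metadata" mandatory="true"/>
    <attribute id="web-image" kind="file" mime="image/jpeg"/>
  </prototype>
</prototypes>
"""


def load_prototypes(xml_text):
    """Parse designer-written definitions into {prototype id: {attribute id: spec}}."""
    registry = {}
    for proto in ET.fromstring(xml_text.strip()).iter("prototype"):
        specs = {}
        for attr in proto.iter("attribute"):
            specs[attr.get("id")] = {
                "kind": attr.get("kind"),
                "mandatory": attr.get("mandatory") == "true",
                "mime": attr.get("mime"),  # None when unconstrained
            }
        registry[proto.get("id")] = specs
    return registry


registry = load_prototypes(DEFINITIONS)
print(sorted(registry))                             # ['paper', 'photograph']
print(registry["photograph"]["web-image"]["mime"])  # image/jpeg
```

Adding a third category, for instance maps, would only require another `<prototype>` element; the loader and the services built on top of it would stay unchanged.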
In technical terms, this modeling flexibility refers to the ability of the DL to support user-defined classes and types of digital objects. The DL system should not impose any constraints on the structure and behavior of the objects it supports; on the contrary, it should enable designers to define the digital object classes and types that best suit the requirements of the underlying material. DOPs fulfill this core requirement, as they can essentially be viewed as a meta-modeling tool that enables designers to generate their own digital object models. We should clarify that our approach is not related to modeling notations and frameworks such as UML [15] or ADORA [9]. The latter are general-purpose tools for designing, specifying and modeling software in terms of objects, whereas the DOPs framework is a domain-specific "language" for manipulating diverse classes and types of digital material in terms of digital objects. With DOPs, DL designers write specifications of the structure and behavior of diverse groups of digital material, and at runtime individual members of these groups are automatically "clothed" to abide by the designer's specifications. We use UML to communicate the design of the DOPs framework itself (the software), not to express prototype definitions (the digital object class and/or type specifications that DL designers generate to model digital material). In terms of modeling digital material, DOPs could be compared to METS Profiles [13], since both are user-generated specifications that designate classes of digital objects. However, METS Profiles are primarily intended to promote interoperability and exchange of METS documents, and they offer neither instantiation nor automatic type conformance: it is the users who have to make individual METS documents conform to their profiles.
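Unlike a METS profile, a prototype can be checked mechanically against stored objects. In the spirit of the off-line validation process mentioned at the start of this section, the hypothetical `validate` function below compares stored objects with a prototype and reports missing mandatory attributes, files with unexpected MIME types, and attributes the prototype does not specify. The storage layout (objects as dictionaries with an `id` and an `attributes` map) and the `Spec` class are assumptions, not the Pergamos implementation.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    mandatory: bool = False
    mime_types: frozenset = frozenset()  # empty: no MIME constraint


def validate(prototype, stored_objects):
    """Return {object id: [problems]} for stored objects violating the prototype."""
    report = {}
    for obj in stored_objects:
        attrs = obj["attributes"]
        problems = []
        for attr_id, spec in prototype.items():
            if attr_id not in attrs:
                if spec.mandatory:
                    problems.append(f"missing mandatory attribute '{attr_id}'")
                continue
            mime = attrs[attr_id].get("mime")
            if spec.mime_types and mime not in spec.mime_types:
                problems.append(f"'{attr_id}' has unexpected MIME type {mime!r}")
        # Attributes present in storage but unknown to the prototype.
        for attr_id in sorted(attrs.keys() - prototype.keys()):
            problems.append(f"attribute '{attr_id}' is not specified by the prototype")
        if problems:
            report[obj["id"]] = problems
    return report


paper = {
    "dc": Spec(mandatory=True),
    "fulltext": Spec(mandatory=True, mime_types=frozenset({"application/pdf"})),
}
objects = [
    {"id": "paper:1", "attributes": {"dc": {}, "fulltext": {"mime": "application/pdf"}}},
    {"id": "paper:2", "attributes": {"fulltext": {"mime": "image/tiff"}}},
    {"id": "paper:3", "attributes": {"dc": {}, "fulltext": {"mime": "application/pdf"},
                                     "web-image": {"mime": "image/jpeg"}}},
]
for oid, problems in validate(paper, objects).items():
    print(oid, problems)
```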
We also expect that our ongoing work on DOPs-based inheritance will further enhance the reuse of existing digital object class and type definitions, as inheritance will allow new prototypes to be derived from existing ones.

DL Architecture and Implementation of Services

The architecture used by most traditional DL systems does not contain the intermediate DOPs layer shown in Figure 1; high-level DL services operate directly on stored digital objects. Such an approach is mostly sufficient for homogeneous digital content, where all objects represent material of a similar nature. In a DL hosting diverse and heterogeneous material, however, this approach forces the development of ad-hoc implementations for handling variations in digital object attributes. The main issue is that knowledge about the variations between types of material has to be encapsulated in each and every DL service, and each service has to resolve on its own the variations occurring among diverse categories of digital objects. From a software engineering perspective, this "breaks" modularity and separation of concerns [17]. Consequently, service implementations become tangled, greatly hampering the maintenance and evolution of the DL system. If a new type of material is to be added to the DL, existing services have to be modified to meet the new constraints and requirements. If the DL is to be augmented with a new service, its implementation has to be carried out in a redundant fashion, with repetitive code to manage the variations occurring in the underlying material. The introduction of DOPs not only protects the underlying "raw" representation of digital objects from inappropriate use but also changes the way objects interact with each other. Service implementations are released from "manually" handling digital object variations.
Instead, services can use type-consistent digital object instances to manipulate objects in a uniform fashion: first create the digital object instance, then manipulate its attributes using behavior schemes, and finally store the instance back to the repository.

Cataloging and Digital Object Input

Our initial motivation in developing DOPs was to both simplify and speed up the digital object input phase. In most new collection development efforts, where content that is not born-digital needs to be digitized, documented, described from scratch and ingested into the digital library, digital object input is the most expensive phase. The cost of this phase depends on the amount of material that has to be ingested and is also greatly affected by the cataloging process supported by the DL system at hand. It is not uncommon for object input to be carried out by scholars and/or experts in the material's domain who are not familiar with the underlying DL technologies, including XML, metadata element standards or web services. Lack of support for digital object types forces catalogers to handle digital object variations manually. Consider a conventional system hosting our example DL of "papers" and "photographs". Since such a system cannot distinguish "papers" from "photographs" and make each behave accordingly, it is the cataloger who must supply objects with correct metadata values, files of appropriate MIME types, etc. Using DOPs in Pergamos, we developed an effective web-based cataloging service that manages the inherently diverse nature of our material in a uniform manner. This service not only provides user-friendly means to describe and ingest material but also offers valuable metadata quality assurance. For example, an object is not stored unless its mandatory metadata elements contain appropriate values.
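The store step, together with the metadata quality assurance performed by the cataloging service, can be sketched as follows. The in-memory `Repository`, the `Instance` class and the `MissingMandatoryMetadata` exception are hypothetical, and for brevity the sketch treats individual metadata elements (such as `dc:title`) as attributes. It only mirrors the behavior described in this article: an "empty" instance creates a new storage artifact, an instance of an existing object writes back only its modified attributes, and nothing is stored while mandatory metadata is missing.

```python
import itertools


class MissingMandatoryMetadata(Exception):
    """Raised when an instance lacks values for mandatory metadata elements."""


class Repository:
    """Toy in-memory store: object id -> {attribute id: value}."""

    def __init__(self):
        self.objects = {}
        self._ids = itertools.count(1)

    def create(self, attributes):
        oid = f"obj:{next(self._ids)}"
        self.objects[oid] = dict(attributes)
        return oid

    def update(self, oid, changed):
        self.objects[oid].update(changed)


class Instance:
    def __init__(self, prototype, oid=None, values=None):
        self.prototype = prototype          # {attribute id: {"mandatory": bool}}
        self.oid = oid                      # None marks an "empty" instance
        self.values = dict(values or {})
        self.dirty = set()                  # attributes modified since loading

    def set(self, attr_id, value):
        if attr_id not in self.prototype:
            raise KeyError(f"'{attr_id}' is not specified by the prototype")
        self.values[attr_id] = value
        self.dirty.add(attr_id)

    def store(self, repo):
        # Quality assurance: nothing is stored while mandatory metadata is missing.
        missing = [a for a, spec in self.prototype.items()
                   if spec.get("mandatory") and not self.values.get(a)]
        if missing:
            raise MissingMandatoryMetadata(missing)
        if self.oid is None:
            # Empty instance: create a new storage artifact holding all attributes.
            self.oid = repo.create(self.values)
        else:
            # Existing object: write back only the modified attributes.
            repo.update(self.oid, {a: self.values[a] for a in self.dirty})
        self.dirty.clear()
        return self.oid


paper = {"dc:title": {"mandatory": True}, "dc:creator": {"mandatory": True}, "fulltext": {}}
repo = Repository()
draft = Instance(paper)                     # an empty "paper" instance
draft.set("dc:title", "On Digital Objects")
try:
    draft.store(repo)
except MissingMandatoryMetadata as err:
    print("not stored, missing:", err.args[0])  # not stored, missing: ['dc:creator']
draft.set("dc:creator", "A. Author")
print(draft.store(repo))                    # obj:1
```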
Moreover, since new categories of material are continually introduced to Pergamos in terms of new DOPs, the cataloging service adjusts to the specifications of the new types of digital objects without custom coding.

Related Work

In the DL literature, approaches to realizing digital object types are based on the notion of a disseminator [4, 21, 18, 3, 6]. Disseminators are bindings of an object's data to executable operations. Disseminator-based approaches offer flexibility in executing diverse digital object behavior, especially in terms of interoperability. A disseminator "knows" the repository-specific representation of digital objects and provides a means to execute typed operations on underlying digital objects and their constituent parts. However, as we have discussed in a previous paper [20], disseminator-based approaches present a number of drawbacks. Approaches such as content types [4] and content models [21] do not provide the means to verify the structure of digital objects and, thus, do not allow for asserting that an object actually contains its proper attributes. As the structural information about digital objects is not explicitly defined, users have to supply objects manually with appropriate information (metadata, digital content, relationships and behavior). Blanchi and Petrone [3] discuss metadata schema and metadata element instances, resembling our proposed notions of

Disseminators need to be defined manually by the object creator at the time of ingestion; in other words, the user has to bind an object's data to the respective operations manually during the object's creation. This significantly increases object input costs and limits the modifiability and adaptability of the system, given the effort required to assign new functionality to digital objects or modify existing functionality. Disseminators cannot easily be used for object input and cataloguing purposes, e.g., for adding a new datastream in FEDORA.
In order to issue disseminator-based functionality on an object, the object has to contain both the data and the binding of the data to the respective operations beforehand. aDORe [6] attempts to resolve this issue by dynamically assigning disseminators to objects based on a knowledge base of disseminator rules and properties. Disseminators and DOPs' behavior schemes share the ability to attach behavior to digital objects; however, the two approaches differ in the mechanics that realize this ability. Disseminators assume that all of an object's data are correct and that all bindings are in place before the disseminator executes. This requires users to carry out manual, tedious and error-prone digital object "configurations". In this context, disseminator-based approaches provide rather limited options for dealing with digital object type conformance. With DOPs, behavior is defined once and in one place, the prototype, and is automatically attached to digital object instances as part of the object's instantiation process. Hence, a change to a behavior definition in the prototype automatically yields instances that honor this modification. Moreover, to high-level services, individual digital objects conform to their structural and behavioral definitions automatically, and individual objects can be verified via their prototype for containing proper information and exposing appropriate behavior. Finally, the distinction between private and public behavior in DOPs not only honors encapsulation but also allows instances to carry out a significant degree of object-pertinent functionality automatically.

Closing Remarks

The DOPs framework provides a domain-specific realization of digital object types and classes, tailored to the needs of digital object management and manipulation. The public release of the framework, realized as a Java class library, is available at [23].
Our approach does not explicitly rely upon the digital object framework of Kahn and Wilensky [19] or on a particular digital object repository implementation, since the DOPs framework operates at a different level of abstraction, suited to manipulating digital objects in the context of high-level DL services. The aforementioned public release of the framework operates in a repository-agnostic fashion, providing an effective type-consistent abstraction over repository-specific disseminators and digital object representations. We next direct our effort to implementing inheritance in DOPs, which will provide support for digital object subtypes and subclasses and will allow designers to derive new types and classes of digital objects from existing ones. DOPs-based inheritance will also supply digital object instances with polymorphism, allowing objects to have multiple types and consequently participate in different behavior execution contexts without structural modifications.

Acknowledgement

We thank the anonymous reviewers of this article for their comments, which helped us improve its presentation.

References

1. J. Bekaert, P. Hochstenbach, and H. Van de Sompel, Using MPEG-21 DIDL to Represent Complex Digital Objects in the Los Alamos National Laboratory Digital Library, D-Lib Magazine, 9(11), November 2003, <doi:10.1045/november2003-bekaert>.
2. P. Bidinger, M. Leclercq, V. Quema, A. Schmitt, and J.-B. Stefani, Dream types: a domain specific type system for component-based message-oriented middleware, SIGSOFT Software Engineering Notes, 31(2):2, 2006.
3. C. Blanchi and J. Petrone, Distributed Interoperable Metadata Registry, D-Lib Magazine, 7(12), December 2001, <doi:10.1045/december2001-blanchi>.
4. C. Blanchi and J. Petrone, An Architecture for Digital Object Typing, White Paper, Corporation for National Research Initiatives (CNRI), <hdl:4263537/4096>.
5. L. Cardelli and P. Wegner, On understanding types, data abstraction, and polymorphism, ACM Computing Surveys, 17(4):471-522, 1985.
6. H. Van de Sompel, J. Bekaert, X. Liu, L. Balakireva, and T. Schwander, aDORe: A Modular, Standards-Based Digital Object Repository, The Computer Journal, 48(5):514-535, 2005.
7. Dublin Core Metadata Initiative, DCMI Metadata Terms, <http://www.dublincore.org/documents/dcmi-terms/>.
8. Fedora Project, Introduction to Fedora Object XML, <http://www.fedora.info/>.
9. M. Glinz, S. Berner, and S. Joos, Object-oriented modeling with ADORA, Journal of Information Systems, 27(6):425-444, 2002.
10. Y. Leontiev, M. T. Ozsu, and D. Szafron, On type systems for object-oriented database programming languages, ACM Computing Surveys, 34(4):409-449, 2002.
11. Libraries Computer Center, University of Athens, Pergamos Digital Library, <http://pergamos.lib.uoa.gr/>.
12. Library of Congress, Metadata Encoding and Transmission Standard (METS), <http://www.loc.gov/standards/mets/>.
13. Library of Congress, Metadata Encoding and Transmission Standard (METS): METS Profiles, <http://www.loc.gov/standards/mets/mets-profiles.html>.
14. B. H. Liskov and J. M. Wing, A behavioral notion of subtyping, ACM Transactions on Programming Languages and Systems, 16(6):1811-1841, 1994.
15. Object Management Group, Unified Modeling Language, <http://www.uml.org/>.
16. A. Otis, A reference model for object data management, Computer Standards & Interfaces, 13(1-3):19-32, 1991.
17. D. Parnas, On the criteria to be used in decomposing systems into modules, Communications of the ACM, 15(12):1053-1058, 1972.
18. S. Payette, C. Blanchi, C. Lagoze, and E. A. Overly, Interoperability for Digital Objects and Repositories: The Cornell/CNRI Experiments, D-Lib Magazine, 5(5), May 1999, <doi:10.1045/may99-payette>.
19. R. Kahn and R. Wilensky, A Framework for Distributed Digital Object Services, International Journal on Digital Libraries, 6(2):115-123, 2006, <http://www.springerlink.com/content/0723r55h83067n10/>.
20. K. Saidis, G. Pyrounakis, M. Nikolaidou, and A. Delis, Digital object prototypes: An effective realization of digital object types, In ECDL '06: Proceedings of the 10th European Conference on Digital Libraries, Alicante, Spain, September 2006.
21. T. Staples, R. Wayland, and S. Payette, The Fedora Project: An Open-source Digital Object Repository Management System, D-Lib Magazine, 9(4), April 2003, <doi:10.1045/april2003-staples>.
22. World Wide Web Consortium, Resource Description Framework (RDF), <http://www.w3.org/RDF/>.
23. Kostas Saidis, The DOPs Framework, <http://www.dops-framework.net/>.

Copyright © 2007 Kostas Saidis and Alex Delis
doi:10.1045/may2007-saidis