D-Lib Magazine
Irene Lourdi, Christos Papatheodorou, Martin Doerr
Abstract

This article is motivated by the demand for unified access to the wealth of distributed digital cultural collections, allowing users to make queries and discover information about them through integrated processes. Our effort originates from the semantic interoperability perspective and considers CIDOC/CRM as the mediating schema, which integrates in an optimal way the semantics of the collection-level metadata schemas and application profiles. The research reveals the complexity of mapping metadata schemas to ontologies and resolves particular difficulties by presenting the crosswalk between Dublin Core Collections Application Profile and CIDOC/CRM.

1. Introduction

Over the last decade, the Web has been the central forum for data storage and information exchange between different applications that support human activities in many domains, such as cultural heritage, art, business, etc. Cultural heritage digitization projects have increased tremendously, resulting in an abundance of cultural information in digital format. This universe of digital information has become so enormous, and the volume and variety of information so overwhelming, that seeking information within this undivided universe is neither effective and precise nor efficient and economical. Information overload is a major issue that was addressed years ago [Lively 1996; Wilson 1996], but the situation has only become worse. The increasing demand for global access to the wealth of highly distributed, heterogeneous, and dynamic cultural heritage content has led to the requirement for integrating information from multiple sources. Information seekers would understandably appreciate a filtering mechanism that could exclude irrelevant and substandard information resources while at the same time make available high quality and useful items. 
Research in the area of information accessibility focuses mainly on matters of interoperability and integration, taking into account that even though cultural heritage information comes in various forms, it also shares some similarities and is often semantically related. Information integration has been a dynamic and challenging research area for many years, and recently research interests have been moving toward semantic integration that defines and exploits conceptual representations of the data and their relationships in order to eliminate possible heterogeneities [Cruz 2005]. In the diverse environment of digital sources, Semantic Web technology reveals and makes machine-readable the rich semantics of resources for facilitating information integration. The study described in this article was motivated by the fact that collection-level metadata has been proposed as a powerful tool to manage the overwhelming and uncontrolled increase of digitized material. Functional collections require coherent, rich and comprehensive metadata to assist users in navigating across various levels of the collections to locate any information and resource. The importance of the collection-level descriptions in managing the growing amounts of digital resources, as well as in identifying and discovering them efficiently without any loss of contextual or administrative information, has already been discussed in many research papers [Foulonneau 2005; Renear 2008], but it is becoming more urgent. The need for integration of collection-level information is motivated by the existence of a profusion of metadata schemas. Various organizations have engaged in collection-level description efforts, and these efforts have led to the development of a variety of metadata models, which consequently created a complex environment overall. These models are analytically presented in Section 2.2 of this article. 
Users have not been able to easily seek information simultaneously from the vast number of collections, due to the lack of an integration mechanism. Although the semantics of collection data models are quite similar and the need for integrating collection data is urgent as a key solution to the problem of sharing information from distributed digital applications and sources, there have been no adequate mapping mechanisms from one schema to another. Our goal is to propose an efficient ontology-based methodology for integrating the wealth of composite cultural heritage information, making it possible for users to retrieve all the resources they need from different collections. The ICOM/CIDOC Documentation Standards Group and the CIDOC Conceptual Reference Model (CRM) Special Interest Group have developed a reference model CIDOC/CRM that would enable the explicit definition of cultural event and object attributes, as well as of their relationships, and therefore could be considered as a mediation schema for the needed integration. In this context, the crosswalk between Dublin Core Collections Application Profile (DCCAP) [Dublin Core Collection Description Task Group] and the CIDOC/CRM reference model is presented, opting for the semantic integration of collection-level descriptions as an efficient mechanism for sharing information between various digital repositories and applications. DCCAP has been chosen from a plethora of collection-level metadata schemas, since DCCAP is a widely known and used metadata schema. The outline of this article is as follows: Section 2 provides an overview of the importance of collection-level description and the metadata schemas that have been developed. Related work concerning information integration is also presented in Section 2. The integration model is presented in Section 3, while in Section 4, a collection description with CIDOC/CRM is analyzed, presenting the entities that correspond to collections. 
In Section 5 the mapping of DCCAP and CIDOC/CRM is presented, and the problems and some issues about the mapping are discussed. Section 6 provides the article's conclusions.

2. Cultural Heritage Metadata Interoperability

2.1 Collection-level description

The potential for digital library growth and the increase in the volume of available digital content have drawn into question the ability of users to navigate large distributed and heterogeneous collections. The term "collection" can have many meanings in terms of the context in which the term is used. Many cultural institutions (libraries, museums, archives) use the term "collection" in different ways according to the institution's management policy and requirements. In the archival world, a collection consists of a group of documents originating from the same source and acquired as a whole, while in a library, a collection is the total amount of library material (books, manuscripts, serials, government publications, pamphlets, catalogs, reports, recordings, microfilm, etc.) that make up the holdings of that particular library [Lee 2000]. The major function of a traditional library collection is to facilitate information seeking by providing the library's users with convenient access to relevant information resources [Buckland 1992]. On the other hand, in museums collections are made up of the objects that are hosted in the museum and are often differentiated by factors such as the object's creator, the artist's style, the type of artifact or a specific historical period, etc. [Allen 2002]. However, in the digital world the concept of collection has a particular role and therefore needs a fresh examination in comparison to the traditional definition of what constitutes a collection. According to Lagoze and Fielding [1998], a digital collection is a "set of criteria for selecting resources from the broader information space". 
Currently, an increasing number of digitization projects embrace the collection-level description as a valuable mechanism for information discovery and understanding [Foulonneau 2005]. Collection-level descriptions facilitate information seeking, since they easily and conveniently provide the first level of access to the needed information [Lee 2000]. As remarked by a UKOLN study [Heaney 2000] "collection description serves collection management purposes, particularly discharging institution's curatorial responsibilities". In other words, collection descriptions facilitate the finding and identification of material. They group resources and provide access points that enable users to retrieve information. In the current environment, characterized by metadata heterogeneity and an increasing amount of digital information, collections could work as effective filtering systems that help to reduce data overload and enable users to discover the information they need. Depending on the degree to which a collection is dedicated to certain selection criteria, users can effectively avoid looking for individual documents in collections not related to their topic of interest. On the other hand, collections that preserve a common context and order, as is the case for most archives, constitute an information source as a whole for many research questions, a fact which cannot be inferred from item-level indexing. Collection-level metadata represent the inherent and contextual characteristics of a collection and are considered key for accessing resources, despite the differences between the cultural institutions' metadata policies. Collection-level metadata schemas help users decide more easily whether a particular collection is of interest to them. According to Hakala [2005], collection-level descriptions facilitate an agent to:
2.2 Collection-level metadata schemas

The diversity of institutional management perspectives makes it impossible to create a single descriptive schema that would cover all communities' needs [Gill 2004]. Consequently, a plethora of standards and local community-specific metadata models have been developed, employing different data structures, data content rules and (to some extent) data formats to encode their collections. It is not possible to describe in this article all the existing collection-level metadata standards, and so we will limit our discussion to the most widely known and used ones in the cultural heritage domain. These are: RSLP [Powell], DCCAP [Dublin Core Collection Description Task Group], ANSI/NISO Z39.91 - 200x Collection Description Specification DSFTU [National Information Standards Organization] and EAD [Library of Congress]. Our purpose is not to analyze the characteristics of each metadata schema, since they are widely known in the cultural information community, but rather to show, through a brief description, how many and how different they are. However, we will give special emphasis to the DCCAP presentation since the rest of this article deals with DCCAP.

RSLP (Research Support Libraries Program): This schema acknowledges that collections may have different meanings in the library, archive and other organizations. It has been used to facilitate cross-domain working and searching collections, serving each collection's management purposes. RSLP contains descriptive information about the collection, information about how to access the collection (including physical access in the case of a library or museum), and the terms and conditions associated with access to the collection and individual items. It should be noted that many RSLP elements are drawn from existing metadata schemas, notably the Dublin Core Metadata Element Set [Dublin 2008] and the vCard set of attributes [Dawson 1998]. 
NISO Z39.91-200x: This standard was developed as part of the activity of the NISO Metasearch Initiative in 2004-2005. The draft standard incorporates the set of Dublin Core Collections Application Profile (DCCAP) as well as elements from other vocabularies. Dublin Core and NISO specifications have been aligned, but the latter has some additional elements such as "Completeness" (subject and level) of collection.

EAD (Encoded Archival Description): The EAD standard was created by the Library of Congress (LC) and the Society of American Archivists (SAA), and it is used to encode archival finding aids. It supports archival descriptive standards and practices, and effectively provides the classification of an archival collection.

DCCAP (Dublin Core Collections Application Profile): DCCAP is an application profile of the Dublin Core Metadata Initiative (DCMI) for collection-level description; it is considered to be valuable for extracting a set of core data from a collection and for providing users important information about it. A collection can be any aggregation of physical or digital items, while a collection description includes a description of one or more collections (aggregations of items) and a description of zero or more catalogues or indices (resources that describe collections). Additionally, as described by the Dublin Core Collection Description Task Group, "it provides a means of creating simple descriptions of collections suitable for a broad range of collections, as well as simple descriptions of catalogues and indexes". DCCAP is based on the Dublin Core element set [Dublin 2008] and other metadata vocabularies, specified by the following namespaces:
Since current work deals with this application profile, its properties are analyzed in Table I, remarking that DCCAP is still under development.
Table I. DCCAP properties

2.3 Obtaining Interoperability

Challenges associated with semantics, integration of information and presentation of various kinds of data call for significant innovations. Semantic interoperability has been explored in depth over the past few years, especially with regard to the cultural heritage domain. Recently, many significant projects have focused on development of semantic approaches for data integration [Nam et al. 2002]. In addition, significant attention has been given to the creation of integrated repositories of scientific information [Rajasekar et al. 2002], and a data grid architecture equipped by a mediator that implements a "Global as View" approach for the integration of heterogeneous data sources has been proposed. Moreover, in Lehti [2004], an XML data integration approach is presented based on the Web Ontology Language (OWL). The proposed architecture maps XML structures (such as elements and attributes) to OWL structural components (such as classes, properties, etc.), and thus they convert the XML data to an OWL global ontology. In Cruz et al. [2004], the authors map the XML data of every local source to an RDF schema (RDFS) local ontology. They transform the XML elements and attributes to RDFS classes and properties, while in addition the local RDFS ontologies are merged to a global ontology for unified access and semantic integration of local data sources. Also in Hong et al. [2005], a system architecture providing integrated concept-based browsing, retrieval and analysis of museum information is discussed. The architecture consists of three main parts: 1) the Front layer provides an ontology browser; 2) the Mediation layer, which is also called the Semantic Web layer, represents a global conceptual schema of antique information and maintains a mapping between the global and local schemas; and 3) the Backend layer contains distributed Web Services and legacy databases of digital antiques. 
Furthermore, in Tous [2006] a model-mapping approach is applied to represent instances of XML and XML Schema in RDF. The described architecture proposes a semantic XPath processor that acts over an RDF-mapping of XML and is fed with an unlimited set of XML schemas and/or RDFS/OWL ontologies. In general, the approaches that deal with semantic integration are strongly oriented to integrate XML data to RDFS and OWL ontologies by focusing on structural mappings or model mappings between them (i.e., elements to classes, attributes to properties, etc.). However, in many of these cases the mapping effectiveness has not yet been examined, especially for really complex data structures like metadata schemas. The aggregation of cultural heritage domain information is often very difficult and demands high-level skills and scientific methods [Nam et al. 2002]. Increasing standardization or adoption of ad hoc standards, such as Dublin Core, as well as other metadata standards, achieve syntactic, structural, and limited semantic interoperability. In Amann et al. [2001] a mechanism for the integration of cultural information sources is proposed. The authors map pieces of information contained in XML fragments to domain specific ontologies, such as CIDOC/CRM, defining (1) a mapping language that describes the resources by a set of rules relating XPath location paths to the concepts and roles of an ontology, and (2) a query rewriting algorithm for translating user queries into queries expressed in an XML query language, which are then sent for evaluation to XML sources. Moreover Amann et al. [2002] extend their work presenting an ontology-based mediator architecture for integrating XML sources. The proposed architecture offers appropriate mechanisms for representing the semantics of the XML data by the classes and properties of the mediator's conceptual model. There are also many works on metadata mappings and crosswalks [Pierre 1998; Woodley 2000; Day 1996]. 
There are also some works on mapping data schemas to CIDOC/CRM that try to enable the exchange and sharing of heterogeneous sources, both within and between cultural institutions. One effort analyses how to combine MPEG-7 with CIDOC/CRM entities into a single ontology for describing and managing multimedia in museums [Hunter 2002]. In this case, CIDOC/CRM is extended by MPEG-7 components to add multimedia metadata capabilities. Moreover, an effort for the mapping of DC elements to CIDOC/CRM is in process [Doerr 2003; Kakali et al. 2007]. To conclude, an integration mechanism is required to gather collection-level information from libraries, archives, museums and other cultural institutions without worrying about the differences in format, language, and structure. This mechanism should ensure information loss minimization and simultaneously reveal the data semantics. This article extends the work done for the mapping of Dublin Core property set to CIDOC/CRM [Kakali et al. 2007] and presents a crosswalk between DCCAP and CIDOC/CRM. In particular, CIDOC/CRM has been selected because it is a core ontology that is designed to be applied for the documentation, integration, mediation and exchange of heterogeneous cultural information. It is a conceptual model composed of entities that are organized into a hierarchy and that are semantically related to each other with properties. Specifically, CIDOC/CRM defines the complex interrelationships that exist between objects, actors, events, places and other concepts in the cultural heritage field. The value of CIDOC/CRM becomes apparent when it is used as the basis for data transfer and exchange between different systems, schemas and semantics [Crofts 2003]. 
On the other hand, DCCAP is chosen among a plethora of collection-level metadata schemas, since it is quite simple and well-defined and has been created by a widely known organization engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. The harmonization of these two schemas is expected to offer an integrated and semantically rich model for expressing complicated historical cultural data and would facilitate a conceptual unification of collection entities and object items as well.

3. Integration Scenario: Ontology-based Mediator

According to Hong et al. [2005], "physically combining data into a single system may be impossible for technical, organizational or economic reasons. Thus mediation systems are used instead to federate information sources, which make distributed queries possible without physically aggregating information into a single monolithic database. A typical mediation system acts as a single interface for users. It accepts and interprets queries and distributes them to participating systems. These systems reply to the mediator, which then consolidates the results for the end user". Figure 1 shows the architecture presented in Stasinopoulou et al. [2007], in which a set of data sources exists, each of them possibly following a different metadata schema. All schemas are integrated through CIDOC/CRM, which acts as the global schema allowing the gathering of all the necessary cultural information in a suitable form for further reasoning. The demand for integrating collection-level metadata from different sources requires the mediator to realize the defined mappings from the various schemas to the CIDOC/CRM ontology and vice versa. The architecture relies on the concept that it is better to have mappings from many metadata schemas to one core ontology than to have mappings between many schemas.
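The preference for a single core ontology can be illustrated with a simple count. The following is a minimal sketch, not part of the cited architecture: it assumes that every mapping is directed (one mapping per source-target pair) and that each schema needs one mapping to CIDOC/CRM and one back. The function names are hypothetical.

```python
def pairwise_mappings(n_schemas: int) -> int:
    """Directed mappings needed when every schema maps straight to every other one."""
    return n_schemas * (n_schemas - 1)


def mediated_mappings(n_schemas: int) -> int:
    """Mappings needed when each schema maps to the core ontology and back."""
    return 2 * n_schemas


if __name__ == "__main__":
    # Compare both strategies for a growing number of metadata schemas.
    for n in (2, 4, 10):
        print(n, pairwise_mappings(n), mediated_mappings(n))
```

Under these assumptions, the four schemas discussed in Section 2.2 would need 12 direct mappings but only 8 mediated ones, and the gap widens as more schemas are added. For two or three schemas the mediated approach does not reduce the count, so its benefit lies in scaling.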
Consider a user who wants to discover cultural collections whose titles contain the keywords "folklore music". In terms of DCCAP, this means that the user is seeking collection-level records for which DCCAP.Title="Folklore Music". Suppose that the user poses his (appropriately formed) query to a DCCAP local data source. The DCCAP records from the local source matching the query are returned to the user. The query is then propagated to the ontology mediator where it is transformed, using a set of mapping rules from DCCAP to CIDOC/CRM, into an equivalent query in terms of the CIDOC/CRM ontology. This query could be translated to the CIDOC/CRM path: E78(Collection)- P102 has title(is title of)-E35 Title= "Folklore Music", denoting a sequence of entities and properties that express the title of a collection. The CIDOC/CRM query is then transformed to other schemas and propagated to the corresponding sources. If records match, the results are returned to the CIDOC/CRM-based mediator and then to the user after being transformed back into DCCAP format.

4. Modeling Cultural Heritage Collections Using CIDOC/CRM

CIDOC/CRM provides an extensible ontology for the definition of main concepts and their relationships in cultural heritage documentation. It is the international standard (ISO 21127:2006) for the controlled exchange of cultural heritage information between various memory institutions like archives, libraries, museums, and others. CIDOC/CRM covers the historical, geographical and theoretical context information that enhances museum collections with significance and value. The usage of CIDOC/CRM enhances accessibility to museum-related information and knowledge, and provides an important information standard and reference model for Semantic Web initiatives. CIDOC/CRM consists of 81 classes and 132 unique properties. 
Each class is identified by a number preceded by the letter "E" (historically, classes were sometimes referred to as "Entities") and is named using noun phrases (nominal groups) in title case (initial capitals). For example, the class E63 has the title "Beginning of Existence". Each property correlates two classes, and therefore has a domain and a range class. In addition, each property is considered symmetric, is identified by a number preceded by the letter "P" and is named using verbal phrases in lower case. Examining the CIDOC/CRM class and property hierarchies, it can be shown that the ontology has the ability to describe:
CIDOC/CRM supports rich semantic descriptions of concepts or events: independent real world events, the events that occur in the life cycle of a resource, and the events that are depicted in visual information objects. A collection in CIDOC/CRM is represented by the class "E78 Collection", which is a subclass of "E24 Physical Man-Made Thing" and "comprises aggregations of physical items that are assembled and maintained ("curated" and "preserved," in museological terminology) by one or more instances of "E39 Actor" over time for a specific purpose and audience, and according to a particular collection development plan". The properties for the entity "E78 Collection" are presented in the appendix at the end of this article. In summary, a collection is characterized by the following concepts:
At this point it is important to mention that this work challenged CIDOC CRM Special Interest Group and provoked the addition of one more class and one property in CIDOC/CRM called "E87 Curation Activity" and "P147 curated (was curated by)" respectively. The class E87 "Curation Activity" concerns the actions related to the creation, development and management of a collection, while the property "P147 curated (was curated by)" has domain the class "E78 Collection" and range the class "E87 Curation Activity", correlating thus the collections with the activity for their curation. The first step of the proposed approach for the mapping between DCCAP and CIDOC/CRM is to model cultural heritage collections using CIDOC/CRM. For this purpose we define a CIDOC/CRM path as a chain of the form entity-property-entity (e-p-e), such that the entities associated by a property correspond to the property's domain and range [Kondylakis et al. 2006]. Given this definition, the required model is obtained by defining all the possible CIDOC/CRM paths in which the "E78 Collection" class participates either as a domain or range entity. For example, consider the CIDOC/CRM path E78 (Collection)-P102 has title (is title of)-E35(Title). This path denotes the name/title of the described Collection. The CIDOC/CRM paths related with "E78 Collection" class are presented in Table II.
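The path definition above can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the names `Step`, `SIGNATURES`, `parse_path` and `is_valid` are hypothetical, and the domain/range table is an assumption restricted to the specific classes that appear in the article's example paths (the ontology itself may declare these properties on more general classes).

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    domain_cls: str   # class the property starts from
    prop: str         # property identifier
    range_cls: str    # class the property points to


# Simplified (domain, range) pairs, limited to the classes used in the
# article's example paths. This table is an illustrative assumption.
SIGNATURES = {
    "P102": ("E78", "E35"),   # has title
    "P147": ("E78", "E87"),   # was curated by, in the direction used in the article
    "P4": ("E87", "E52"),     # has time-span
}


def parse_path(text: str) -> List[Step]:
    """Split a chain such as 'E78-P102-E35' into entity-property-entity steps."""
    parts = text.split("-")
    if len(parts) < 3 or len(parts) % 2 == 0:
        raise ValueError(f"not an e-p-e chain: {text!r}")
    return [Step(parts[i], parts[i + 1], parts[i + 2])
            for i in range(0, len(parts) - 2, 2)]


def is_valid(path: List[Step]) -> bool:
    """A path is valid when each step's entities match the property's domain and range."""
    return all(SIGNATURES.get(s.prop) == (s.domain_cls, s.range_cls) for s in path)
```

With this table, `is_valid(parse_path("E78-P102-E35"))` holds, while a chain that attaches a time-span directly to the collection, such as `"E78-P4-E52"`, is rejected because the entities do not match the declared domain and range of P4.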
Table II. CIDOC entities for E78 Collection

5. Mapping DCCAP and CIDOC/CRM

5.1 The methodology of mapping

The process of mapping between two or more metadata schemas refers to "a table that maps the relationships and equivalencies of them", focusing mainly on a simple semantic correspondence between the elements (Dublin Core Metadata Initiative). Crosswalk development is the intellectual task that specifies mapping rules from each element in the source metadata schema to a semantically equivalent element subset in the target metadata schema [Pierre 1998]. The mapping of two schemas is defined as a sufficient specification to correlate each instance of the source schema with the instances of the target schema with the same meaning so that data or queries under one schema can be transformed into data or queries under the other. The mapping methodology consists of two main steps: (1) Each element of both schemas is interpreted as a sequence domain-property-range (d-p-r), called a path (thus each schema is decomposed to a set of d-p-r paths); and (2) Each path of the source schema is mapped individually to a path of the target schema. Specifically in our case, the following mappings are required: (a) the mapping between the Source Domain and the Target Domain, (b) the mapping between the Source Range and the Target Range, (c) the proper Source Path, (d) the proper Target Path, (e) the mapping between the Source Path and Target Path and (f) in some cases, the combination of paths sharing the same instances. This article adopts the mentioned path-oriented methodology and maps CIDOC/CRM paths to DCCAP paths and vice versa. As mentioned previously, a CIDOC/CRM path is a chain of the form entity-property-entity (e-p-e), such that the entities associated by a property correspond to the property's domain and range. On the other hand, a DCCAP path is defined by linking the DCCAP record with a sequence of DCCAP properties and sub-properties (refinements). 
For example, the path DCCAP -> cld:IsLocatedAt denotes that the location of the collection is an application profile property, associated with the Namespace "Collection Description Terms" (cld). An example of mapping the DCCAP property "Title" and "Title.Alternative" to CIDOC/CRM is given in Figure 2.
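A mapping rule of the kind described in Section 5.1 can be sketched as a small record that pairs a source path with a target path, and then used to rewrite the title query from Section 3. This is only an illustration under stated assumptions: `MappingRule`, `RULES` and `translate_query` are hypothetical names, the key `"dc:title"` is an assumed identifier for the DCCAP "Title" property, and the textual query form is invented for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    source_domain: str   # what the source record describes
    source_path: str     # DCCAP path
    target_domain: str   # CIDOC/CRM class the record is mapped to
    target_path: str     # CIDOC/CRM path expressing the same meaning
    target_range: str    # CIDOC/CRM class that carries the value


# One rule, taken from the title example in the article.
RULES = {
    "dc:title": MappingRule(
        source_domain="DCCAP record",
        source_path="DCCAP -> dc:title",
        target_domain="E78 Collection",
        target_path="E78 Collection - P102 has title - E35 Title",
        target_range="E35 Title",
    ),
}


def translate_query(dccap_property: str, value: str) -> str:
    """Rewrite a DCCAP property/value condition as a condition on a CIDOC/CRM path."""
    rule = RULES.get(dccap_property)
    if rule is None:
        raise LookupError(f"no CIDOC/CRM path defined for {dccap_property!r}")
    return f'{rule.target_path} = "{value}"'
```

Calling `translate_query("dc:title", "Folklore Music")` yields the string `E78 Collection - P102 has title - E35 Title = "Folklore Music"`, which mirrors the path given for the folklore music query. A property without a rule raises `LookupError`, which corresponds to the cases where one schema has no equivalent path in the other.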
Another more complex example is given in Figure 3. Consider the mapping of the DCCAP path DCCAP-> dcterms:created (i.e., the mapping of the property "Dates Collection Accumulated") to CIDOC/CRM terms. The DCCAP record is mapped to the class "E78 Collection" and the property "Dates Collection Accumulated" could be mapped to the CIDOC/CRM entity "E52 Time-Span". However, in CIDOC/CRM "E52 Time-Span" is used to define the temporal extent of instances of the classes "E4 Period", "E5 Event" and any other subclasses corresponding to phenomena valid for a certain time. Thus, the classes "E52 Time-Span" and "E78 Collection" could be interlinked only with the intermediation of an event or an activity, defining that the event, which took place on a specific date, resulted in the described object. As a result, the DCCAP path DCCAP-> dcterms:created corresponds to the CIDOC/CRM path E78(Collection)-P147(was curated by)-E87(Curation Activity)-P4(has time span)-E52(Time-Span).
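At the instance level, the intermediate activity means that a single DCCAP value expands into several linked statements. The sketch below illustrates this for the dcterms:created path; it is an assumption-laden example rather than the authors' tooling. The function name `expand`, the node naming scheme and the pseudo-property `"value"` that attaches the literal date are all hypothetical.

```python
from itertools import count

# Target path given in the article for dcterms:created:
# E78 Collection - P147 - E87 Curation Activity - P4 - E52 Time-Span
CREATED_PATH = [("P147", "E87"), ("P4", "E52")]


def expand(collection_id, value, path=CREATED_PATH):
    """Return (subject, property, object) statements for one DCCAP value."""
    ids = count(1)
    statements = []
    subject = collection_id
    for prop, cls in path:
        node = f"{cls}_{next(ids)}"   # fresh node for each class along the path
        statements.append((subject, prop, node))
        subject = node
    # "value" is a placeholder property that attaches the literal to the last node.
    statements.append((subject, "value", value))
    return statements


if __name__ == "__main__":
    for statement in expand("E78_folklore", "1950/1960"):
        print(statement)
```

The collection is linked to a generated curation activity node, the activity to a generated time-span node, and only the time-span carries the date. This is the event-based pattern that the simpler DCCAP property hides.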
Finally, it is worth mentioning that a mapping annotation DTD has been defined in Kondylakis [2006]. Figure 4 provides the XML encoding of the mapping presented in Figure 2 by using the mentioned DTD. The source domain condition specifies that the mapping applies to records whose DC.TYPE is "Collection", and the target domain condition specifies the CIDOC/CRM entity "E78 Collection".
5.2 DCCAP and CIDOC/CRM crosswalk

The notation of Figures 2 and 3 could be used for the mapping of all DCCAP properties to CIDOC/CRM and vice versa, but to make this article more readable, the mapping is presented in Table III.

Table III. Mapping between two schemas

Some points of the crosswalk that need to be clarified are the following:
5.3 Issues about the mapping

Not all the CIDOC/CRM paths that model a collection could be mapped to DCCAP paths, since DCCAP does not provide relevant paths for some of them. This is quite reasonable, since CIDOC/CRM provides rich semantics related to cultural objects and covers all the facets of the life of a physical object. Furthermore, CIDOC/CRM can describe the time of occurrence of every Activity or Event that takes place. Hence, it can express the dates of all the activities related to a collection's life time, while DCCAP cannot. Some indicative CIDOC/CRM paths for which there are no corresponding DCCAP paths include the following: E78(Collection)-P24 transferred title of (change ownership through)-E8 (Acquisition event)-P4 has time-span (is time-span of)-E52(Time-Span), which denotes the Acquisition date; E78(Collection)-P25 moved (moved by)-E9(Move)-P4 has time-span (is time-span of)-E52(Time-Span), which denotes the Move date; E78(Collection)-P13 destroyed (was destroyed by)-E6(Destruction)-P4 has time-span (is time-span of)-E52(Time-Span), which denotes the Destruction date; and the path E78(Collection)-P30 transferred custody of (custody transferred through)-E10(Transfer of Custody)-P4 has time-span (is time-span of)-E52(Time-Span), which declares the date when the Transfer of Custody occurred. In the same context, there are no elements in DCCAP for denoting the persons involved in the above mentioned events. Thus the persons responsible for a Move event or a Destruction event, who are described correspondingly by the paths: E78(Collection)-P25 moved (moved by)-E9(Move)-P14 carried out by (performed)-E39(Actor) and E78(Collection)-P13 destroyed (was destroyed by)-E6(Destruction)-P14 carried out by (performed)-E39(Actor), do not have equivalent DCCAP paths. 
A significant issue is that there is no DCCAP element for indicating the transfer of custody of a collection, or the former or current keeper of the collection who is responsible for the transfer of custody. The following CIDOC/CRM paths express this information: E78(Collection)-P49 has former or current keeper (is former or current keeper of)-E39(Actor) or the pair of the paths E78(Collection)-P30 transferred custody of (custody transferred through)-E10(Transfer of Custody)-P28 custody surrendered by (surrendered custody through)-E39(Actor) and E78(Collection)-P49 has former or current keeper (is former or current keeper of)-E39(Actor) or E78(Collection)-P30 transferred custody of (custody transferred through)-E10(Transfer of Custody)-P29 custody received by (received custody through)-E39(Actor). Finally, it must be emphasized that DCCAP does not provide any element concerning the removal of an item or a part from the collection, although it provides elements for material additions to the collection (accrual method, etc.). In other words there are no corresponding paths for any CIDOC/CRM paths starting with E78(Collection)-P112 diminished (was diminished by)- E80(Part Removal). On the other hand, the path DCCAP->dc:language, which denotes the language of the items in the collection, does not have an equivalent CIDOC/CRM path. In the CIDOC/CRM ontology the concept of the collection language is valid only when the collection contains texts or documents. The path E56(Language)- P72(has language)- E33(Linguistic Object)- P70 is documented in- E78(Collection) applies only to text collections. Another important issue is the validation of the presented mapping. Such a process is quite difficult to define and perform, but there are two main preferred criteria that specify it:
The proposed mappings in Table III are symmetric, and for each source path there is only one corresponding target path. Application Profiles can be prone to different local interpretations, in which case a specific mapping may need modification and may indeed be better off accommodating polysemy.

6. Conclusion

Since metadata semantic interoperability in the cultural heritage domain is a crucial issue due to the large number of digital collections, a semantic integration mechanism that provides unified access to cultural heritage information and focuses on collection-level records is needed. In this context, this article reviews a variety of metadata schemas for collection-level description and proposes an ontology-based integration architecture. Further, as part of the integration scenario, a path-oriented crosswalk from DCCAP to the CIDOC/CRM ontology and vice versa is defined. An important prerequisite for this mapping is to model the life cycle of cultural heritage collections as CIDOC/CRM paths. The proposed model explicitly reveals all the concepts and activities related to cultural heritage collections and constitutes the basis for the proposed crosswalk.

The mapping presented here is fairly complex, given that CIDOC/CRM adopts an event-based approach, which means that it connects the description of cultural heritage objects with the events that happen to them. The event-based character of CIDOC/CRM makes it necessary to use the intermediate CIDOC/CRM activity and event entities to represent the relationships expressed in DCCAP between objects and persons. In general, the proposed methodology explicitly reveals the existing semantic correlations between CIDOC/CRM and DCCAP, and consequently it can be stated that the former is able to integrate a variety of cultural heritage application profiles.
A significant conclusion is that ontologies provide rich semantics and therefore should be preferred as mediating schemas among application profiles. Future work will deal with mappings from other collection-level metadata schemas to CIDOC/CRM. Moreover, we plan to test the mediator by feeding several collection descriptions into a testbed and then evaluating retrieval results from it.

References

Allen, S. M. 2002. Nobody knows you're a dog (or library, or museum, or archive) on the internet: The convergence of three cultures. Proceedings of the 68th IFLA General Conference and Council, Glasgow, Scotland.

Amann, B., I. Fundulaki, M. Scholl, C. Beeri, and A.-M. Vercoustre. 2001. Mapping XML fragments to community web ontologies. Proceedings of the Fourth International Workshop on the Web and Databases (WebDB'2001).

Amann, B., C. Beeri, I. Fundulaki, and M. Scholl. 2002. Ontology-based integration of XML web resources. Proceedings of the First International Semantic Web Conference on The Semantic Web, Sardinia, Italy, June 2002. Lecture Notes in Computer Science, Springer, 117-131. <doi:10.1007/b100228>.

Buckland, M. 1992. Redesigning library services: A manifesto. Chicago: American Library Association. ISBN 0-8389-0590-0.

Crofts, N., M. Doerr, and T. Gill. 2003. The CIDOC conceptual reference model: A standard for communicating cultural contents. Cultivate Interactive, 9. Available at <http://www.cultivate-int.org/issue9/chios/>.

Cruz, I. F., X. Huiyong, and H. Feihong. 2004. An ontology-based framework for XML semantic integration. Proceedings of the 8th International Database Engineering and Applications Symposium, Coimbra, Portugal. IEEE Computer Society, 217-226. <doi:10.1109/IDEAS.2004.1319794>.

Cruz, I. F. and H. Xiao. 2005. The role of ontologies in data integration. Engineering Intelligent Systems, 13(4): 245-52. Available at <http://www.cs.uic.edu/~advis/publications/dataint/eis05j.pdf>.

Day, M. 1996. Metadata: Mapping between metadata formats. In UKOLN: The UK Office for Library and Information Networking. Available from <http://www.ukoln.ac.uk/metadata/interoperability>.

Dawson, F. and T. Howes. 1998. RFC 2426 - vCard MIME Directory Profile. <http://www.imc.org/rfc2426>.

Defense Advanced Research Projects Agency (DARPA). 1990. "Knowledge Sharing Effort (KSE)", Defense Advanced Research Projects Agency, <http://www-ksl.stanford.edu/knowledge-sharing/>.

Doerr, M. 2003. The CIDOC conceptual reference module: An ontological approach to semantic interoperability of metadata. AI Magazine, 24(3): 75-92. Available at <https://www.aaai.org/ojs/index.php/aimagazine/article/view/1720/1618>.

Dublin Core Metadata Initiative, "Dublin Core Metadata Glossary", Dublin Core Metadata Initiative, <http://library.csun.edu/mwoodley/dublincoreglossary.html>.

Dublin Core Collection Description Task Group, "Dublin Core Collections Application Profile", Dublin Core Collection Description Task Group, <http://www.dublincore.org/groups/collections/collection-application-profile/>.

Dublin Core Metadata Initiative, "Dublin Core Metadata Element Set", Version 1.1, Dublin Core Metadata Initiative, <http://purl.org/dc/documents/rec-dces-19990702.htm>.

Foulonneau, M., T. Cole, T. Habing, and S. Shreeves. 2005. Using collection descriptions to enhance an aggregation of harvested item-level metadata. Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries, Denver, Colorado, USA. <doi:10.1145/1065385.1065393>.

Gill, T. 2004. Building semantic bridges between museums, libraries and archives: The CIDOC conceptual reference model. First Monday, 9(5). Available at <http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1145/1065>.

Hakala, H. 2005. Collection mapping: Models and standards for international cooperation. Proceedings of the International Seminar on Collection Mapping, Helsinki, Finland. Available at <http://www.varastokirjasto.fi/tietokartta/hakala.pdf>.

Heaney, M. 2000. An analytical model of collections and their catalogues. UKOLN: The UK Office for Library and Information Networking. Available at <http://www.ukoln.ac.uk/metadata/rslp/model/amcc-v31.pdf>.

Hong, B., L. Hongzhe, Y. Jiehua, and X. Hongwei. 2005. An ontology-based semantic integration for digital museums. Proceedings of the 6th International Conference on Web-Age Information Management (WAIM 2005), Hangzhou, China, October 2005. Springer, 626-631. <doi:10.1007/11563952>.

Hunter, J. 2002. Combining the CIDOC CRM and MPEG-7 to describe multimedia in museums. Proceedings of the Museums and the Web international conference, Boston, April 2002.

ICOM/CIDOC Documentation Standards Group and CIDOC CRM Special Interest Group, "Definition of the CIDOC Conceptual Reference Model". Available at <http://cidoc.ics.forth.gr/definition_cidoc.html>.

Kakali, C., I. Lourdi, T. Stasinopoulou, L. Bountouri, C. Papatheodorou, M. Doerr, and M. Gergatsoulis. 2007. Integrating Dublin Core metadata for cultural heritage collections using ontologies. Proceedings of the International Conference on Dublin Core and Metadata Applications (DC 2007), Singapore. Available at <http://dcpapers.dublincore.org/ojs/pubs/article/view/871/867>.

Kondylakis, H., D. Martin, and D. Plexousakis. 2006. Mapping language for information integration. Greece: Institute of Computer Science, Foundation of Research and Technology, 385. Available at <http://www.ics.forth.gr/isl/publications/paperlink/Mapping_TR385_December06.pdf>.

Lagoze, C. and D. Fielding. 1998. Defining collections in distributed digital libraries. D-Lib Magazine (November 1998). <doi:10.1045/november98-lagoze>.

Lagoze, C., W. Arms, S. Gan, D. Hillmann, C. Ingram, D. Krafft, R. Marisa et al. 2002. Core services in the architecture of the national science digital library (NSDL). Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, Portland, Oregon, USA. ACM, 201-209. <doi:10.1145/544220.544264>.

Lee, H.-L. 2000. What is a collection? Journal of the American Society for Information Science, 51(12). <doi:10.1002/1097-4571(2000)9999:9999<::AID-ASI1018>3.0.CO;2-T>.

Lehti, P. and P. Fankhauser. 2004. XML data integration with OWL: Experiences & challenges. Paper presented at The Symposium on Applications and the Internet (SAINT'04), Tokyo, Japan. <doi:10.1109/SAINT.2004.1266111>.

Library of Congress, "Encoded Archival Description", Library of Congress, <http://www.loc.gov/ead/>.

Lively, L. 1996. Managing information overload. New York: AMACOM.

MIT Libraries, "Metadata Reference Guide", MIT Libraries, <http://libraries.mit.edu/guides/subjects/metadata/mappings.html>.

Nam, Y-K., J. Goguen, and G. Wang. 2002. On the Move to Meaningful Internet Systems, Proceedings of the Confederated International Conferences DOA, CoopIS and ODBASE 2002. Springer Berlin / Heidelberg, Lecture Notes in Computer Science, 2519, 1332-1344. <doi:10.1007/3-540-36124-3_84>.

National Information Standards Organization, "ANSI/NISO Z39.91 - 200x Collection Description Specification DSFTU", National Information Standards Organization, <http://www.niso.org/topics/ccm/ccmstandards/>.

Pierre, St M. and W. LaPlant, Jr. 1998. Issues in crosswalking content metadata standards. NISO Press. Available at <http://www.niso.org/publications/white_papers/crosswalk/>.

Powell, A. RSLP collection description schema: Collection description schema. Available from <http://www.ukoln.ac.uk/metadata/rslp/schema/>.

Rajasekar, A., R. Moore, B. Ludascher, and I. Zaslavsky. 2002. The GRID adventures: SDSC's storage resource broker and web services in digital library applications. Proceedings of the Fourth All-Russian Scientific Conference RCDL'2002 Digital Libraries: Advanced Methods and Technologies, Digital Collections, Dubna, Russia. Vol 1/2.

Renear, A. H., K. Wickett, R. Urban, D. Dubin, and S. Shreeves. 2008. Collection/Item metadata relationships. Proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, Germany, September 2008. Available at <http://dc2008.de/wp-content/uploads/2008/09/dcmi08renearfinal.pdf>.

Stasinopoulou, T., L. Bountouri, C. Kakali, I. Lourdi, C. Papatheodorou, M. Doerr, and M. Gergatsoulis. 2007. Ontology-based metadata integration in the cultural heritage domain. Proceedings of the 10th International Conference on Asian Digital Libraries, Hanoi, Vietnam. Springer-Verlag, 165-175. <doi:10.1007/978-3-540-77094-7_25>.

Tous, R. and J. Delgado. 2006. Contorsion: A semantic XPath processor. Paper presented at the International Workshop on Database Interoperability (InterDB 2005), Namur, Belgium. Electronic Notes in Theoretical Computer Science, 150(2) (3/23): 87-102. <doi:10.1016/j.entcs.2005.11.036>.

Wilson, P. 1996. Interdisciplinary research and information overload. Library Trends, 45(2): 192-203. Available from <https://www.ideals.uiuc.edu/handle/2142/8082>.

Woodley, M. 2000. Crosswalks: The path to universal access. In Introduction to metadata: Pathways to digital information. Getty Information Institute.

Appendix
Copyright © 2009 Irene Lourdi, Christos Papatheodorou, and Martin Doerr
D-Lib Magazine Access Terms and Conditions doi:10.1045/july2009-papatheodorou