D-Lib Magazine
January/February 2015
Volume 21, Number 1/2
A Framework Supporting the Shift from Traditional Digital Publications to Enhanced Publications
Alessia Bardi and Paolo Manghi
Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Italy
{alessia.bardi, paolo.manghi}@isti.cnr.it
DOI: 10.1045/january2015-bardi
Abstract
Enhanced publications (EPs) can be generally conceived as digital publications "enriched with" or "linking to" related research results, such as research data, workflows, software, and possibly connections among them. Enhanced Publication Information Systems (EPISs) are information systems devised for the management of EPs in specific application domains. Currently, no framework supporting the realization of EPISs is known, and EPISs are typically realized "from scratch" by integrating general-purpose technologies (e.g. relational databases, file stores, triple stores) and Digital Library-oriented software (e.g. repositories, cataloguing systems). Such an approach entails non-negligible realization and maintenance costs that could be decreased by adopting a more systemic approach. The framework proposed in this work addresses this problem by providing EPIS developers with EP management tools that facilitate their efforts by hiding the complexity of the underlying technologies.
1 Introduction
The traditional scholarly communication model is based on the pair (document, metadata), where a document is generally a digital scientific article and the related metadata is a set of structured information supporting interpretation and discovery of the article. It is becoming evident today [9-11] that such a model cannot cope with the requirements of modern scholarly communication, which adds to the research chain different kinds of outputs (e.g. software, datasets, workflows) and aims at different usages of such outputs (e.g. research impact measurement, repetition of experiments).
In many disciplines the publication is considered just the tip of the iceberg: the publication describes a research investigation, but it does not provide sufficient information to validate its results. In order for an experiment to be replicable and reproducible, data and processes should be shared as well. In other words, data and processing workflows should be considered "first class citizens", together with publications, of the scholarly communication chain [6, 9, 10, 12, 13, 14, 15, 16, 20, 21]. As a matter of fact, scholarly communication tools should support researchers in sharing any form of research output that is relevant to the correct interpretation, re-use (repetition, reproduction), and assessment of research activities [11].
As shown by the survey conducted in [2], Enhanced Publications (EPs) can be generally conceived as digital publications "enriched with" or "linking to" related research results, such as research data, workflows, software, and possibly connections among them. Enhanced Publication Information Systems (EPISs) are systems devised for the management of EPs [3-8, 19]. The majority of those systems are tailored to their specific communities and realized "from scratch", so that functionalities that are shared across disciplines and user communities are re-implemented every time. In fact, EP-oriented software is realized by integrating technologies that are general-purpose (e.g. databases, file stores) and Digital Library-oriented (e.g. repository software, cataloguing systems). The resulting products are often not flexible enough to be adapted to the evolving requirements of the community they target, and can hardly be reconfigured for re-use in different application domains with similar requirements.
Such a "from scratch" approach entails non-negligible realization and maintenance costs that could be decreased by adopting a more systemic approach, as was done in the past with Database Management Systems (DBMSs). Indeed, the success of DBMSs is due to their capability of delivering functionalities that were previously implemented every time by applications dealing with data management issues (e.g. structure, storage, query, optimization, static typing, integrity, validation, redundancy). The adoption of a DBMS allows database developers to focus their efforts on domain-dependent requirements, rather than on implementing yet another data storage and/or retrieval layer [17][18].
A similar systemic approach has not yet been applied to the EPIS scenario, where "from scratch" realization is the norm. In this paper we propose the realization of an Enhanced Publication Management System (EPMS), a software framework that plays the role of DBMSs in the world of EPs. The framework supports developers of EPISs with tools that (i) hide the complexity of the implementation of domain-independent requirements, (ii) allow the definition of personalized EP data models, and (iii) support the realization and configuration of functionalities based on the defined EP data model.
The paper is organized as follows: Section 2 introduces a meta-model for Enhanced Publications, in principle mirroring what the relational data model is for RDBMSs; Section 3 introduces the general requirements of Enhanced Publication Management Systems; Section 4 drafts the architecture of our framework, addressing the requirements defined in Section 3 over the meta-model defined in Section 2.
2 Data models for enhanced publications
In the following we shall refer to the definition of enhanced publication presented in [2] (in turn a refinement of the definition given by SURF Foundation in [1]):
[Enhanced publications are] digital objects characterized by an identifier (possibly a persistent identifier) and by descriptive metadata information. The constituent components of an EP include one mandatory textual narration part (the description of the research) and a set of interconnected sub-parts. Parts may have or not have an identifier and relative metadata descriptions and are connected by semantic relationships.
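As a minimal illustration of this definition, the sketch below models an EP as a Python data structure. All class names, field names, and sample values (`Part`, `Relationship`, `EnhancedPublication`, `narration`, the identifiers and the relationship label) are assumptions made for illustration; they are not taken from [1] or [2].

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Part:
    """A constituent part; identifier and metadata are optional, as in the definition."""
    label: str
    identifier: Optional[str] = None
    metadata: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A semantic relationship between two parts, referenced by their labels."""
    source: str
    semantics: str
    target: str


@dataclass
class EnhancedPublication:
    identifier: str   # possibly a persistent identifier
    metadata: dict    # descriptive metadata of the EP as a whole
    narration: Part   # the one mandatory textual narration part
    parts: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def dangling_relationships(self):
        """Return the relationships whose endpoints are not parts of this EP."""
        known = {self.narration.label} | {p.label for p in self.parts}
        return [r for r in self.relationships
                if r.source not in known or r.target not in known]


# Hypothetical example: an article supported by one dataset.
ep = EnhancedPublication(
    identifier="ep:example-1",
    metadata={"title": "An example enhanced publication"},
    narration=Part(label="article"),
    parts=[Part(label="dataset", identifier="doi:10.0000/example")],
    relationships=[Relationship("article", "isSupportedBy", "dataset")],
)
```

Making `narration` a required field mirrors the "one mandatory textual narration part" of the definition, while the optional `identifier` on `Part` reflects that parts may or may not have an identifier.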
The survey conducted in [2] examined several EPISs in order to (i) provide a classification of such systems in terms of the functionalities they offer and (ii) highlight the common features of existing EP data models. The study identified four main classes of EPIS functionalities:
- Packaging of related research assets;
- Web 2.0 reading capabilities;
- Interlinking research outputs;
- Re-production and assessment of scientific experiments.
Moreover, by analyzing the data models adopted by such systems, the investigation characterized EP data models in terms of the following types of enhanced publication parts:
- Embedded parts: identity-less parts that are packaged within the enhanced publication. Embedded parts are not directly discoverable, i.e. it is possible to access them only once the embedding resource has been discovered. Typical examples of embedded parts are the supplementary material of scientific articles managed by journals: you cannot search for supplementary material, but once you have found the related article (i.e. the embedding resource), you can access its supplementary material.
- Structured-text parts: (semi-)structured textual documents, such as XML documents.
- Reference parts: remote resources referenced via resolvable identifiers such as URLs or PIDs (e.g. Handle, DOI).
- Executable parts: parts that can be executed, such as workflows, web service instances, software code.
- Generated parts: parts that are generated from other parts. Examples are dynamic tables that are updated when the underlying data set changes; or a molecule 3D rendering generated by running a 3D molecule viewer with appropriate parameters.
Based on the above features, we have defined the EP meta-model in Figure 1. The EP meta-model provides the primitives for the definition of custom EP data models (in principle, all EP data models studied in [2] can be expressed as instances of this meta-model), and will inspire the construction of an EP data model definition language for our framework. To draw a parallel with relational databases, the EP meta-model corresponds to the relational model, while the EP data model language corresponds to the SQL data definition language.
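To make the distinction between meta-model primitives and a custom data model more concrete, the following Python sketch encodes the five part types listed above as an enumeration and expresses a hypothetical data model as a mapping from part names to those types. It is not the meta-model of Figure 1; the names `PartKind`, `EXAMPLE_MODEL`, and `check_part` are assumptions introduced only for illustration. The two checks encode only what the list above states: embedded parts are identity-less, and reference parts are referenced via a resolvable identifier.

```python
from enum import Enum
from typing import List, Optional


class PartKind(Enum):
    """The five kinds of EP parts identified by the survey in [2]."""
    EMBEDDED = "embedded"
    STRUCTURED_TEXT = "structured-text"
    REFERENCE = "reference"
    EXECUTABLE = "executable"
    GENERATED = "generated"


# A hypothetical custom EP data model: community-specific part names,
# each mapped to one of the kinds above.
EXAMPLE_MODEL = {
    "supplementary_material": PartKind.EMBEDDED,
    "dataset": PartKind.REFERENCE,
    "workflow": PartKind.EXECUTABLE,
}


def check_part(kind: PartKind, identifier: Optional[str]) -> List[str]:
    """Return the constraint violations for a part of the given kind."""
    problems = []
    if kind is PartKind.EMBEDDED and identifier is not None:
        problems.append("embedded parts are identity-less")
    if kind is PartKind.REFERENCE and not identifier:
        problems.append("reference parts need a resolvable identifier")
    return problems
```

In terms of the relational parallel, `PartKind` plays the role of the fixed primitives, while `EXAMPLE_MODEL` corresponds to a schema that an EPIS developer would declare for a given community.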