D-Lib Magazine
April 1999
Volume 5 Issue 4
ISSN 1082-9873
Reference Linking in a Hybrid Library Environment
Part 2: SFX, a Generic Linking Solution
Herbert Van de Sompel
herbert.vandesompel@rug.ac.be
Patrick Hochstenbach
patrick.hochstenbach@rug.ac.be
Automation Department of the Central Library of the University of Ghent, Belgium
Abstract
This is the second of two articles about reference linking in hybrid digital libraries. The first part, Frameworks for Linking, described the current state of the art and contrasted various approaches to the problem. It identified static and dynamic linking solutions, as well as open and closed linking frameworks. It also included an extensive bibliography. The second part describes our work at the University of Ghent to address these issues. SFX is a generic linking system that we have developed for our own needs, but its underlying concepts can be applied in a wide range of digital libraries.
SFX linking
This is a description of the approach to the creation of extended services in a hybrid library environment that has been taken by the Library Automation team at the University of Ghent. The ongoing research has been grouped under the working title Special Effects (SFX). In order to explain the SFX-concepts in a comprehensive way, the discussion will start with a brief description of pre-SFX experiments. Thereafter, the basics of the SFX-approach are explained briefly, in combination with concrete implementation choices taken for the Elektron SFX-linking experiment. Elektron was the name of a modest digital library collaboration between the Universities of Ghent, Louvain and Antwerp.
The SFX working environment
The University of Ghent subscribes to a wide variety of electronic information services. They include SilverPlatter's Electronic Reference Library solution (ERL) and ExLibris's Aleph 500 Integrated Library System, both of which are important local building blocks. The ERL server hosts a wide variety of mainly secondary data (70+ Gb), while the Aleph system hosts the local catalog (500,000+ bibliographic records). Recently, ISI's Web of Science has been added. The environment also provides access to a collection of about 300 e-editions of scientific journals that are available without additional charge as part of the institutional paper-based subscription. Amongst those, the Springer, Wiley, HighWire, Institute of Physics and American Physical Society collections are the most noteworthy. For the Elektron SFX-experiment described below, temporary access to the Academic Press, the UMI Business Periodicals Online and the Blackwell Science collections was granted.
The environment is presented to end-users via a web-based menu-system called the Executive Lounge, which is an easy-to-use interface to the database of databases (Figure 1). The Executive Lounge menu items point at both the traditional library-related sources (typically networked databases and full-text collections) and a limited number of websites with academic relevance. Upon a user's request, menu items can be presented in different views: by data-type (secondary sources, catalogs, primary sources); by discipline (humanities, medicine, engineering, etc.); via a menu item search screen; or via a display presenting only menu items that can be searched simultaneously. For instance, in the data-type view, the menu-header "secondary sources" gives access to Current Contents as well as to the major Internet search engines. A reference to most of the e-Lib subject-based gateways will be found under the same header. The menu-header "catalogues" points at several important Belgian library catalogs, as well as at a catalog of electronic journals and important Internet bookstores. The menu-header "primary sources" points at established publishers' e-editions as well as at a selection of free Internet e-journals.
Figure 1: the Executive Lounge interface
Pre SFX-linking experience
The Ghent library automation group has been actively involved in reference linking for several years:
- Several early papers identified the need to integrate a variety of electronic library resources in general (Van de Sompel 1991) and to link between secondary data, catalogs and primary data in particular (Van de Sompel 1993; Van de Sompel 1994).
- A link-to-holdings between the SilverPlatter ERL and the Aleph 500 system was created as soon as the Aleph system went into production [June 1997]. The implementation of this link was a basic requirement, expressed explicitly in the tender for a new library system [1995]. The decision to acquire a new library system was strongly inspired by the desire for integration. Eventually, the link-to-holdings implementation led to the general availability of a link-to-holdings feature in SilverPlatter's WebSPIRS release 4 and in the Aleph 500 system [1998].
- A link between the Inspec database on SilverPlatter's ERL and the IEEE electronic library collection has been implemented on behalf of the IMEC engineering research institute. This was a joint effort of the Belgian SilverPlatter distributor IVS, IMEC, and the Ghent library-automation group [fourth quarter 1997].
- Experiments have been conducted linking from SilverPlatter's ERL databases to the full-text collection available via SwetsNet [mid 1997]. This led both to the availability of a general link-to-syntax for SwetsNet and to the inclusion of SwetsNet in SilverPlatter's SilverLinker solution [end 1997] (Hamilton 1998).
- Experiments have been conducted to link from SilverPlatter's ABI/Inform to UMI's Business Periodicals Online collection hosted on the ProQuest Direct service [fourth quarter 1998]. These experiments have been facilitated by the availability of the UMI SiteBuilder link-to-syntax.
SFX-concepts
The aim of SFX is to provide extended services in the hybrid library environment. The goal is to present information to the user in the context of the entire collection that is available. As discussed in Part 1 of this article, the targets of a link are seen as a combination of the information provider's and the library's intentions; they do not rely on a static database of links that are computed in advance.
An overview of the design is shown in Figure 2 and is expanded in the following sections.
Figure 2: the SFX mechanism
SFX: linking from-to
A reference link is from one item of information to another. In the following, the term "link-source" will refer to the information unit for which links need to be provided. Link-sources can be records from OPAC systems, from abstracting and indexing databases (A&I), the bibliographic information of a full-text paper as well as each of its citations.
So far, SFX-experiments have concentrated on the link-sources that are shown in Table 1. This set of link-sources has been chosen for research because it contains link-types that have hardly been investigated, but also because it restricts the problem to systems where the link-sources are under local control. Although this choice might seem to limit the scope of the research, it has allowed the development of solutions to grab link-sources other than proxying, and to concentrate on other aspects of linking that are equally important.
| SOURCE \ TARGET | secondary database | OPAC | primary collection | other web info |
| --- | --- | --- | --- | --- |
| secondary database | yes | yes | yes | yes |
| OPAC | yes | yes | yes | yes |
| primary collection | no | no | no | no |
| other web info | no | no | no | no |
Table 1: SFX linking from-to
The Colli: a collection of anticipated conceptual links
Static linking solutions are not considered in SFX-linking. Therefore, there is no database containing hardwired links between the data that is involved. Instead, there is a collection of anticipated conceptual links (Colli) that the hybrid library wants to make available to its users. The organization of the Colli is based on the feasibility of actually creating the link at some further stage in the process (i.e. existence of a link-to-service) and in anticipation of users' expectations. Each of the conceptual links is introduced in order to provide a certain service that is thought to be valuable for users of the system.
Each of the anticipated links in the Colli is accorded a name that corresponds to a procedure designed to resolve the link-to-syntax using parameters extracted from the link-source. The links that have been introduced for the Elektron experiment are shown in Table 2.
There are three links to OPAC systems that are important for interlibrary loan: the Ghent Aleph 500 system, the Belgian union catalogue of serials, and the Dobis/Libis system at the University of Louvain. There are links to secondary databases, such as L-BIP, which is intended to look for the record in Books in Print that corresponds to the link-source. L-ULRICH is similar, with links into Ulrich's Serials Directory. L-JCR looks up Journal Citation Reports data for the link-source; thus it provides the user with ISI's notion of the quality of the referenced journal. L-CC is intended to bring up the table of contents (including abstracts) from the Current Contents database for the issue of the journal that is referred to by the link-source. There are several links to primary information collections, whose names are self-explanatory. Finally, the L-AMAZON link leaves the typical academic information environment and searches for the book referred to by the link-source, in order to present the user with book reviews and ordering information.
All conceptual links and related procedures are seen as being independent of:
- The nature of the link-source: Abstract and Indexing record, OPAC record, citation in full-text paper, bibliographic data of a full-text document.
- The location of the system where the link-source originates from: local, remote.
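| type | link name | links to |
| --- | --- | --- |
| to OPAC systems | L-ALEPH | University of Ghent OPAC |
| to OPAC systems | L-ANTILOPE | Belgian union catalogue of serials |
| to OPAC systems | L-LIBIS | University of Louvain OPAC |
| to secondary databases | L-BIP | Books in Print |
| to secondary databases | L-ULRICH | Ulrich's International Periodicals Directory |
| to secondary databases | L-JCR | ISI's Journal Citation Reports |
| to secondary databases | L-CC | ISI's Current Contents |
| to primary information | L-SWETS | SwetsNet collection |
| to primary information | L-SPRINGER | Springer full-text collection |
| to primary information | L-ACADEMIC | Academic Press full-text collection |
| to primary information | L-BPO | UMI Business Periodicals Online collection |
| to others | L-AMAZON | Amazon.com online bookstore |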
Table 2: the Colli in the Elektron SFX-experiment
The SFX-button: just-in-time linking
SFX takes a "just-in-time" instead of a "just-in-case" approach to linking. When information is presented to the user, potential link-sources are marked with an SFX button. As a means of reducing delays, no links are computed until requested by the user. For each link-source, an identifier is hidden behind an SFX-button (see I in Figure 2). This identifier holds the following information:
- ID of the server from which the link-source originates
- database ID of the database where the link-source originates
- unique record ID of the link-source within that database
- the SFX server process that is executed by clicking this button
A user must explicitly request links for a link-source by clicking the SFX-button. Clicking transfers the identifier to the local target, which uses it to pull the link-source into its environment (see II in Figure 2). The ID of the server not only gives information on its location, but also on the protocol to be used to grab the link-source. In the case of OPAC or Abstracting and Indexing databases, this might be Z39.50. But it might also be a Lightweight Directory Access Protocol (LDAP) lookup, a Handle resolution, or an HTTP link. Next, the document is parsed into a generic format and essential parameters are extracted (see III in Figure 2). All information is kept at the server side, in relation to the user's session-ID. The system is now ready to start the next phase in the process: the conceptual verification of potential links from the Colli.
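As a rough illustration of steps I and II, the identifier behind an SFX-button can be thought of as a small record serialized into a URL. The following Python sketch is hypothetical and is not the Elektron implementation: the field names, the `sfx.example.org` host, the query-parameter names, the `sfxmenu` process name and the server-to-protocol table are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from urllib.parse import urlencode, urlsplit, parse_qs


@dataclass(frozen=True)
class SfxIdentifier:
    """The four pieces of information hidden behind an SFX-button."""
    server_id: str    # server from which the link-source originates
    database_id: str  # database where the link-source originates
    record_id: str    # unique record ID within that database
    process: str      # SFX server process executed on click


# Hypothetical mapping from server ID to the protocol used to grab the record.
FETCH_PROTOCOL = {"erl": "z39.50", "aleph": "z39.50", "doi-dir": "handle"}


def button_url(ident: SfxIdentifier, base: str = "http://sfx.example.org") -> str:
    """Serialize the identifier into the URL behind the button (step I)."""
    query = urlencode({"srv": ident.server_id, "db": ident.database_id,
                       "rec": ident.record_id})
    return f"{base}/{ident.process}?{query}"


def parse_button_url(url: str) -> SfxIdentifier:
    """Recover the identifier on the SFX server after a click (step II)."""
    parts = urlsplit(url)
    q = parse_qs(parts.query)
    return SfxIdentifier(q["srv"][0], q["db"][0], q["rec"][0],
                         parts.path.lstrip("/"))


def protocol_for(ident: SfxIdentifier) -> str:
    """The server ID determines how the link-source is fetched; default to HTTP."""
    return FETCH_PROTOCOL.get(ident.server_id, "http")
```

The point of the sketch is that the button carries only a pointer to the link-source, never the record itself; fetching and parsing happen on the server, and only after the click.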
The "just-in-time" approach, requiring an explicit user action to request links, seems to be justified by the following:
- The link-to-holdings feature connecting Abstracting and Indexing databases with the local OPAC in the Ghent environment also requires an explicit user action. Logs show that the holdings button is being used for approximately 3.3 % of the records that are being transferred, meaning that the link remains idle for 96.7 % of the records. Similar statistics can be expected in the broader context of SFX-linking, if only because end-user searches in large databases are typically done with a low accuracy (Bates 1998) and because links for records that look irrelevant to a user are not likely to be followed. As such, "just-in-time" linking can dramatically reduce response times by only going through the required overhead, when necessary.
- Since SFX intends to serve a bundle of links to the user for each link-source, a "just-in-case" approach, instantly feeding all links for each link-source, would inevitably lead to user interface problems.
- The explicit user action identifies records that the user considers relevant. The accumulation of such information -- in combination with search strategies -- can, in the long-term, lead to a database that supports a recommendation system.
SFX-linking approaches the problem of grabbing the link-source by introducing a clickable identifier, containing a small data record, for each link-source. The technique is identical for all systems involved. It is recognized that the implementation of this solution was simplified by the fact that the originating servers used in the experiments were under local control. Both providers of the local systems in Ghent -- ExLibris and SilverPlatter -- have enabled its straightforward realization. Still, the concept is quite generic, and could also be implemented with systems under remote control, to create a general purpose "just-in-time" linking solution.
For instance, in the case of the Open Journal Project, journal papers are proxied and parsed before delivery to the user. There, many of the complexities involved with linking from citations could be postponed to a later phase in the process, by initially only identifying link-sources in the HTML or PDF documents and inserting, respectively, SFX-anchors or SFX-named-destinations as unique link-source IDs. Storing the enhanced document in the server environment and simultaneously sending it to the user would create a set-up in which the link-source could be retrieved and processed only upon the user's request.
Proxying should be considered to be the hard way to grab the link-source. It is obvious that cooperation of the authority can lead to more straightforward solutions to grab the link-source. One can imagine a situation where the authority inserts the required identifier along with the appropriate address of an institutional SFX-server on a subscription basis. Although this might sound like wishful thinking, such a possibility is almost inherent to the DOI concept, on the condition that:
- link-sources are delivered to users with inclusion of their own DOI,
- resolution of such a DOI can be redirected to a local target.
Under these conditions, an institutional SFX-server could retrieve the link-source from a DOI directory.
Conceptual verification of links from the Colli via the SFX-base
Since there are no "a priori" computed links in this environment, there is no initial certainty on the relevance of a specific conceptual link from the Colli for a specific link-source. Meanwhile, that link-source resides in a parsed format in the server's environment (see III in Figure 2). In order to prevent irrelevant links from being presented to the user, the SFX-base is introduced (see IV in Figure 2). The SFX-base describes the relationship between the conceptual links from the Colli and the parameter values of link-sources for which the conceptual links are valid. Matching parameters of a link-source with the SFX-base filters out irrelevant links. The matching process fulfills a conceptual verification for each of the links from the Colli. Once a link has been selected in this process, it will be included in the bundle of links that will be presented to the user (see V in Figure 2).
It should be emphasized that this selection does not guarantee that following the link at a later stage will succeed. The conceptual verification minimizes the number of predictable failures. For instance, when the active document refers to a journal article, the anticipated link to Amazon.com will be filtered out. When the active document originates from the Current Contents database or when the journal referred to by the active document is not indexed in Current Contents, the L-CC link will receive a negative flag. A link to Springer will only be selected when the active document refers to a paper published in a Springer journal with a publication year that makes electronic availability nearly certain.
A limited number of parameters have been defined for the SFX-base of the Elektron experiment:
- Material type of the document described by the link-source: restricted to books and serials at present.
- ISSN number of the document described by the link-source.
- Threshold for the publication year of the document described by the link-source, beyond which a certain link becomes relevant.
- ID of the database where the link-source originates: in the actual implementation these are the names of databases installed on local systems. Extension of the service to link-sources from primary collections would introduce additional IDs, probably referring to publishers' collections and/or ISSN numbers.
This information is brought together in a relational database, with the Colli as a central table (Figure 3). In addition to the described parameters, a link-type table is added to the SFX-base, which allows the presentation of the relevant links in a structured way, corresponding to the classification made in Table 2, reflecting the organization of the database of databases in the Executive Lounge menu-system (Figure 1).
Figure 3: the SFX-base
A simplified overview of the contents of the SFX-base used in the Elektron experiment is given in Table 3. It is obvious that the design of the SFX-base requires fine-tuning in order to become a production system, but for an experimental set-up, a certain roughness has been tolerated:
- The "threshold year" parameter gives cause for some degree of uncertainty.
- The threshold values for L-BIP and L-AMAZON are quite arbitrary.
- The one for L-CC is more exact, since it reflects the starting year of the Current Contents database that is available for look up. Still, the 1996 issues of Current Contents can contain records referring to papers published in 1995. Using the proposed threshold filters out the L-CC link for those records. Bringing the threshold down to 1995 would conceptually select all records with a publication year starting in 1995, the majority of which would not be covered by the available collection of Current Contents.
- The threshold values for the full-text links refer to the earliest publication year for which the linked publisher has online content available. However, in many cases the electronic starting point varies for different journals of the same publisher. This calls for the introduction of a new table connected to the ISSN table, containing information on publication year, volume, and issue of the first electronic edition.
- For now, the SFX-base has only been fed with information on databases and journals with current subscriptions. In order to come to a more generic design that would be applicable in a consortium environment, there is definitely a need to include subscription information in the design, both in connection with the database-ID and the ISSN table. This might call for integration with the serials module of the Integrated Library Systems that are involved.
- Since the scope of databases changes over time, limiting links into secondary databases to link-sources that have ISSN numbers of journals indexed in those databases, without involving a time-concept, is not fully waterproof.
| link name | material type | threshold year | source dbase id | ISSN |
| --- | --- | --- | --- | --- |
| L-ALEPH | all | all | all except ALEPH | all |
| L-ANTILOPE | serials | all | all except ANTILOPE | all |
| L-LIBIS | all | all | all except LIBIS | all |
| L-BIP | books | > 1970 | all except BIP | none |
| L-ULRICH | serials | all | all except ULRICH | all |
| L-JCR | serials | all | all | only journals evaluated in JCR |
| L-CC | serials | >= 1996 | all except CC | only journals abstracted in CC |
| L-SWETS | serials | >= 1997 | all | ISSN numbers of Blackwell journals |
| L-SPRINGER | serials | >= 1997 | all | ISSN numbers of Springer journals |
| L-ACADEMIC | serials | >= 1996 | all | ISSN numbers of Academic Press journals |
| L-BPO | serials | >= 1997 | all | ISSN numbers of UMI BPO journals |
| L-AMAZON | books | > 1970 | all | none |
Table 3: content of the SFX-base
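The conceptual verification can be pictured as a simple filter over the rows of Table 3. The Python sketch below is only an illustration under stated assumptions: the `Rule` and `LinkSource` structures are hypothetical, only five of the twelve rows are reproduced, the ISSN sets are placeholders rather than real title lists, and a link-source without a publication year is assumed to fail any threshold. The SFX-base itself is a relational database (Figure 3), which this in-memory version does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class LinkSource:
    material_type: str          # "books" or "serials"
    year: Optional[int]         # publication year, if it could be extracted
    database_id: str            # database where the link-source originates
    issn: Optional[str] = None


@dataclass(frozen=True)
class Rule:
    name: str
    material_type: str                  # "all", "books" or "serials"
    min_year: Optional[int]             # inclusive threshold; None means "all"
    excluded_db: Optional[str]          # originating database to skip, if any
    issns: Optional[FrozenSet[str]]     # None means no ISSN restriction


# Placeholder ISSNs standing in for the publisher and database title lists.
SPRINGER = frozenset({"1111-1111"})
CC_INDEXED = frozenset({"1111-1111", "2222-2222"})

SFX_BASE = [
    Rule("L-ALEPH", "all", None, "ALEPH", None),
    Rule("L-BIP", "books", 1971, "BIP", None),          # "> 1970"
    Rule("L-CC", "serials", 1996, "CC", CC_INDEXED),    # ">= 1996"
    Rule("L-SPRINGER", "serials", 1997, None, SPRINGER),
    Rule("L-AMAZON", "books", 1971, None, None),
]


def relevant_links(src: LinkSource, rules: List[Rule] = SFX_BASE) -> List[str]:
    """Return the names of the conceptual links that pass verification."""
    selected = []
    for rule in rules:
        if rule.material_type not in ("all", src.material_type):
            continue  # e.g. L-AMAZON is filtered out for journal articles
        if rule.min_year is not None and (src.year is None or src.year < rule.min_year):
            continue  # published before the threshold year
        if rule.excluded_db == src.database_id:
            continue  # no link back into the originating database
        if rule.issns is not None and src.issn not in rule.issns:
            continue  # journal not covered by the target
        selected.append(rule.name)
    return selected
```

For a 1998 article from a hypothetical `ECONLIT` database whose ISSN appears in both placeholder sets, the filter keeps L-ALEPH, L-CC and L-SPRINGER and drops the two book-oriented links. The same article with a publication year of 1995 keeps only L-ALEPH, which is the threshold effect discussed for L-CC above.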
The SFX-screen: a bundle of unresolved, functionally unverified links
The result of the conceptual verification process is a buffer of potential link-names, corresponding to links from the Colli that are relevant for the current link-source. The link-names in the buffer are organized in accordance with the classification shown in Table 2, and delivered to the user in a separate browser window (see VI in Figure 2; see Figure 5, Figure 7 and Figure 9). Following the same argument that led to the justification of just-in-time linking, links are still not resolved at this stage. The potential links are sent to the user, with the link procedure names as parameters. These parameters are used by a server-based link-resolution process that is activated when the user chooses to follow a certain link. At that point, the essential information from the actual document is retrieved from the copy of the link-source that is held at the server's side. Next, this information is fed to the chosen procedure in order to resolve the link (see VII in Figure 2). Finally, the user is redirected to the appropriate location (see VIII in Figure 2; Figure 6, Figure 8 and Figure 10).
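The separation between presenting a link and resolving it can be sketched as a registry of named procedures plus a server-side session store. The Python fragment below is a minimal illustration, not the actual SFX server: the `/resolve` path, the parameter names, the dictionary-based session store and the bookstore URL are all hypothetical, since the article does not give the real link-to-syntaxes.

```python
from typing import Callable, Dict, List

# Server-side copy of parsed link-sources, keyed by the user's session ID.
SESSIONS: Dict[str, Dict[str, str]] = {}

# Registry mapping link names from the Colli to resolution procedures.
PROCEDURES: Dict[str, Callable[[Dict[str, str]], str]] = {}


def procedure(name: str):
    """Decorator that registers a resolution procedure under a link name."""
    def register(func):
        PROCEDURES[name] = func
        return func
    return register


@procedure("L-AMAZON")
def resolve_amazon(src: Dict[str, str]) -> str:
    # Hypothetical link-to-syntax, used here only as a stand-in.
    return "http://bookstore.example.com/search?isbn=" + src["isbn"]


def sfx_screen(session_id: str, link_names: List[str]) -> List[str]:
    """Unresolved links: only the procedure name travels to the browser (step VI)."""
    return [f"/resolve?session={session_id}&link={name}" for name in link_names]


def follow(session_id: str, link_name: str) -> str:
    """Resolve on demand from the server-side copy; return the redirect target (steps VII and VIII)."""
    src = SESSIONS[session_id]
    return PROCEDURES[link_name](src)
```

With a stored link-source such as `SESSIONS["s1"] = {"isbn": "0123456789"}`, `sfx_screen("s1", ["L-AMAZON"])` yields one unresolved link, and the target URL is only built when `follow("s1", "L-AMAZON")` is called.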
Figure 4: an OPAC serials record
Figure 5: the SFX screen for the OPAC serials record
As a consequence of this approach, the links in the SFX-screen are not functionally verified, and following them may lead to empty results. This design option is subject to some considerations:
- Many of the links presented in the SFX-screen should be interpreted as alternative search features, rather than foolproof links. Hence, using them is subject to all the characteristics of searching, including empty result sets, abundant result sets, serendipity, etc.
- Conceptual verification of links has preceded the current phase, thereby minimizing the number of irrelevant links. Functional verification of each link would not only cause significant delays, it might even turn out to be impossible when not supported by the linked system.
Exploiting this approach and properly designing the procedures to resolve the links can lead to features that are appealing instead of frustrating to end-users, as can be seen from the following examples:
- The procedures try to resolve links as accurately as possible, given the number of parameters that are available in the link-source. Typically, linking from a record in an abstracting database enables the extraction of ISSN, publication year, volume, issue and page information. Depending on the accuracy of the link-to-syntax provided by the primary publisher's system, this can lead directly to the full-text of the referred paper. This is rarely the case, since the best that most link-to-syntaxes enable is linking to the appropriate table of contents, from which a link to the full-text can be chosen. This is not necessarily a disadvantage, since it brings some serendipity into the mechanism.
- Quite frequently, not all of the parameters defined for a linking procedure are available. Either it is impossible to extract them from the link-source, or the data is just not there. Typically, as is the case in many libraries in Europe, in the Ghent environment OPAC records do not contain volume and issue information for serials. Therefore, the procedures have been designed to make the best use of the information that is available and to lead the user as close as possible to the goal intended by the chosen link. If issue information is missing, the procedure tries to construct a URL to the level of a volume; if volume is missing too, a URL to the publication year of the reference will be generated; if that is missing too, a link to all electronically available years for the cited journal is the solution. This approach has been turned into an appealing interface feature. For those procedures that require more than just ISSN or ISBN information for full resolution, the remaining parameters are displayed in editable text boxes when the link is presented to the user. Thus, the user can change or complete the form. This feature is especially relevant when working from the OPAC, linking into Current Contents or a full-text collection (Figure 5 and Figure 6).
- UMI's SiteBuilder link-to-syntax does not allow the use of publication year, volume, or issue as search terms, and therefore the L-BPO procedure has been designed to search for a combination of an ISSN number and title words instead. Although this might be seen as far from optimal linking, it can lead to pleasant surprises in the search results.
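The fallback behavior described above, from issue to volume to publication year to all available years, can be sketched as follows. This is a hedged illustration in Python: the `journal_url` function, the `publisher.example.com` host and the path layout are assumptions, since each real link-to-syntax has its own form.

```python
from typing import Optional


def journal_url(issn: str, year: Optional[int] = None,
                volume: Optional[str] = None, issue: Optional[str] = None,
                base: str = "http://publisher.example.com/journals") -> str:
    """Build the most specific URL that the available parameters allow."""
    url = f"{base}/{issn}"
    # Walk down the hierarchy year -> volume -> issue and stop at the
    # first missing level; a deeper level is useless without its parent.
    for level in (year, volume, issue):
        if level is None:
            break
        url += f"/{level}"
    return url
```

An OPAC serials record that carries only an ISSN would thus lead to the list of all electronically available years, while a record from an abstracting database with full citation data would lead to the issue level. Values typed by the user into the editable text boxes would simply be passed in as the missing arguments.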
Figure 6: an SFX-link to Current Contents for the OPAC serials record
Figure 7: the SFX screen for an OPAC book record
Figure 8: the SFX-link to Amazon.com followed for the OPAC book record
Figure 9: the SFX screen for a record from the EconLit database
Figure 10: the SFX-link to Journal Citation Reports followed for the record from EconLit
Conclusion
Part 1 of this article discussed a framework for the field of reference linking. SFX was presented as an example of how these concepts could be realized in an actual hybrid library. This second part has gone into greater detail in describing the prototype implementation of SFX. This prototype has demonstrated that dynamic linking is a practical and flexible option. It has also shown that the uncertainty inherent in dynamic linking is not necessarily a disadvantage, so long as it is recognized by the user interface. In a large-scale digital library, reference linking can never expect to be fully determined. SFX, with its just-in-time approach to dynamic linking, provides a straightforward, scalable alternative.
References
Bates, Marcia J. 1998. Indexing and access for digital libraries and the Internet: Human, database and domain factors. Journal of the American Society for Information Science 49, no. 13.
Hamilton, Feona J. 1998. Multi-level linking technology by Swets. Information World Review, no. 142 (December).
Van de Sompel, Herbert. 1991. Heading towards an electronic library: location independent integration of electronic reference sources in library workstations. 10th Annual meeting of the Dobis/Libis User Group. Leuven: Dobis/Libis User Group Secretary.
Van de Sompel, Herbert. 1993. Optimalisatie van de konsultatieketen aan de Universiteit Gent. Bibliotheekkunde 51. Kris Clara and Julien Van Borm. Antwerpen: VVBAD.
Van de Sompel, Herbert. 1994. Technology and collaboration: creating an effective information environment in an academic context. Online Information 94. Proceedings of the 18th International Online Information Meeting. Oxford and New Jersey: Learned Information (Europe) Ltd.
Copyright © 1999 Herbert Van de Sompel and Patrick Hochstenbach
DOI: 10.1045/april99-van_de_sompel-pt2