D-Lib Magazine, July/August 1996
(1) What is available in the testbeds for experimentation
among the Digital Library Initiative (DLI) members?
The UIUC DLI project will be able to give fellow
members of the Digital Library Initiative access to its Testbed,
which currently holds the full text of SGML documents from selected
engineering journals. These journals, which number 20 at the present
time, are being furnished by the American Institute of Physics
(AIP), the American Physical Society (APS), the American Society
of Civil Engineers (ASCE), the Institution of Electrical Engineers
(IEE), the Institute of Electrical and Electronics Engineers (IEEE), the IEEE
Computer Society, and the American Society of Agricultural Engineers
(ASAE). Over the next year, journals from the American Association
for the Advancement of Science (AAAS), the American Institute
of Aeronautics and Astronautics (AIAA) and John Wiley & Sons
will be added to the Testbed.
The UIUC DLI uses OpenText Corporation's OpenText Index
Search engine. The search engine, which indexes
and accesses DLI project documents, is an extremely robust and
expandable system that allows phrase, Boolean, and proximity searching,
and is tailored to SGML processing and retrieval. One of the cornerstones
of the distributed repository model is the separation of metadata
and index data from the full-text articles themselves, which allows
efficient retrieval from subsets of the full-text repositories.
To this end, the UIUC is working towards identifying the optimum
bibliographic metadata to assist in full-text retrieval. The metadata
elements presently being specified include: URL, basic bibliographic
information, author information (including affiliation), abstract,
figure and table information, and bibliography information (including
all citations).
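As a rough illustration of the separation described above, the following Python sketch keeps a small metadata record apart from the full text and uses only the metadata to pick a subset of articles. The class name, field names, the select_urls helper, and the sample records are all hypothetical; they mirror the element list above but are not the Testbed's actual schema or any part of the OpenText software.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ArticleMetadata:
    """Hypothetical metadata record; fields follow the element list in the text."""
    url: str                                   # where the full-text SGML article lives
    title: str                                 # basic bibliographic information
    journal: str
    year: int
    authors: List[str] = field(default_factory=list)
    affiliations: List[str] = field(default_factory=list)
    abstract: str = ""
    figures_and_tables: List[str] = field(default_factory=list)  # captions
    citations: List[str] = field(default_factory=list)           # bibliography


def select_urls(records: List[ArticleMetadata],
                predicate: Callable[[ArticleMetadata], bool]) -> List[str]:
    """Return the URLs of records whose metadata satisfies the predicate.

    Only metadata is inspected; the full text behind each URL is never read.
    """
    return [r.url for r in records if predicate(r)]


# Made-up sample records, for illustration only.
records = [
    ArticleMetadata("http://example.org/a1.sgm", "Paper one", "Journal A", 1995,
                    authors=["Author X"]),
    ArticleMetadata("http://example.org/a2.sgm", "Paper two", "Journal B", 1996,
                    authors=["Author Y"]),
    ArticleMetadata("http://example.org/a3.sgm", "Paper three", "Journal A", 1996,
                    authors=["Author X", "Author Z"]),
]

# Narrow to one journal and one author before any full-text search is run.
subset = select_urls(records,
                     lambda r: r.journal == "Journal A" and "Author X" in r.authors)
print(subset)
```

A full-text query would then be issued only against the articles named in subset, which is the efficiency the separation is meant to provide.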
The UIUC DLI is collaborating with the University of Michigan
DLI Interface team to incorporate Search Tree retrieval techniques
first developed by Karen Drabenstott for the ASTUTE Project into
the UIUC Testbed interface design. These techniques serve as search
navigation aids, by suggesting to the user various modifications
that will broaden or narrow search results, addressing the frequent
problems users encounter with too few or too many retrievals.
This collaboration will provide a means for testing the efficacy
of various interface designs across several DLI projects.
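The idea of suggesting modifications when a search returns too few or too many retrievals can be sketched in Python as follows. This is only an illustration of the general approach, not the Search Tree logic itself: the function name, the thresholds, and the wording of the suggestions are assumptions made for the example.

```python
from typing import List


def suggest_modifications(terms: List[str], hit_count: int,
                          too_few: int = 5, too_many: int = 200) -> List[str]:
    """Suggest ways to broaden or narrow a search based on its hit count.

    The thresholds and suggestion texts are illustrative assumptions.
    """
    if hit_count < too_few:
        # Too few retrievals: offer ways to broaden the search.
        suggestions = ["try synonyms or check the spelling of the terms"]
        if len(terms) > 1:
            suggestions.append("combine the terms with OR instead of AND")
            suggestions.append("drop one of the terms")
        return suggestions
    if hit_count > too_many:
        # Too many retrievals: offer ways to narrow the search.
        suggestions = ["add another term with AND",
                       "limit the search to titles or abstracts"]
        if len(terms) > 1:
            suggestions.append("search the terms as an exact phrase")
        return suggestions
    # A manageable result set needs no suggestion.
    return []


for advice in suggest_modifications(["bridge", "fatigue"], hit_count=0):
    print(advice)
```

An interface could show these suggestions beside the result list, so that the user picks the next step instead of reformulating the query unaided.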
Semantic concept space technology, which utilizes
term co-occurrence matrices, will be transferred to the University
of California at Santa Barbara DLI. The UIUC DLI focuses on methods
that interactively provide the user with conceptual maps that
offer alternative search terms.
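A minimal Python sketch of how a term co-occurrence matrix can yield alternative search terms is given below. It counts how often two terms appear in the same document and ranks the neighbors of a query term by that raw count. The function names and sample documents are hypothetical, and raw counts are a simplification; this is not the weighting used in the project's concept space software.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List


def build_cooccurrence(documents: List[List[str]]) -> Dict[str, Counter]:
    """Count, for each pair of terms, the documents in which both occur."""
    cooc: Dict[str, Counter] = defaultdict(Counter)
    for terms in documents:
        # Use each distinct term once per document.
        for a, b in combinations(sorted(set(terms)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def suggest_terms(cooc: Dict[str, Counter], term: str, limit: int = 3) -> List[str]:
    """Return the terms that co-occur most often with the given term."""
    neighbors = cooc.get(term, Counter())
    # Highest count first; ties are broken alphabetically for stable output.
    ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:limit]]


# Made-up index terms for four documents, for illustration only.
docs = [
    ["bridge", "load", "steel"],
    ["bridge", "steel", "fatigue"],
    ["bridge", "load"],
    ["antenna", "signal"],
]
cooc = build_cooccurrence(docs)
print(suggest_terms(cooc, "bridge"))
```

With these sample documents, a search on "bridge" would be offered "load" and "steel" ahead of "fatigue", since they share more documents with the query term.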
(2) What is available in the testbeds for experimentation among other groups?
Due to the nature of the legal agreement between
the UIUC DLI and our partners, the UIUC Testbed will not be available
for experimentation among other groups outside of the DLI until
the 3rd and 4th years of the grant (1997-1998), at which time
access will be provided to the CIC (Committee on Institutional
Cooperation) community, which comprises the University
of Chicago, the University of Illinois, Indiana University, the
University of Iowa, the University of Michigan, Michigan State
University, the University of Minnesota, Northwestern University,
Ohio State University, Pennsylvania State University, Purdue University,
and the University of Wisconsin at Madison.
DLI research involving concept spaces for scalable semantic retrieval will be shared with semantic researchers, e.g., with the group at Rutgers University. The large collections generated by supercomputers are available as well, e.g., concept spaces across 1,000 areas of engineering.
The UIUC DLI project is emphasizing the collaborative
effort needed between university researchers and our publishing
partners. A large part of the project will be to look at the future
roles of publishers, libraries, A & I services, and authors,
and to extend the model of distributed publisher repositories
developed in the DLI project. We now have a viable retrieval system
and are in the process of setting up a framework for a system
of federated repositories. The DLI will be in a position to advise
our publishing partners in setting up their own repositories and
we will continue to work on the research needed for effective
indexing, retrieval and display of the full text of scientific
and engineering journals.
(3) What are the plans for the testbeds at the end of the project?
The University Library at the University of
Illinois is developing an industrial partnership program to augment the
research and development work going on in the DLI project, and
to extend the Testbed past the four-year NSF/ARPA/NASA grant period.
In addition to continuing to collect and maintain
large-scale digital collections, we hope to expand the scope and
breadth of the Testbed to include other full-text journals, preprints,
and conference papers. The UIUC Library hopes to establish a collaborative
environment with DLI publishing partners in order to more fully
explore the potential of SGML retrieval and display and the distributed
repository model, linking to other digital projects, including
image databases, numerical databases, and A & I service databases.
We expect that during the course of the DLI project many of our publisher partners will create their own repositories. The DLI role will continue to emphasize the gateway or front-end access to multiple information repositories, including full-text repositories. This will help the Testbed evolve into a multiple-view reference system to a federated collection of distributed repositories. The repository management package will let other organizations and individuals make their organized collections searchable via a multiple-view interface.
The UIUC DLI project is providing major input to
the next-generation server that
NCSA is building. The server will
move from a WWW document server using HTTP to a distributed repository
host using multiple protocols. Server version 2.0, due in the
summer of 1996, will feature a modular protocol design and integrated
security. We will later incorporate the work on stateful gateways
into the server on the output end. The input end will incorporate
the work on collection development, so that the new server will
eventually support session history and metadata checking. Later
versions will also support security measures such as token passing.
(4) What can be done now to anticipate the continued operation of the testbeds beyond the current project?
Industrial technology transfer between DLI partners and other companies is key to the advancement of information retrieval for large collections. The structure of the DLI projects, with large testbeds and many partners, is set up to encourage technology transfer of new developments. For effective search of multimedia objects across multiple repositories on the Web, software developed by the research teams must be shared with companies with the means to advance research.
To expand and continue the research efforts, such as the scalable semantics research, additional funding for research must be available. This might take the form, for example, of a follow-on initiative to the DLI in another area of information management, such as analysis environments.
hdl://cnri.dlib/july96-harum