Abstract -- GILS starts from a basic premise that the Global Information Infrastructure should enhance the free flow of information. The initiative primarily addresses the ways in which characteristics of information resources are exposed to searching, a fundamental aspect of information infrastructure that has policy as well as technology implications. With GILS, all manner of social, political, economic, and thematic organizations of information exist independently, yet their diverse information locators are interoperable for searching. GILS adopts existing open standards that achieve this interoperability by allowing reference to common semantics for characterizing information resources. Sensitive to the world's many languages, as well as legal and financial issues, GILS adopts the ANSI Z39.50 standard to specify how electronic network searches can be expressed and how results are returned. GILS-compliance is a particular way in which servers support searching for the characteristics of any kind of information, at any level of aggregation. Such "locator records" may be hand-crafted as in traditional cataloging, or may be generated dynamically. GILS-compliant servers must recognize the numbers for certain registered elements and search attributes, but any specific set of locator records may use any or none of these, or may use locally-defined elements. By applying a long-standing international standard, GILS takes advantage of existing networks and software already providing access to a vast array of very valuable resources. Although GILS will certainly evolve radically over the coming decades, some basic principles should remain constant, including: adoption of open standards; support for international use; support for diverse points of view; and preservation of access to accumulated knowledge.
In common with many other initiatives worldwide, the Government Information Locator Service (GILS) has a simple goal: make it easier for people to find information.
Obviously, this is not merely a technological challenge. The Global Information Society now taking form will have deep implications worldwide for many decades to come. William Y. Arms, with the Corporation for National Research Initiatives, has written: "Early networked information systems were developed by technical and professional communities, concentrating on their own needs....The digital library of the future will exist within a much larger economic, social and legal framework." [ 1 ]
I think we are at a rare moment in history where specific public policy and technology choices today can have profound and widespread influence. And, no choices are more crucial than the mechanisms by which people find information.
The ideas behind the Government Information Locator Service have many roots. One of these is ongoing work concerning data management for Global Change research, which focuses on topics such as Global Warming and Loss of Biological Diversity. These are aptly named "Grand Challenges", especially as it has now become obvious that Earth systems are intrinsically entwined with many other kinds of systems--social, economic, political, etc.
One of the fundamental realities in this arena is that researchers and policy-makers today and in the future will need access to long baseline data. There is a fundamental need for continuity not only among world data centers, but with the vast treasure-houses of literature and other information maintained by libraries, museums, and archives worldwide. Current and future records must also be managed appropriately, but it is fundamentally not possible to fully anticipate what data and information will be needed. In some areas of basic research today, it may be 40 years until societies know the right questions to ask and what crucial data and information should have been maintained.
Given this context, a reasonable question to ask is this:
What should we do today to influence the global information infrastructure so as to maximize the accessibility of relevant data and information worldwide for decades to come?
Among science organizations and throughout democratic societies, there is a basic premise that information infrastructures should enhance the free flow of data, information, and ideas. Yet, it is also recognized that the global information infrastructure will be built primarily to serve interests other than long-term Earth systems monitoring and natural resources management. If the free flow of ideas is to be enhanced in a way that is sustainable over the long term, it will be through finding common solutions with commercial and entertainment interests rather than through government fiat.
A great strength of the emerging Global Information Society is the wide diversity evident in the many separate but overlapping perspectives and authority regimes. At this moment in history, virtually all nations appreciate that they should not dictate to other nations how to manage information, and that highly centralized approaches are often dangerous even from a strictly operational perspective. It follows that any information standard intended for global use should not presume central authorities or a master index of information. Rather, global information standards need to embrace interoperable but very decentralized approaches.
Although more complex to design, such interoperable and decentralized approaches are entirely feasible. Common standards employed in bibliographic cataloging provide a measure of interoperability across independently maintained libraries, while allowing wide latitude in how collections are developed and organized. Anyone can use the catalog standards to any extent desired and for any purpose whatsoever. Open standards from the library and information services communities have now evolved to take advantage of public networks such as the Internet. Happily these international standards are sensitive to the world's many languages, and to legal and financial issues such as copyright, security, privacy, and payments.
In accepting this challenge to influence the Global Information Infrastructure in a fundamental way, one must accept the bare fact of the currently primitive facilities for dealing with human knowledge in its myriad contexts. People have barely begun to understand how to handle complex information at the personal and corporate level. Much less clearly can ways be seen to facilitate the exchange of knowledge or to model the interactions of information organizations among whole societies.
Continuous evolution and drastic revolutions must be expected. Yet, there are immediate strategies that seem worthwhile to pursue. While access mechanisms such as user interfaces, data base technologies, and network protocols change every few months, certain fundamental aspects have persisted for many decades and continue to carry long term implications. For example, people searching for information resources typically need clues such as: What is it about? Who created it? When was it published? Where is it available?
As technology, GILS primarily addresses one fundamental aspect of information access--the ways in which the characteristics of information resources are exposed to searching. It happens that this technological approach is also useful in the context of policy initiatives intended to promote diversity of information access without sacrificing coherence.
One definition of GILS is "A decentralized collection of locators and associated information services used by the public either directly or through intermediaries to find information." [ 2 ] This definition highlights the role of intermediaries and expresses the decentralized nature of GILS, but in some ways it calls for further elaboration.
Service
The word "service", for example, carries different meanings depending on whether GILS is discussed in the context of policy, technology, or standards. For policy people, GILS can be viewed as a "service" in the sense of a set of human, organizational, and technological facilities that help people find information. Extending this policy sense to technology, implementors sometimes refer to GILS not as a service but as a specific system built of software and utilized in support of the policy goals. To my mind, such systems are examples of compliance with GILS, just as a particular building may be an example of compliance with a building code.
For the Digital Libraries audience, I would like to focus on the precise meaning of GILS in the standards context. In the general architectural model of networks, GILS is a service in the sense of an application profile that is part of a service definition. This service performs specific functions useful for locating information. It is available for use by higher level applications and makes use of lower level components such as bitways.
Locator
A locator is defined as an information resource that identifies other information resources, describes the information available in those resources, and provides assistance in obtaining the information. It is typically modeled as a database of locator records, each of which is a set of related data elements descriptive of various characteristics of an information resource.
Depending on how they are created and the nature of the information resource, locator records can be known by many other names: metadata, meta-information, directories, catalogs, abstracts, etc.
Outside of a specific policy context, there is no prescription for what is an appropriate level of aggregation. GILS locator records already exist for everything from individual pamphlets to multi-national programs. North Carolina is using GILS locator records to describe individual fields within databases throughout the state. The Government Printing Office has created GILS locator records at the level of entire Federal agencies.
Servers acting as GILS locators are also information resources and can themselves be described by a GILS locator record in other GILS locators. Using the Linkage element, the separate GILS locators can be exploited as a network for distributed search, and traversal of that network can be informed by any of the resource characteristics described (e.g., Subject Terms, Originator, Distributor Name, Access Constraints). This recursive feature should make GILS useful in attempts to solve the "query routing" problem of Internet searching.
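The recursive arrangement described above can be sketched in a few lines of Python. This is only an illustration, not part of the GILS Application Profile: the in-memory `LOCATORS` dictionary, the field names, and the `route_query` function are all hypothetical, and a real client would follow each Linkage over Z39.50 rather than look it up locally.

```python
from collections import deque

# Hypothetical in-memory stand-in for a network of GILS locators.
# A record whose "linkage" names another locator describes that
# locator as an information resource in its own right.
LOCATORS = {
    "root": [
        {"title": "Water data locator", "subject_terms": ["water", "hydrology"], "linkage": "water"},
        {"title": "Census data locator", "subject_terms": ["population"], "linkage": "census"},
    ],
    "water": [
        {"title": "Streamflow records", "subject_terms": ["water", "streamflow"], "linkage": None},
        {"title": "Groundwater locator", "subject_terms": ["water", "aquifer"], "linkage": "groundwater"},
    ],
    "groundwater": [
        {"title": "Aquifer survey", "subject_terms": ["water", "aquifer"], "linkage": None},
    ],
    "census": [
        {"title": "Population tables", "subject_terms": ["population"], "linkage": None},
    ],
}


def route_query(start, term):
    """Breadth-first traversal that follows a Linkage only when the
    record describing the linked locator matches the search term."""
    hits = []
    visited = {start}
    queue = deque([start])
    while queue:
        locator = queue.popleft()
        for record in LOCATORS.get(locator, []):
            if term not in record["subject_terms"]:
                continue  # no match: neither report nor follow this record
            target = record["linkage"]
            if target is None:
                hits.append(record["title"])  # a described end resource
            elif target not in visited:
                visited.add(target)  # guard against cycles between locators
                queue.append(target)
    return hits
```

In this sketch, `route_query("root", "water")` never contacts the census locator, because the record describing it carries no matching Subject Terms. The same pruning means a search for "aquifer" from the root finds nothing, which shows how much routing quality depends on how well each locator is described.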
Information
It is a common misconception that GILS locator records are merely another variety of "META" tagging for Internet information resources. While GILS locator records are quite useful for networked information resources, GILS locator records are designed to act as pointers to ALL kinds of information including people, organizations, events, artifacts, etc. These and countless other sources of information worldwide are not now, and in some cases will never be, either electronic or network accessible. GILS locator records must have a representation on electronic networks; but the referenced information resources need not.
The "G" word
When told that the "Government" Information Locator Service was also intended to serve as a "Global" Information Locator Service, the late Paul Evan Peters remarked that the same "G" would continue to be useful when GILS becomes the "Galactic" Information Locator Service. Although GILS is being implemented in multiple countries and so might be considered "international" if not "global", it is also intended to be "global" in the sense of extensible to various other kinds of information. (Sebastian Hammer once suggested that GILS could be taken to stand for "Generic" Information Locator Service.)
There are multiple ways in which GILS is surfacing in law, public policy, and regulation. At the U.S. Federal level, there is public law ( 44 USC 3511 ), an OMB Bulletin ( 95-1 ), and a NARA Guideline. Closely related to the U.S. Federal GILS is an Executive Order ( EO 12906 ) that established the National Spatial Data Infrastructure. There is also the Data Access System Guideline promulgated by the Committee on Environment and Natural Resources and various national initiatives such as the National Biological Information Infrastructure, the Geospatial Information Infrastructure, the Geospatial Data Clearinghouse, and the National Environmental Data Index. The Library of Congress has committed to make its products and search engine GILS-compliant. The recent strategic plan for the National Archives and Records Administration refers to GILS. There are also under development at OMB some policies with regard to World Wide Web publishing and to the new Electronic FOIA Act. At the Federal agency level in the U.S., there are various directives and other policy instruments, such as those in the Department of Treasury, and the Department of Defense. There is also now a requirement that all 1,380 Federal Depository Libraries nationwide provide GILS-aware client software for public use.
Also in the policy arena, there are various U.S. state law and policy initiatives. North Carolina has a new law regarding requirements for indexing state and local databases and an executive order to implement a NC GILS pilot, which may build upon the required indexes. Several other states and regional organizations are pursuing GILS for their specific purposes, including Florida, Massachusetts, Missouri, New York, South Carolina, Tennessee Valley Authority, Texas, and Washington. There are even more state and city participants in the National Spatial Data Infrastructure. There is also an economic development initiative using GILS that is sponsored by the Southern Growth Policy Board under the regional association of governors.
On the international front, GILS is being implemented in Japan. There are also GILS initiatives underway in Australia and Canada. Canada and the U.S. are also cooperating in the Great Lakes Information Network.
There is a strong momentum toward GILS in the area of environmental information. Through a G7 Global Information Society initiative, there is now consensus among the G7 countries, plus a few others, to employ GILS as infrastructure for a global environmental information locator. This infrastructure will link other efforts such as the Clearinghouse Mechanism for the Convention on Biological Diversity and the European Environment Agency Catalogue of Data Sources.
A great diversity of application contexts is an intentional design feature of GILS. All of these disparate social, political, economic, and/or thematic organizations and other entities are free to independently organize relevant parts of information space for their own particular needs. They do not need to subscribe to a single view of information, nor do they need to subordinate their locator to some "mother of all GILS". Although their disparate goals lead to various usage guidelines, the one thing they have in common with all other GILS is that their locators are interoperable for searching.
Interoperability among the many different GILS-compliant servers is defined through the GILS Application Profile. A profile is "The statement of a function and the environment within which it is used, in terms of a set of one or more standards, and where applicable, identification of chosen classes, subsets, options, and parameters of those standards. A set of implementor agreements providing guidance in applying a standard interoperably in a specific limited context." [ 3 ]
The GILS Application Profile is an Implementors Agreement developed and coordinated internationally through the Open Systems Environment Implementors Workshop (OIW), a "de jure" standards process. (Following its acceptance as an international profile, the GILS Application Profile was also adopted as U.S. Federal Information Processing Standard (FIPS) Publication 192.)
The specific group that manages the GILS Application Profile is the OIW Special Interest Group on GILS (OIW/SIG-GILS). The OIW/SIG-GILS meets approximately once a month. Meetings are open to anyone and meeting summaries are posted under "GILS Discussions" at http://www.usgs.gov/gils/gilscopy.html.
The GILS Application Profile is defined only in the context of peer networks, such as OSI and TCP/IP. The profile assumes a client-server architecture. The GILS Application Profile does not constrain clients at all, but is designed to anticipate clients acting as automated agents (i.e., the client's human is not monitoring the search session as it occurs). End-user clients, gateways, and agents have free rein to interact with GILS servers in either a stateless or stateful manner.
The GILS Application Profile only specifies the behaviors of server software in conversation with client software. The profile specifies server behavior in terms of an abstraction layer--the service definition is independent of how the content is actually managed at the server.
A GILS-compliant server appears to a client as though holding a searchable set of information locator records. Each locator record can characterize other information of any kind, at any level of aggregation. These may be hand-crafted catalog records or on-the-fly products of an automated abstracting or classification process. For example, a locator record that describes another server might include a listing of the words most characteristic of that server's contents and so act as an intermediary resource for information discovery.
Searches can be content-based using full-text searching or some other manner of feature extraction. Or, the search may take advantage of registered attributes such as structured elements (e.g., Title, Author, Subject) and relations (e.g., Equal, Greater Than).
In GILS, there are defined about 70 registered attributes to control searching, with explicit semantics for each element as it is used to characterize information resources. Another 100 registered attributes are available because they are inherited from the Z39.50 Bib-1 Attribute Set. GILS locator records can incorporate locally-defined elements as well. GILS-compliant servers must be capable of searching with the required attributes and handling all of the registered elements. However, specific sets of locator records available from a GILS-compliant server may have any number of these or other elements, and may also have none at all. (Records without any element structure are searched full-text through a special attribute called "Anywhere".)
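A small Python sketch may help make the search behavior just described concrete. It assumes a toy in-memory record store; the attribute numbers below are placeholders chosen for illustration and are not the values registered in the GILS Application Profile or the Bib-1 Attribute Set, and the `search` function is hypothetical.

```python
# Placeholder attribute numbers, for illustration only. The real
# values are registered in the profile and in the Bib-1 attribute set.
TITLE = 9001
SUBJECT = 9002
ANYWHERE = 9999

RECORDS = [
    {TITLE: "Streamflow records", SUBJECT: "hydrology"},
    {TITLE: "Hydrology handbook", "local-note": "reading room copy"},  # locally-defined element
    "Unstructured text that mentions hydrology in passing.",           # no element structure
]


def search(records, attribute, term):
    """Return indexes of records whose given attribute contains the term.

    ANYWHERE matches against every element value, and against the whole
    text of records that have no element structure at all.
    """
    term = term.lower()
    hits = []
    for index, record in enumerate(records):
        if isinstance(record, str):
            # An unstructured record is reachable only through ANYWHERE.
            matched = attribute == ANYWHERE and term in record.lower()
        elif attribute == ANYWHERE:
            matched = any(term in str(value).lower() for value in record.values())
        else:
            matched = term in str(record.get(attribute, "")).lower()
        if matched:
            hits.append(index)
    return hits
```

Here a search on the placeholder Title attribute finds only the handbook, while the same term sent with the "Anywhere" attribute also reaches the record that has no element structure.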
Because GILS provides interoperability through reference to the registered semantics of elements, it is not necessary to specify a canonical format for structured metadata. Natively or through gateways, the service can support interoperable search of many different metadata structures--HTML, SGML, X.500, SQL databases, PURL's, Handles, Dublin Core, SOIF's, IAFA, Internet mail, DIF's, Whois++ templates, spatial metadata, etc. Whenever appropriate, servers simply map local semantics to the registered elements. (In Z39.50 version 3, there is an "Explain" facility by which clients can dynamically discover what attributes are available.) Multi-lingual searching is also facilitated because the elements are referenced by number--user interfaces simply translate the number to whatever is appropriate for the particular language chosen.
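The multi-lingual point can be illustrated with a short Python sketch of the user-interface side. Everything here is an assumption made for the example: the element numbers are placeholders rather than registered values, and the `LABELS` table and `render` function are hypothetical.

```python
# Placeholder element numbers; real values come from the registered elements.
TITLE, AUTHOR, SUBJECT = 1, 2, 3

# Each user interface keeps its own number-to-label table per language.
LABELS = {
    "en": {TITLE: "Title", AUTHOR: "Author", SUBJECT: "Subject"},
    "fr": {TITLE: "Titre", AUTHOR: "Auteur", SUBJECT: "Sujet"},
    "es": {TITLE: "Título", AUTHOR: "Autor"},  # deliberately incomplete
}


def render(record, language):
    """Label each element of a number-keyed record in the chosen language.

    Elements with no label in that language fall back to the bare number,
    so nothing in the record is hidden from the user.
    """
    labels = LABELS.get(language, {})
    lines = []
    for number in sorted(record):
        label = labels.get(number, f"Element {number}")
        lines.append(f"{label}: {record[number]}")
    return lines
```

Because the record itself carries only numbers, the same server response can be shown with French labels by one client and Spanish labels by another without any change on the server.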
The GILS Application Profile has a mixture of components from the Internet, such as URI's and MIME types, as well as OSI components such as ISO 10163 (ANSI Z39.50). There is a special RFC that specifies how the OSI functions inherent in Z39.50 get mapped onto the functions available in TCP/IP.
The Z39.50 standard allows different information databases to be interoperably searchable. (In 1994, 16.5% of public libraries serving populations over 100,000 were using Z39.50; in 1995 this increased to 23.8%.) Z39.50 specifies how electronic network searches can be expressed and how results are returned. As stated by Sebastian Hammer of Index Data in Denmark and John Favaro of Intecs Sistemi in Italy: "...the essential power of Z39.50 is that it allows diverse information resources to look and act the same to the individual user. At the same time, it allows each information system to assume a different interface for every user, perfectly suited to his or her particular needs." [ 4 ] (For an overview discussion of how Z39.50 complements the World Wide Web, see http://www.usgs.gov/gils/webz3950.html.)
The GEO profile for the Geospatial data locators is currently under development and is essentially a superset of GILS. The new Catalog Interoperability Profile being tested by the Committee on Earth Observing Satellites is designed to be compatible with GILS. Also, the profile being developed for Digital Collections (previously known as "Digital Libraries") is compatible with GILS semantically, and the same should be true of the forthcoming profile for the interchange of information among museums worldwide. (For further information on Z39.50 Profiles, see http://lcweb.loc.gov/z3950/agency/profiles.html ).
The semantics of GILS metadata elements are also a major component of the recent Internet search protocol proposal developed for Web "metasearcher" companies under the Stanford Digital Library project.
It is common experience that many separate communities use bibliographic techniques to characterize data and information resources. Unfortunately, as each community chooses different tags for the bibliographic or metadata elements, any commonality that may actually exist gets completely obscured. The usual result is that there is no functional interoperability between the catalog services, unless some organization is able to force the communities to accept imposition of a common format. Such strong-arm tactics may be useful at times, but they are clearly inappropriate on a long-term and global scale.
At its center, the thrust of GILS from a standards perspective is to encourage interoperability at the semantic level. Not all information resources should be characterized in the same way--the appropriate searching characteristics should be a function not only of the information itself but also of the cataloger's purpose and the searcher's needs. Again Bill Arms provides a useful conceptual reference: "Digital objects are the basic building blocks of the digital library, but users of the library usually want to refer to items at a higher level of abstraction...Which digital objects should be grouped together can not be specified in a few dogmatic rules. The decision depends upon the context, the specific objects, their type of content and sometimes the actual content." [ 5 ]
As one example, for some combinations of resource, cataloger, and searcher, the semantics of "system name" could be equivalent to "title", "principal investigator" could be equivalent to "author", and "keyword" could be equivalent to "subject". In that case, there is clearly a ripe opportunity for interoperability. With such a semantic mapping, the European science community's Catalogue of Data Sources could be interoperable with the library community's OPAC (Online Public Access Catalog) Network-Europe.
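The equivalences in this example can be sketched as a pair of tag tables in Python. This is a toy under stated assumptions: the two catalogs, their sample records, and the `cross_search` function are invented for illustration, and neither real service is modeled here.

```python
# Shared concepts that both communities agree to reference.
TITLE, AUTHOR, SUBJECT = "title", "author", "subject"

# Hypothetical catalogs: each keeps its own local tags and made-up records.
CATALOGS = {
    "data-sources": {
        "tags": {TITLE: "system name", AUTHOR: "principal investigator", SUBJECT: "keyword"},
        "records": [
            {"system name": "River Quality Database",
             "principal investigator": "A. Example",
             "keyword": "water quality"},
        ],
    },
    "library-opac": {
        "tags": {TITLE: "title", AUTHOR: "author", SUBJECT: "subject"},
        "records": [
            {"title": "Water Quality in Rivers", "author": "B. Example", "subject": "water quality"},
            {"title": "Alpine Flora", "author": "C. Example", "subject": "botany"},
        ],
    },
}


def cross_search(concept, term):
    """Run one query, stated in shared semantics, against every catalog
    by translating the concept into each catalog's local tag."""
    term = term.lower()
    hits = []
    for name, catalog in CATALOGS.items():
        local_tag = catalog["tags"][concept]
        title_tag = catalog["tags"][TITLE]
        for record in catalog["records"]:
            if term in record.get(local_tag, "").lower():
                hits.append((name, record[title_tag]))
    return hits
```

Neither catalog changes its stored tags in this sketch; only the small mapping table is added, which is the sense in which interoperability is achieved without imposing a common format.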
The library and information services community has always been concerned with such interoperability problems. Over many decades, they developed a variety of schemes such as the Dewey Decimal System, the Library of Congress Subject Headings, Machine-Readable Cataloging (MARC), the Anglo-American Cataloging Rules (AACR), and the Z39 family of information standards.
The theory and practice of information science is quite distinct from computer science and database technologies, and unfortunately the approaches to interoperability have been quite different. In relational data bases, for example, one foray into semantic interoperability was the Information Resource Dictionary System. A common semantic registry is also a feature of the Electronic Data Interchange standard and of the X.500 family of standards.
Now that so many people are cross-training in both the information science and computer science fields (witness the Digital Libraries initiative, for example), perhaps the time is now ripe for enhanced semantic interoperability approaches. I am aware of relevant work at the Distributed Systems Technology Centre in Australia on a CORBA Type Management Service, and a Catalog Services Request for Information in the context of the OpenGIS Abstract Specification. Although there may not yet be sufficient momentum to deploy such a comprehensive architecture, it may be timely to explore semantic interoperability between the Z39.50 standard and mechanisms such as the Common Indexing Protocol, Resource Description Messages, and the Platform for Internet Content Selection.
By applying an existing standard in wide use for many years, GILS takes advantage of existing networks and software to access a vast array of valuable resources, including hundreds of libraries, as well as museums and archives worldwide. Together with spatial data catalogs, these professionally maintained resources provide free access to information resources collectively valued in the tens of billions of dollars, with even more information available on a fee basis from commercial information services.
GILS-compliant server freeware is available from the Clearinghouse for Networked Information and Discovery in North Carolina and from IndexData in Denmark. Blue Angel Technologies is offering in beta release a new freeware GILS-compliant server for Windows NT and a companion Meta Data Manager for handling metadata in ODBC-compliant databases. (Other facilities for GILS metadata collection are also available.)
WAISserver commercial software was recently acquired by Fulcrum Technologies, which will distribute a GILS-compliant release and will also integrate GILS-compliance into Fulcrum Surfboard. The Online Computer Library Center SiteSearch server is being made GILS-compliant in support of the Solinet Public Information Project. GILS-compliance is also being developed by Verity, SIRSI, and Sovereign-Hill. (The Sovereign-Hill search engine is a commercialization of a product of the Center for Intelligent Information Retrieval, called Inquery.) Elsevier is looking at GILS for shared search access to science publications through their new Science Direct service.
Distributed search facilities that include GILS-compliant servers are being developed for Europagate, and the Environment and Natural Resources Management project, as well as the Government Printing Office and FedWorld. Distributed search of the Global Change Master Directory is in prototype for a facility of the International Directory Network overseen by the Committee on Earth Observing Satellites. Australia's MetaSearch pilot project provides a powerful demonstration of how Z39.50 servers can be searched in combination with WAIS sources and the indices built by Web crawling.
Because the GILS Application Profile has no user interface, access to GILS-compliant servers must be accomplished through gateways, clients, or agents. Gateway freeware and toolkits are available for World Wide Web and for X.500. An interesting technology proposal is the building of interoperability between GILS and the X.500 directory that underlies the U.S. Government-wide Electronic Directory. This government-wide directory is intended to include White Pages (employees), Yellow Pages (organizations), Blue Pages (services), and Green Pages (documents). Once there are two-way gateways for Z39.50/GILS-compliant servers and these X.500 directory databases, locator information in both schemes would be readily accessible from the access mechanisms of either.
Any client software capable of access to a server compliant with Z39.50 version 2 or 3 can access GILS-compliant servers. Extra capability is provided by GILS-aware clients such as the BookWhere 2000 product from Sea Change Corporation, and products under development such as SIRSI Vizion and the new Java client being developed by Blue Angel Technologies.
The GILS application profile specifies, though it does not mandate, spatial searching by a "bounding box" of latitude, longitude pairs. This extension of search beyond text is an important feature of Z39.50. It has allowed the protocol to be applied to searching for chemicals by bond angles and for the searching of gene sequences. Other kinds of pattern-matching can be supported as well.
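As a rough illustration of the "bounding box" idea, the Python sketch below tests whether two latitude/longitude rectangles overlap. It is not taken from the profile: the `BoundingBox` type, the `spatial_search` function, and the sample footprints are hypothetical, and the test assumes boxes that do not cross the 180-degree meridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Rectangle in decimal degrees; assumes west <= east (no antimeridian crossing)."""
    west: float
    south: float
    east: float
    north: float


def overlaps(a: BoundingBox, b: BoundingBox) -> bool:
    """True when the two boxes share at least one point."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def spatial_search(records, query):
    """Return titles of records whose footprint overlaps the query box."""
    return [title for title, box in records if overlaps(box, query)]


# Made-up locator records with approximate, illustrative footprints.
RECORDS = [
    ("Great Lakes water levels", BoundingBox(-92.0, 41.0, -76.0, 49.0)),
    ("Iberian land cover", BoundingBox(-10.0, 36.0, 3.5, 44.0)),
]
```

A query box drawn around a small area inside the first footprint returns only that record, and a box far from both returns nothing. Other spatial relations, such as requiring full containment, could be substituted for `overlaps` without changing the rest of the sketch.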
To achieve a tighter integration of GILS with World Wide Web servers, the USGS crafted a search module for the freeware Apache Web server. This module embeds into the Web server the Indexdata freeware indexer, gateway, and client components. Without requiring dual administration and security, the GILS-compliant server is accessible in a presentation mode through HTTP and in a search protocol mode through Z39.50.
The same approach can be applied to the new freeware AOLserver, as well as the Netscape Commerce Server, the Microsoft Internet Information Server, and others. The AOLserver looks especially interesting since it includes the Informix Illustra database management system. Illustra is the commercialized form of the PostGres object-oriented database management system. The USGS has already demonstrated porting of the Indexdata GILS-compliant protocol and server freeware to PostGres, so it should be straightforward to provide a GILS-compliant Illustra DataBlade. It is a major positive development to have an object-oriented relational data base management system integrated with a freeware Web server. And, it is clearly a great advantage that Illustra already supports spatial search facilities.
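As a long-term infrastructure initiative, it is clear that GILS will be evolving in many ways. Yet, I think there are some basic principles from which GILS should not diverge over the decades: adoption of open standards; support for international use; support for diverse points of view; and preservation of access to accumulated knowledge.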
We cannot today predict what will be the appropriate technical basis for GILS over the long term. Evolutionary and sometimes revolutionary changes must be expected and accommodated. In the current revolutionary turmoil, the technology strategy must be flexible but careful not to compromise any of the basic principles.
Contacts for information and discussion related to GILS in areas of policy, services, standards, technology, and other aspects.
Search WWW pages within four hyperlinks of Global Information Locator home page
The Government Information Locator Service (GILS): Access Through Technology by Robin Murphy, InterNIC News
Government Information Locator Service (GILS) home page
What is GILS?
Global Information Locator home page
Metadata Resources, International Federation of Library Associations and Institutions, The Hague, Netherlands
hdl:cnri.dlib/december96-christian