Mapping and Converting Essential Federal Geographic Data Committee (FGDC) Metadata into MARC21 and Dublin Core: Towards an Alternative to the FGDC Clearinghouse

Appendix B: FGDC Metadata Elements

Discussion of Essential FGDC Metadata Elements

The following notes on the crosswalk are based on our cataloging experience as participants in the CORC project. The crosswalk also incorporates results of the Seventh Dublin Core Workshop (DC-7), which was held October 25-27, 1999, at Die Deutsche Bibliothek, Frankfurt, Germany. The principal activities at DC-7 were DC Working Group discussions of proposals for an initial set for DC Qualifier semantics. Discussion, revision, and approval of DC qualifiers will be completed by January 1, 2000 [Weibel 1999]. We intend to harmonize DC qualifiers in our converter with those of the Dublin Core Metadata Initiative after that date.

FGDC Title

This FGDC element corresponds to MARC21 245 $a and DC.Title. We follow CORC participants' practice of using subfield $h [electronic resource] rather than $h [computer file] as the general material designation (GMD). The former is valid under ISBD (ER) rules, although it is not yet approved in AACR2. The converter enters $h [electronic resource] after the 245 $a title field.

FGDC Originator

This FGDC element corresponds to MARC21 720 and DC.Creator. The 720 field occurs in the earlier (1997) but not the current (1999) version of the Library of Congress crosswalk [LC 1999]. We are using this field because the converter program cannot distinguish between personal and corporate names. Since the 720 field is not used in WorldCat or CORC, librarians will have to edit this field to the appropriate personal (700) or corporate (710) added entry fields. The DC Agents Working Group at DC-7 proposes the qualifier "Agent Name" for Agent elements (that is, the DC.Creator, Publisher, and Contributor elements) [Iannella 1999]. The crosswalk accordingly maps the FGDC Originator as DC.Creator.Name.

FGDC Publication Date

This FGDC element corresponds to MARC21 260 $c and DC.Date. Both FGDC and DC metadata adhere to the date format of the ISO 8601 standard, which has been summarized by Kuhn [Kuhn 1999]. For example, the date November 25, 1999 is given in FGDC as 19991125 (or YYYYMMDD without hyphens) and in DC as 1999-11-25 (or YYYY-MM-DD with hyphens). The DC Date Working Group recommends hyphenation, which is also endorsed by the World Wide Web Consortium (W3C) [Childress 1999; Wolf 1997]. The Working Group also recommends the qualifier "Issued" for the "date of formal issuance (e.g., publication) of the resource." The converter also maps the YYYY-MM-DD format to the MARC21 260 $c field. Users of WorldCat and CORC will want to edit the date, e.g., 1999-11-25, to "1999" in MARC21 260 $c, add "November 25, 1999" as a 500 note, and enter "e" (for expanded date) and "1999,1125" in the Date/Status (DtSt) and Dates fixed fields.
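The date reformatting described above can be sketched as follows. This is a minimal illustration, not the converter's actual code: the function name `fgdc_date_to_dc` is hypothetical, and the sketch assumes the FGDC value is a full eight-digit YYYYMMDD date.

```python
from datetime import datetime


def fgdc_date_to_dc(fgdc_date: str) -> str:
    """Convert an FGDC date (YYYYMMDD) to the hyphenated DC form (YYYY-MM-DD).

    Hypothetical helper; assumes a complete eight-digit date.
    """
    value = fgdc_date.strip()
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {fgdc_date!r}")
    # strptime rejects impossible dates such as 19991332.
    datetime.strptime(value, "%Y%m%d")
    return f"{value[0:4]}-{value[4:6]}-{value[6:8]}"


print(fgdc_date_to_dc("19991125"))  # 1999-11-25
```

The year alone, as catalogers would enter it in 260 $c, is simply the first four characters of either form.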

FGDC Calendar Date, Beginning Date, and Ending Date

These FGDC elements refer to the time period covered by the data set, either as a single date or range of dates, and not to the publication date. In MARC21, the date, or dates, map to a 513 $b Period Covered Note. If there is a range of dates, the converter enters subfield $b twice (once for each date). For the DC.Coverage element, the joint meeting of the DC-Date and DC-Coverage Working Groups at the Seventh Dublin Core Workshop has recommended the qualifiers "Date", "dateStart", and "dateEnd" [Miller 1999]. The crosswalk therefore maps FGDC Calendar_Date, Beginning_Date, and Ending_Date, respectively, as Coverage.Date, Coverage.dateStart, and Coverage.dateEnd.
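A small sketch of this part of the crosswalk, under stated assumptions: the function name `map_time_period`, the dictionary-based input, and the textual layout of the 513 field are all illustrative and are not taken from the converter itself. Date values are passed through unchanged.

```python
# FGDC element name -> qualified DC.Coverage name, as given in the crosswalk.
FGDC_TO_DC_COVERAGE = {
    "Calendar_Date": "Coverage.Date",
    "Beginning_Date": "Coverage.dateStart",
    "Ending_Date": "Coverage.dateEnd",
}


def map_time_period(fgdc: dict):
    """Return (dc_pairs, marc_513) for the time-period elements present.

    Hypothetical helper: `fgdc` maps FGDC element names to date strings.
    """
    dc_pairs = [
        (dc_name, fgdc[fgdc_name])
        for fgdc_name, dc_name in FGDC_TO_DC_COVERAGE.items()
        if fgdc_name in fgdc
    ]
    if not dc_pairs:
        return [], None
    # One $b per date, so a date range yields two $b subfields.
    marc_513 = "513 " + " ".join(f"$b {value}" for _, value in dc_pairs)
    return dc_pairs, marc_513


pairs, field = map_time_period({"Beginning_Date": "1990", "Ending_Date": "1994"})
print(pairs)  # [('Coverage.dateStart', '1990'), ('Coverage.dateEnd', '1994')]
print(field)  # 513 $b 1990 $b 1994
```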

FGDC Bounding Coordinates

The four FGDC elements for west, east, north, and south bounding coordinates map to MARC21 034 $d, $e, $f, and $g and to DC.Coverage. The DC.Coverage Working Group at DC-7 is proposing the qualifier "Box," which will contain westLimit, eastLimit, northLimit, and southLimit for FGDC bounding coordinates [Miller 1999; Cox 1999]. For example, using the bounding coordinates of the data set "Hydrologic units maps of the Conterminous United States," the four FGDC bounding coordinates are mapped as DC.Coverage.Box.westlimit:-127.9061; eastlimit:-65.3219; northlimit:48.2825; southlimit:22.8757.
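The Box value in the example above could be assembled along the following lines. This is only a sketch: the function name `dc_coverage_box` is hypothetical, and the range checks are an added assumption rather than something the working draft or the converter is known to require.

```python
def dc_coverage_box(west: float, east: float, north: float, south: float) -> str:
    """Build the value of DC.Coverage.Box from four decimal-degree limits.

    Hypothetical helper; the sanity checks below are assumptions.
    """
    for name, lon in (("west", west), ("east", east)):
        if not -180.0 <= lon <= 180.0:
            raise ValueError(f"{name} longitude out of range: {lon}")
    for name, lat in (("north", north), ("south", south)):
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"{name} latitude out of range: {lat}")
    if south > north:
        raise ValueError("south limit lies north of north limit")
    return (
        f"westlimit:{west}; eastlimit:{east}; "
        f"northlimit:{north}; southlimit:{south}"
    )


print(dc_coverage_box(-127.9061, -65.3219, 48.2825, 22.8757))
# westlimit:-127.9061; eastlimit:-65.3219; northlimit:48.2825; southlimit:22.8757
```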

The MARC21 034 $d, $e, $f, and $g subfields (and 255 $c subfield) express longitudes and latitudes in degrees-minutes-seconds (DMS), while the FGDC values for bounding coordinates are given in decimal degrees (DD). MARC21 records ought to be edited with a DD/DMS converter such as the one provided by Gary Park of the University of Northumbria [Park 1999]. We intend to add a DD/DMS (and DMS/DD) converter to our metadata converter at a future date.
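The arithmetic behind a DD/DMS conversion is short enough to sketch here. The function names are hypothetical, and the eight-character hdddmmss layout (hemisphere letter, three-digit degrees, two-digit minutes and seconds) is our understanding of how 034 $d through $g are coded; it should be checked against the MARC21 documentation before use. Seconds are rounded to the nearest whole second.

```python
def dd_to_dms(dd: float):
    """Split the absolute value of a decimal-degree figure into (deg, min, sec)."""
    # Rounding the total seconds first avoids results such as 59.99... -> 60.
    total_seconds = round(abs(dd) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return degrees, minutes, seconds


def dd_to_marc034(dd: float, axis: str) -> str:
    """Format a coordinate as hdddmmss; `axis` is "lon" or "lat" (assumed layout)."""
    if axis == "lon":
        hemisphere = "W" if dd < 0 else "E"
    elif axis == "lat":
        hemisphere = "S" if dd < 0 else "N"
    else:
        raise ValueError('axis must be "lon" or "lat"')
    degrees, minutes, seconds = dd_to_dms(dd)
    return f"{hemisphere}{degrees:03d}{minutes:02d}{seconds:02d}"


def dms_to_dd(hemisphere: str, degrees: int, minutes: int, seconds: float) -> float:
    """Inverse conversion: west and south become negative decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("W", "S") else value


print(dd_to_marc034(-127.9061, "lon"))  # W1275422
print(dd_to_marc034(48.2825, "lat"))    # N0481657
```

Because seconds are rounded, converting to DMS and back recovers the original decimal value only to within about half a second of arc.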

FGDC Theme Keywords

These FGDC elements correspond to MARC21 653 (Index Terms--Uncontrolled) and DC.Subject.Keyword, not to MARC21 650 (Subject Added Entry--Topical Term) and DC.Subject.LCSH. This is because creators of FGDC metadata use their own keywords, not a controlled vocabulary like the Library of Congress Subject Headings (LCSH) that many libraries use in MARC21 650. Some users of the converter may prefer this uncontrolled vocabulary, but we prefer to use LCSH where possible. There is an NBII thesaurus (currently containing about 1000 uncontrolled terms), and we have started working on a crosswalk between it and LCSH. This crosswalk can be built into the converter. We think about half of the NBII terms can be mapped directly to LC subject terminology. For clarification, the thesaurus we started mapping to LCSH is a version of the keyword thesaurus incorporated into the USGS MetaWebber tool <http://biology.usgs.gov/pubs/metaweb>. However, we have recently learned that the CERES/NBII Thesaurus Partnership Project <http://ceres.ca.gov/thesaurus/> is a more comprehensive and authoritative thesaurus, so we intend, if possible, to include that thesaurus in our converter instead.

FGDC Place Keywords

These FGDC elements correspond to MARC21 653 and DC.Subject.Geographic. The converter does not map to MARC21 651 (Subject Added Entry--Geographic Term) because place keywords (like theme keywords) used by creators of FGDC metadata are uncontrolled terms. Some users of the converter may prefer these terms, while others (like ourselves) may prefer to edit them to forms found in the OCLC Authority File.
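The keyword mappings for theme and place keywords can be summarized in a short sketch. The function name `map_keywords` and the textual rendering of the 653 field are illustrative assumptions; the sketch copies each uncontrolled term unchanged and leaves any editing to LCSH or authority-file forms to the cataloger.

```python
def map_keywords(theme_keywords, place_keywords):
    """Map FGDC theme and place keywords to MARC21 653 and qualified DC.Subject.

    Hypothetical helper; terms are passed through without normalization.
    """
    themes = list(theme_keywords)
    places = list(place_keywords)
    # Both kinds of keyword are uncontrolled, so both go to 653.
    marc = [f"653 $a {term}" for term in themes + places]
    dc = [("DC.Subject.Keyword", term) for term in themes]
    dc += [("DC.Subject.Geographic", term) for term in places]
    return marc, dc


marc, dc = map_keywords(["hydrologic units"], ["United States"])
print(marc)  # ['653 $a hydrologic units', '653 $a United States']
print(dc)
# [('DC.Subject.Keyword', 'hydrologic units'), ('DC.Subject.Geographic', 'United States')]
```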

FGDC Format Name

This FGDC element corresponds to MARC21 856 $q and DC.Format. Creators of FGDC metadata may use a list of domain values given in the "Content Standard for Digital Geospatial Metadata," and some libraries may prefer to use these values. (The list of values is given at element 6.4.2.1.1 in [FGDC 1998].) Our preference is to use Internet Media Types (MIME values), which are also recommended by the DC Resource Type and Format Working Group. For example, many FGDC data sets are available in Arc/Info export file format, which has the FGDC domain value "ARCE." However, we use the Arc/Info extension ".e00" to create the local format "application/e00" (it is local because, so far, we have not found Arc/Info in any MIME list).
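A lookup of this kind might be sketched as below. The table holds only the one pair named in the text; the function name `fgdc_format_to_mime` and the fallback behavior (returning the FGDC value unchanged when no mapping is known) are assumptions made for illustration.

```python
# FGDC Format Name domain value -> media type used in 856 $q and DC.Format.
# "application/e00" is a local value, not a registered Internet Media Type.
FGDC_FORMAT_TO_MIME = {
    "ARCE": "application/e00",
}


def fgdc_format_to_mime(format_name: str) -> str:
    """Return the mapped media type, or the FGDC value itself if unmapped.

    Hypothetical helper; the fallback is an assumption.
    """
    key = format_name.strip().upper()
    return FGDC_FORMAT_TO_MIME.get(key, format_name)


print(fgdc_format_to_mime("ARCE"))  # application/e00
```

Keeping the mapping in a table rather than in code makes it easy for a library to add further local or registered types as they are identified.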

FGDC Contact Information

These twelve elements in the FGDC Contact template are mapped to the MARC21 270 primary address field. As mentioned above, the Contact template (section 10) can be inserted into the Identification, Distribution, or Metadata Reference sections (sections 1, 6 or 7). Contact persons or organizations may be the same or different in these three sections. We mapped contact information about the originator of the data set, rather than about the distributor contact or metadata contact, because we felt that this would be of most interest to users of FGDC data sets. For the time being, the crosswalk does not map FGDC Contact information to Dublin Core. This will be done when the basic set of qualifiers for DC Agent is established.[1]

DC Identifier/MARC21 856

The converter does not include mapping of a Uniform Resource Locator (URL) from FGDC to the DC.Identifier or MARC21 856 field. This is because the URL is a link to FGDC metadata located at a clearinghouse or other resource center. The URL does not occur in the body of the FGDC metadata itself. (The URL in an FGDC Online Linkage field in the Citation template is an optional link that points to a data set, not to metadata.) This is really a matter of meta-metadata, or data about metadata, and we are interested to learn how other libraries are addressing this issue. It should be emphasized that we do not view the meta-metadata as a replacement for the full record. Our approach is premised on the availability of the full record via a hyperlink from the Identifier (DC) or 856 (MARC) field back to the full FGDC record.

Our initial solution at the EE-IR Center has been to write DC metadata about FGDC metadata that has been placed onto a Web server and offered to users in HTML:

DC Meta-Metadata: Hydrologic Units Maps of the Conterminous United States
FGDC Metadata: <http://water.usgs.gov/GIS/metadata/usgswrd/huc250k.html>

Experience shows that putting FGDC metadata records in a database and offering one search engine interface is not sufficient for user needs. The HTML version of the FGDC record cited in the above example is not unusual. We have found dozens of examples where the HTML version of the record is dressed up and offered as a Web page, making the metadata record -- not a search engine -- the primary user interface (a list of some of these examples is available at <http://eeirc.nwrc.gov/metadata_fgdc.htm>). Huberman and Adamic (1997) contend that "social search" is a mechanism for navigating the bountiful content of the Web. That is, exchanging links to content with one's colleagues and friends is a successful and efficient method that bypasses portal directories and search engines [Huberman 1997]. By exchanging links to known sites of value, groups of individuals reduce the costs of surfing for information. It is this strategy or pattern, we believe, that may explain the emergence of this parallel FGDC Clearinghouse system.

For example, there is a Dublin Core version of an FGDC record in our collection, the title of which is "1:250,000 Scale Quadrangles of Landuse/Landcover GIRAS Spatial Data in the Conterminous United States"; the URL is <http://eeirc.nwrc.gov/metadata/151.htm>. The Identifier field in this DC record points to a full FGDC record at <http://nsdi.epa.gov/nsdi/projects/giras.htm>. The problem is, our colleague Suzanne Harrison has also found the identical HTML view of this FGDC metadata record at two other locations on the Web:

1. <http://www.epa.gov/ngispgm3/nsdi/projects/giras.htm>
2. <http://www.epa.gov/envirofw/html/ndsi/projects/giras.html>. (This resource is no longer available.)

What is emerging out of the FGDC Clearinghouse model, therefore, is a very serious maintenance problem. It is clear from this example that there is no method for updating all the permutations of the "official" FGDC record housed in one of the Clearinghouse server nodes. The complications with maintenance of FGDC metadata created by this divergence have not been studied.

Note for Appendix B

[1.] There has been considerable discussion by the DC Agents Working Group about the kind and amount of contact information that ought to be included in the basic set of DC qualifiers for Agents. Currently, the Working Group favors five qualifiers: Agent Type (person, organization, event, or object), Agent Name, Agent Affiliation, Agent Role (for example, the MARC Code List of Relators), and Agent Identifier (for example, a Uniform Resource Identifier or URI) [Iannella 1999]. Some DC user communities or implementers may want to use additional qualifiers (e.g., for e-mail or physical addresses, telephone numbers, etc.). Still other communities will add elements to the basic fifteen-element DC set or use Dublin Core in languages other than English. There are plans for the Dublin Core Directorate to establish a registry that will define and accommodate the specific needs of different user groups.

References: Appendix B

[Childress 1999] Childress, Eric. (1999). "DC Date Qualifiers: DC Working Draft, 07 December 1999." Available at: <http://www.mailbase.ac.uk/lists/dc-date/files/qual-prop.html>

[Cox 1999] "DCBOX: Specification of the Spatial Limits of a Place, and Methods for Encoding This in a Text String" Available at: <http://www.agcrc.csiro.au/projects/3018CO/metadata/dcbox/>

[Huberman 1997] Huberman, Bernard A., and Lada A. Adamic. (1997). "Novelty and Social Search in the World Wide Web." Xerox Palo Alto Research Center. Available at: <http://www.parc.xerox.com/istl/groups/iea/www/request.html>

[Iannella 1999] Iannella, Renato. (1999). "DC Agent Qualifiers: DC Working Draft, 12 November 1999." Available at: <http://www.mailbase.ac.uk/lists/dc-agents/files/wd-agent-qual.html>

[Kuhn 1999] Kuhn, Markus. (1999). "A Summary of the International Standard Date and Time Notation." Available at: <http://www.cl.cam.ac.uk/~mgk25/iso-time.html>

[LC 1999] Library of Congress. Network Development and MARC Standards Office. (1999). "Dublin Core/MARC/GILS Crosswalk." Available at: <http://lcweb.loc.gov/marc/dccross.html>

[Miller 1999] Miller, Eric. (1999). "DC Working Draft: 16 November 1999" [Qualifier Proposal for DC Coverage]. Available at: <http://www.mailbase.ac.uk/lists/dc-coverage/files/wd-coverage-qual.htm>

[Park 1999] Park, Gary J. (1998). "Decimal Degrees to Degrees Converter" Available at: <http://www.unn.ac.uk/~evgp1/gary/dec2deg.htm>

[Weibel 1999] Weibel, Stuart. (1999). "Qualifier Approval Timetable". Available on the DC-General mailing list at <http://www.mailbase.ac.uk/lists/dc-general/1999-11/0012.html>

[Wolf 1997] Wolf, Misha, and Charles Wicksteed. (1997). "Date and Time Formats." Available at: <http://www.w3.org/TR/NOTE-datetime>

(At the request of the authors, the URL for the Huberman reference was changed to <http://www.parc.xerox.com/istl/groups/iea/www/request.html>. This change was made on 1/18/00 at 10:39 am.)

This work is supported in part by a grant from the U.S. Department of Energy (under grant No. DE-FG02-97ER1220).
