DISCUSSION PAPER NO. 94

DATE: December 21, 1995
REVISED:

NAME: Proposed Changes to FTP File Label Specifications for Electronic Files of USMARC Records

SOURCE: Library of Congress

SUMMARY: This paper discusses changes that have been proposed by the participants in the European CoBRA FLEX Project 10164 for the file label that is used for files of USMARC records transferred via the File Transfer Protocol (FTP). Additional fields are proposed that have been deemed necessary for exchange of records in a variety of MARC formats.

KEYWORDS: FTP Label; File Transfer

RELATED: DP 61 (Jan. 1993); 93-9 (June 1993)

FOR STATUS SEE DOCUMENT: DP94.COV


DISCUSSION PAPER NO. 94: Changes to FTP File Label Specifications

1. INTRODUCTION

The European library community has been investigating the use of the Internet File Transfer Protocol (FTP) for the electronic exchange of bibliographic data. The European Commission's Libraries Programme through CoBRA (Computerized Bibliographic Record Actions) has funded the FLEX (File Label EXchange) Project 10164 to investigate the need for standards in this area, and "to suggest a suitable file labelling and naming format".

The participants in the FLEX Project understand that without standardization in the way files are described within the label file, it would become increasingly difficult to exchange bibliographic information internationally. Because the USMARC specification for electronic file transfer has been widely reviewed by the USMARC community and is now in use by many exchange partners of bibliographic records, the FLEX project participants have proposed that the USMARC specification be used as the base specification. However, they have proposed some enhancements to that specification to take into account a European dimension for exchanging and processing bibliographic data.

In addition, the FLEX Project participants have suggested a file naming convention for use when certain operating system constraints apply.

2. PROPOSED ENHANCEMENTS TO THE CURRENT SPECIFICATION

[In the tables below "M/O" will be the abbreviation used for "mandatory/optional"; "F/V" will be the abbreviation used for "Fixed length/Variable length"; "R/NR" will be the abbreviation used for "Repeatable/Not Repeatable".]

Arising from the consultation process and a workshop held on October 24, 1995, at the National Computing Centre offices in London, the following enhancements to the base file label specification have been proposed.

Proposal 1. Label Character Set

It is proposed that the character set of the label file conform to ISO 646, and that this be stated in the specification. ISO 646 is equivalent to ASCII when its International Reference Version (IRV) is specified.

Proposal 2. End-of-field Character

It is proposed that the current end-of-field marker (X'1E') be replaced with one that any operating system can easily supply. The "New line" character is the suggested replacement.
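To illustrate the effect of Proposal 2, the following Python sketch splits a label into fields on either terminator. The function name split_label_fields is hypothetical, and the tolerance for a carriage return before the new line is an assumption, since the proposal does not define "New line" at the byte level.

```python
FIELD_TERMINATOR_CURRENT = "\x1e"   # X'1E', the terminator in the current specification
FIELD_TERMINATOR_PROPOSED = "\n"    # "New line", the suggested replacement


def split_label_fields(label_text, terminator=FIELD_TERMINATOR_PROPOSED):
    """Split label text into fields, dropping empty pieces."""
    fields = []
    for piece in label_text.split(terminator):
        # Assumption: tolerate a carriage return left by systems that write CR LF.
        piece = piece.rstrip("\r")
        if piece:
            fields.append(piece)
    return fields


old_style = "RBF##1564\x1eFOR##M\x1e"
new_style = "RBF##1564\nFOR##M\n"
print(split_label_fields(old_style, FIELD_TERMINATOR_CURRENT))
print(split_label_fields(new_style))
```

Both calls yield the same two fields, which suggests that a receiving system could accept either terminator during a transition period.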

Proposal 3. Enhancement to the ORS Field (Originating System ID)

In order to ensure that field ORS content is unique, it is proposed that current field content be preceded by a country identifier followed by a space. The country identifier would be the two-character alpha code defined by ISO 3166 (Codes for the Representation of Names of Countries). It is proposed that inclusion of the country identifier be optional.
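Because the country identifier is optional, a receiving system would need to recognize both forms of ORS content. The Python sketch below shows one possible approach; the function name split_ors is hypothetical, and the pattern assumes an uppercase two-letter code. It does not check the code against the ISO 3166 list.

```python
import re

# Two uppercase letters, one space, then the existing ORS content.
_COUNTRY_PREFIX = re.compile(r"^([A-Z]{2}) (.+)$")


def split_ors(content):
    """Return (country_code, system_id); country_code is None when absent."""
    match = _COUNTRY_PREFIX.match(content)
    if match:
        return match.group(1), match.group(2)
    return None, content


print(split_ors("US DLC"))
print(split_ors("DLC"))
```

A system identifier that itself begins with two capital letters and a space would be misread by this pattern, so exchange partners might still need to agree on whether the prefix is in use.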

Proposal 4. Enhancement to the FOR Field (Format)

It is proposed that the existing field FOR (Format) be made mandatory to identify the structural format standard used for records in the file. For example, "M" = Z39.2 (or its equivalent ISO 2709), and "S" = SGML.

Proposal 5. Specification of a New Field FQF (Format Qualifier)

Field FOR (Format) is insufficient in itself to completely describe the format of the record file, e.g., for identifying a particular tag set/specification for Z39.2 records or a particular DTD for SGML records. Therefore, a new label field FQF (Format Qualifier) is proposed with the following attributes:

Tag  Element Name      Description   M/O  F/V  R/NR
FQF  Format Qualifier  alphanumeric  O    V    NR

The field would be used in conjunction with field FOR (Format) but its use would remain optional. It is proposed that field FQF follow immediately after field FOR in field sequence. The content of the FQF field would be taken from a list of formats and DTDs.
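The pairing of FOR and FQF can be sketched in Python as follows. The two codes in the table are only the examples given in Proposal 4, the function name describe_format is hypothetical, and the FQF value is passed through unchecked because the list of formats and DTDs is not defined in this paper.

```python
# Structural format codes; only the two examples from Proposal 4 are listed.
STRUCTURAL_FORMATS = {
    "M": "Z39.2 / ISO 2709",
    "S": "SGML",
}


def describe_format(for_code, fqf=None):
    """Combine the mandatory FOR code with the optional FQF qualifier."""
    if for_code not in STRUCTURAL_FORMATS:
        raise ValueError("unrecognized FOR code: " + repr(for_code))
    structure = STRUCTURAL_FORMATS[for_code]
    if fqf is None:
        # FQF is optional, so FOR alone remains a valid description.
        return structure + " (no qualifier given)"
    return structure + ", qualified as " + fqf


print(describe_format("M", "USMARC"))
print(describe_format("M"))
```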

Proposal 6. Specification of a New Field CSI (Character Set Initial)

To enable the processing of variations in character set, it is proposed that two new label fields be specified. The first is field CSI (Character Set Initial), which would specify the initial character set or graphic set needed for processing the record data file. The field content would identify a particular international standard character set, e.g. basic Latin (ISO 646), extended Latin (ISO 5426 - 1980), Greek (ISO 5428 - 1980), the USMARC set, or a coded reference to a private character set. The proposed field would be represented as follows:

Tag  Element Name           Description   M/O  F/V  R/NR
CSI  Character Set Initial  alphanumeric  M    V    NR

If the field content represents a private character set, field NOT (Notes) can contain further information on processing requirements and/or field REP (Reply To) can contain a person to contact. It is proposed that this field be mandatory.

Proposal 7. Specification of a New Field CSE (Character Set Extension)

It is proposed that an additional field be used to provide information on character set variations and extensions. The proposed field would be represented as follows:

Tag  Element Name             Description   M/O  F/V  R/NR
CSE  Character Set Extension  alphanumeric  O    V    R

This field would be used in conjunction with field CSI and would contain one or more ISO 2022 escape sequences or a textual description. Although the field is specified as optional, it is proposed that it be present whenever extensions are being used. It is further proposed that the CSI and CSE fields follow the DES field in field sequence.

Proposal 8. Specification of a New Field CID (Customer ID)

To assist those organizations that exchange bibliographic information with a large recipient community in identifying the intended customer, it is proposed that a new field be specified to identify the customer. The proposed field would be represented as follows:

Tag  Element Name  Description   M/O  F/V  R/NR
CID  Customer ID   alphanumeric  O    V    NR

The field would contain the name or identifier of the end customer or recipient database. An example requiring this method of identification would be a PUT transfer to a central customer point, when the customer requires additional information to determine the final destination for the records. It is proposed that this field follow field ISS (Issue).

3. SUMMARY OF THE PROPOSED LABEL SEQUENCE

Below is a summary of the enhanced file label specification with changes indicated. (Information between [] proposed for deletion; between <> proposed for addition.)

Tag    Element Name         Description       M/O      F/V  R/NR
DAT    Date Compiled        yyyymmddhhmmss.f  M        F    NR
RBF    Number of Records    numeric           M        V    NR
DSN    Data Set Name        alphanumeric      M        V    NR
ORS    Orig. System ID      alphanumeric      M        V    NR
DTS    Date Sent            yyyymmddhhmmss.f  O        F    NR
DTR    Dates of Records     yyyymmddyyyymmdd  O        F    NR
FOR    Format               alphanumeric      [O] <M>  F    NR
<FQF>  Format Qualifier     alphanumeric      O        V    NR
DES    Description          alphanumeric      O        V    R
<CSI>  Charac. Set Initial  alphanumeric      M        V    NR
<CSE>  Charac. Set Exten.   alphanumeric      O        V    R
VOL    Volume               alphanumeric      O        V    R
ISS    Issue                alphanumeric      O        V    R
<CID>  Customer ID          alphanumeric      O        V    NR
REP    Reply to             alphanumeric      O        V    R
NOT    Note                 alphanumeric      O        V    R
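The proposed sequence lends itself to a mechanical check. The Python sketch below records each tag with its mandatory and repeatable attributes as proposed in this paper and reports problems in a list of tags. The function name check_label_tags is hypothetical, and strict enforcement of the listed order is an assumption; the paper states where new fields fall in the sequence but does not say how a receiver should treat fields that arrive out of order.

```python
# (tag, mandatory, repeatable) in the proposed sequence.
PROPOSED_FIELDS = [
    ("DAT", True, False),
    ("RBF", True, False),
    ("DSN", True, False),
    ("ORS", True, False),
    ("DTS", False, False),
    ("DTR", False, False),
    ("FOR", True, False),    # made mandatory by Proposal 4
    ("FQF", False, False),   # new, Proposal 5
    ("DES", False, True),
    ("CSI", True, False),    # new, Proposal 6
    ("CSE", False, True),    # new, Proposal 7
    ("VOL", False, True),
    ("ISS", False, True),
    ("CID", False, False),   # new, Proposal 8
    ("REP", False, True),
    ("NOT", False, True),
]


def check_label_tags(tags):
    """Return a list of problems found in a sequence of label tags."""
    order = {tag: i for i, (tag, _, _) in enumerate(PROPOSED_FIELDS)}
    problems = []
    last_position = -1
    for tag in tags:
        if tag not in order:
            problems.append(f"unknown tag {tag}")
            continue
        # Assumption: fields must appear in the order of the summary table.
        if order[tag] < last_position:
            problems.append(f"{tag} is out of sequence")
        last_position = max(last_position, order[tag])
    for tag, mandatory, repeatable in PROPOSED_FIELDS:
        count = tags.count(tag)
        if mandatory and count == 0:
            problems.append(f"mandatory field {tag} is missing")
        if not repeatable and count > 1:
            problems.append(f"{tag} is not repeatable but occurs {count} times")
    return problems


print(check_label_tags(["DAT", "RBF", "DSN", "ORS", "FOR"]))
```

The call above omits field CSI, so the returned list names it as a missing mandatory field.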

4. FILE NAMES FOR ELECTRONIC FILE EXCHANGE

As there is wide variation in local file naming conventions and due to difficulties presented by application and operating system software, it is proposed that the content of the file name be left to the exchange partners, who would agree on an appropriate format. This approach has two benefits:

- Exchange partners that require long file names to adequately describe the product being transferred would not be constrained by a file naming convention designed to meet the limitations imposed by short file names, i.e. eight characters and a three-character extension.

- It would not be necessary to change existing application software, e.g. download and upload routines.

However, where operating system constraints mandate file names of eight characters with a three-character extension, and the exchange partners feel that a naming convention would be desirable, a file extension convention is suggested. The following file extensions (uppercase) are suggested to differentiate between the label file and the data file.

The label file will take the extension .TXT
The record file will take the extension .DAT
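The extension convention can be sketched in Python as follows. The function name exchange_file_names and the base name in the usage line are hypothetical, and the use of one shared base name for both files is an assumption that the paper does not state.

```python
def exchange_file_names(base_name):
    """Return (label_file, record_file) names for an eight-plus-three file system."""
    if not 1 <= len(base_name) <= 8:
        raise ValueError("base name must be 1 to 8 characters long")
    # Assumption: the label file and the record file share one base name.
    return base_name + ".TXT", base_name + ".DAT"


print(exchange_file_names("BKS95355"))
```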

5. FTP LABEL EXAMPLE

DAT##19951221211236.0
RBF##1564
DSN##LOC.BOOKS.DIST.DATA.D951221
ORS##US DLC
DTS##19951222013000.0
DTR##1995122119951221
FOR##M
FQF##USMARC
DES##MUMS Books Daily DQ
CSI##USMARC
CSE##USMARC--Hebrew
VOL##V21
ISS##I50
CID##Middle East Library--RS6
REP##NDMSO@LOC.GOV
NOT##Test set of Hebrew records
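As an illustration, the example label can be read with a short Python sketch. It assumes that each field consists of a three-character tag, the two characters shown here as "##", and then the field content, with one field per line as in Proposal 2. The names parse_label and parse_timestamp are hypothetical.

```python
from datetime import datetime

EXAMPLE_LABEL = """\
DAT##19951221211236.0
RBF##1564
DSN##LOC.BOOKS.DIST.DATA.D951221
ORS##US DLC
DTS##19951222013000.0
DTR##1995122119951221
FOR##M
FQF##USMARC
DES##MUMS Books Daily DQ
CSI##USMARC
CSE##USMARC--Hebrew
VOL##V21
ISS##I50
CID##Middle East Library--RS6
REP##NDMSO@LOC.GOV
NOT##Test set of Hebrew records
"""

# Tags marked "R" (repeatable) in the proposed specification.
REPEATABLE = {"DES", "CSE", "VOL", "ISS", "REP", "NOT"}


def parse_label(text):
    """Map each tag to its content (a list of contents for repeatable fields)."""
    label = {}
    for line in text.splitlines():
        if not line:
            continue
        # Assumption: three-character tag, two filler characters, then content.
        tag, content = line[:3], line[5:]
        if tag in REPEATABLE:
            label.setdefault(tag, []).append(content)
        else:
            label[tag] = content
    return label


def parse_timestamp(value):
    """Parse a yyyymmddhhmmss.f value such as the content of DAT or DTS."""
    return datetime.strptime(value, "%Y%m%d%H%M%S.%f")


label = parse_label(EXAMPLE_LABEL)
print(label["ORS"], int(label["RBF"]), parse_timestamp(label["DAT"]))
```

Repeatable fields are collected into lists so that a second DES or NOT line would not overwrite the first.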

6. QUESTIONS FOR FURTHER DISCUSSION

1. Are these enhancements to the FTP file label useful to the North American community?

2. Should the current end-of-field character be replaced, or should more flexibility be allowed with regard to label field termination?

3. Is there a requirement for variability in the label character set?

4. Is the two-part format specification useful? It allows the U.S. to use only FOR as it has done, but to be clearly understood internationally, FQF needs to also be used.

5. Comments on character set specification fields?

6. Are there North American uses for the CID field?
