...

OCR ALTO file names have the same prefix as their corresponding image file names, with an .xml file extension.
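The relationship between an image file name and its ALTO file name can be sketched as follows. This is an illustration only; the function name `alto_name_for` is hypothetical and not part of any Imaging Services tooling.

```python
from pathlib import PurePosixPath


def alto_name_for(image_name: str) -> str:
    """Return the ALTO file name for an image: same prefix, .xml extension."""
    # with_suffix swaps only the final extension (e.g. .jp2 -> .xml).
    return PurePosixPath(image_name).with_suffix(".xml").name


print(alto_name_for("007984492_0001.jp2"))
```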

MARCXML files (see note 2)

...

MARCXML

...

MARCXML file names consist of two components, the HOLLIS ID and the .xml extension: [HOLLIS_ID].xml

...

    [BATCH ID] (see note 4)
        |-- [UNIQUE_ID]-mets.xml (single volume METS file example: 007984492-mets.xml)
        |-- [UNIQUE_ID]_[VOLUME_ID]-mets.xml (multivolume METS file example: 000652831_v0002-mets.xml)
        |-- [UNIQUE_ID]-mets.xml (manuscript collection METS file example: morgan_601_705_volIV-mets.xml)
        |-- [HOLLIS_ID].xml (see note 6) (MARCXML or MODS xml file, e.g., 000652831.xml)
        |
        |-- [UNIQUE_ID (see note 7)]/ (manuscript collection example, e.g. morgan_601_705_volIV)
        |       |-- [UNIQUE_ID]_[####].jp2
        |       |-- morgan_601_705_volIV_0001.jp2
        |       |-- morgan_601_705_volIV_0002.jp2
        |       |-- morgan_601_705_volIV_0003.jp2
        |       |-- morgan_601_705_volIV_0004.jp2
        |       ...
        |       |-- morgan_601_705_volIV_0099.jp2
        |
        |-- [HOLLIS_ID]/ (single volume monograph example, e.g. 007984492)
        |       |-- [HOLLIS_ID]_[####].jp2
        |       |-- [HOLLIS_ID]_[####].txt
        |       |-- [HOLLIS_ID]_[####].xml
        |       |-- 007984492_0001.jp2
        |       |-- 007984492_0001.txt
        |       |-- 007984492_0001.xml
        |       |-- 007984492_0002.jp2
        |       ...
        |       |-- 007984492_0099.jp2
        |       |-- 007984492_0099.txt
        |       |-- 007984492_0099.xml
        |
        |-- [UNIQUE_ID]_[VOLUME_ID]/ (see note 8) (multi-volume example, e.g. 000652831_v0002)
                |-- [UNIQUE_ID]_[VOLUME_ID]_[####].jp2
                |-- 000652831_v0002_0001.jp2
                |-- 000652831_v0002_0002.jp2
                |-- 000652831_v0002_0003.jp2
                ...
                |-- 000652831_v0002_0099.jp2
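A delivery laid out this way can be spot-checked with a short script. The sketch below assumes that every page file name is its directory name, an underscore, a four-digit sequence number, and one of the extensions shown in the examples (.jp2, .txt, .xml). The pattern and the function name `check_page_file` are illustrative assumptions, not an official validation rule.

```python
import re

# Assumed pattern: <directory name>_<four digits>.<jp2|txt|xml>
PAGE_FILE = re.compile(r"^(?P<prefix>.+)_(?P<seq>\d{4})\.(?P<ext>jp2|txt|xml)$")


def check_page_file(directory_name: str, file_name: str) -> bool:
    """Return True if file_name follows the page-file pattern for its directory."""
    match = PAGE_FILE.match(file_name)
    # The prefix before the sequence number must equal the directory name.
    return match is not None and match.group("prefix") == directory_name


print(check_page_file("007984492", "007984492_0001.jp2"))
print(check_page_file("000652831_v0002", "000652831_v0002_0001.jp2"))
```

A name with a mistyped identifier, or with a sequence number that is not four digits long, fails the check, which makes such slips easy to catch before transfer.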

...

For photographs and other art objects, we group files into batches.  For example, the following batch contains a set of JPEG2000 files.

[BATCH ID] (see note 43)

   |-- ss_123458.jp2

   |-- ss_458790.jp2

...

  • Hard disk, flash drive
    • The repositories can borrow the media from Imaging Services or pay for them
    • Recommended for large sets of data
  • Google shared drive
    • The data recipient's Google account needs to be provided to Imaging Services
  • MS shared directory
    • The data recipient's email address needs to be provided to Imaging Services
    • Suitable for small sets of data
  • Secure file transfer (https://filetransfer.harvard.edu)
    • The data recipient's email address needs to be provided to Imaging Services.
    • Data recipients outside Harvard University need to set up a guest account.
    • Suitable for small sets of data which need encryption during file transfer.


1. In cases where the record identifier includes spaces, the spaces will be replaced by underscores.

2. MARCXML records are only available for items that have been cataloged in Harvard’s bibliographic database, HOLLIS.

43. Batch level identifiers are assigned to groups of titles prepared and submitted together for scanning. These named “batches” will be maintained from scanning all the way through deposit to Harvard's Digital Repository Service and transfer of data to project partners beyond the Harvard libraries. Inclusion of technical metadata is optional.

5. Inclusion of MARCXML files is optional.

6. The title's unique identifier is used as the directory name.

7. Individual volumes or fascicles will be labeled using a two- or three-digit sequence number (e.g., v001, v002, v099, v123).
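
The space-replacement rule in note 1 can be sketched in a few lines. The function name `identifier_to_name` and the sample identifier are hypothetical, chosen only to illustrate the rule.

```python
def identifier_to_name(record_identifier: str) -> str:
    """Replace each space in a record identifier with an underscore (note 1)."""
    return record_identifier.replace(" ", "_")


# Hypothetical identifier containing spaces.
print(identifier_to_name("morgan 601 705 volIV"))
```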