...

Data Organization and Data Transfer Guide

(This is how Imaging Services formats and organizes data to share with project partners.)

... will work with Harvard libraries, museums, and archives to share digitized content (digital images and associated metadata ...) with external partners (e.g., funders, commercial publishers, other cultural heritage organizations) ....

However, Imaging Services will not:

  • Produce images or associated metadata files in formats, format encodings, or technical specifications required by external partners when these differ from the standard workflows developed to create digital content for storage in Harvard Library's Digital Repository Service.
  • Take responsibility for transferring data, converting data, or providing any supplemental data needed to satisfy external partner requirements. These responsibilities are borne by the owning repository that has contracted with Imaging Services.

Description of files

  • Image files are encoded in compliance with the JPEG 2000 standard.
    • Data compression: irreversible 9-7 wavelet transform for lossy compression, or reversible 5-3 wavelet transform for lossless compression.
    • Photometric interpretation:
      • RGB (color), 8 bits per channel, embedded sRGB ICC profile.
      • Grayscale, 8 or 16 bits per channel.
  • OCR plain text files (optional) are in UTF-8 encoding.
  • OCR ALTO files (optional) conform to the ALTO standard.
  • Structural metadata files (optional) conform to the METS standard and Harvard's METS profile for page-turned objects.
  • MARC record files (optional) are in MarcXML or MODS format.
  • Packaging tag files (optional), generated by the packaging application (BagIt), describe the package.
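A partner receiving a package may want to confirm that it arrived intact before ingesting it. The following is a minimal sketch, not an Imaging Services tool. It assumes the package is a BagIt bag with a `manifest-sha256.txt` file; the guide does not state which checksum algorithm is used, so `sha256` is an assumption. It checks each listed payload file's checksum and, for `.jp2` files, the JP2 signature box that begins every JPEG 2000 (Part 1) file. The names `verify_manifest` and `is_jp2` are hypothetical.

```python
import hashlib
from pathlib import Path

# The JP2 signature box defined by ISO/IEC 15444-1: box length 12,
# box type "jP  ", and contents <CR><LF><0x87><LF>.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")


def is_jp2(path: Path) -> bool:
    """Return True if the file begins with the JP2 signature box."""
    with open(path, "rb") as f:
        return f.read(len(JP2_SIGNATURE)) == JP2_SIGNATURE


def verify_manifest(bag_dir: Path, algorithm: str = "sha256") -> list[str]:
    """Check each payload file listed in manifest-<algorithm>.txt.

    Returns a list of problems; an empty list means every listed file exists,
    its checksum matches, and every listed .jp2 file has a JP2 signature.
    The default algorithm (sha256) is an assumption, not taken from the guide.
    """
    problems = []
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        # Each manifest line is "<checksum> <path relative to the bag>".
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
            continue
        digest = hashlib.new(algorithm)
        with open(target, "rb") as f:
            # Hash in 1 MiB chunks so large image files are not read into memory at once.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")
        elif rel_path.endswith(".jp2") and not is_jp2(target):
            problems.append(f"not a JP2 file: {rel_path}")
    return problems
```

This sketch does not decode the percent-encoded characters that BagIt allows in manifest paths, and it does not inspect the tag files; a full BagIt validator would handle both.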

...

Image file names consist of one or more components and follow one of these patterns: [UNIQUE_ID].jp2, [UNIQUE_ID]_[Sequence_number].jp2, [UNIQUE_ID]_[Volume_number]_[Sequence_number].jp2, or [UNIQUE_ID]_[Volume_number]_[Issue_number]_[Sequence_number].jp2.

  1. [UNIQUE_ID]: Usually this will be a catalog record ID1. For example, a 9-digit HOLLIS ID (e.g. 011835322) is recommended for an item that has been cataloged in Harvard's HOLLIS catalog system. For manuscript collections, the library or archives repository will provide a unique identifier from HOLLIS for Archival Discovery or a local tracking system. For photographs or other art objects, the library or archives repository will provide a unique identifier from JSTOR Forum or a local tracking system.

  2. [Volume_number]: (multi-volume sets only) volume sequence number of 1 or more digits, which may be zero-padded, with the prefix 'v' (e.g. v5, v05, or v0005).

  3. [Issue_number]: (multi-issue sets only) issue sequence number of 1 or more digits, which may be zero-padded, with the prefix 'n' (e.g. n2, n02, or n002).
  4. [Sequence_number]: 4-digit or 5-digit page sequence number (e.g. 0045, 00045). This does not apply to photograph or art object image files.
  5. .jp2: File format extension (JPEG 2000, Part 1).
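The naming components above can be assembled programmatically. The sketch below assumes the component order volume, then issue, then page sequence, matching the numbered list. It also assumes default padding widths (4 digits for volume, 3 for issue, 4 for sequence); the guide allows other widths, so these are adjustable parameters. `build_image_name` is a hypothetical helper, not part of any Imaging Services tooling.

```python
def build_image_name(unique_id, sequence=None, volume=None, issue=None,
                     seq_width=4, vol_width=4, issue_width=3):
    """Compose an image file name from its components (hypothetical helper).

    Padding widths are assumptions; the guide allows 1+ digits for volume and
    issue numbers and 4 or 5 digits for page sequence numbers.
    """
    if issue is not None and volume is None:
        # Assumption: issue numbers only appear together with a volume number.
        raise ValueError("an issue number requires a volume number")
    if seq_width not in (4, 5):
        raise ValueError("page sequence numbers are 4 or 5 digits")
    parts = [unique_id]
    if volume is not None:
        parts.append(f"v{volume:0{vol_width}d}")
    if issue is not None:
        parts.append(f"n{issue:0{issue_width}d}")
    if sequence is not None:
        if sequence >= 10 ** seq_width:
            raise ValueError("sequence number does not fit in the chosen width")
        parts.append(f"{sequence:0{seq_width}d}")
    return "_".join(parts) + ".jp2"
```

For example, `build_image_name("008105127", sequence=1, volume=7)` yields `008105127_v0007_0001.jp2`. Omitting `sequence` gives the single-image form `[UNIQUE_ID].jp2` used for photographs and art objects.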

Examples

  • Single volume file name example: 010010723_0001.jp2

  • Multi-volume file name example: 008105127_v0007_0001.jp2

  • Multi-volume and multi-issue file name example: 008105127_v0007_n003_0001.jp2
  • Manuscript collection file name example: morgan_601_705_volIV_0001.jp2
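Received file names can also be split back into their components, for example to group pages by volume. This is a minimal sketch using a regular expression. It assumes that a unique ID may itself contain underscores (as manuscript identifiers do) but never contains a `_v<digits>` segment, and that a trailing `_` plus 4 or 5 digits is always the page sequence number. A name whose unique ID ends in a 4- or 5-digit segment is therefore ambiguous. `parse_image_name` is a hypothetical helper.

```python
import re

# Unique ID is matched lazily so the optional suffixes are captured whenever present.
IMAGE_NAME = re.compile(
    r"^(?P<unique_id>.+?)"
    r"(?:_(?P<volume>v\d+)(?:_(?P<issue>n\d+))?)?"  # optional volume, then optional issue
    r"(?:_(?P<sequence>\d{4,5}))?"                  # optional 4- or 5-digit page sequence
    r"\.jp2$"
)


def parse_image_name(name):
    """Return the components of an image file name; absent parts are None."""
    match = IMAGE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a .jp2 image file name: {name!r}")
    return match.groupdict()
```

For instance, `parse_image_name("008105127_v0007_0001.jp2")` returns the unique ID `008105127`, volume `v0007`, no issue, and sequence `0001`.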

Note: Occasionally, a project may require a different file name pattern to meet project partners' specific needs. For example, the Black Teacher Archive project requires files to be named as [Project_code]_[OCLC#]_[State_code]_[Year]_[Volume#]_[Issue#] (for example, bta_30786193_MA_1966_038_008.jp2).
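A project-specific pattern like this one can be produced the same way. The sketch below is inferred from the single Black Teacher Archive example. It assumes a lowercase project code, an unpadded OCLC number, a 2-letter state code, a 4-digit year, and 3-digit zero-padded volume and issue numbers. These widths are assumptions, and `build_bta_name` is a hypothetical helper.

```python
def build_bta_name(oclc, state, year, volume, issue, project="bta"):
    """Compose a Black Teacher Archive style file name (hypothetical helper).

    Field widths are inferred from one example name and may not match
    the partner's full specification.
    """
    if len(state) != 2 or not state.isalpha():
        raise ValueError("state code must be two letters")
    return (f"{project}_{oclc}_{state.upper()}_{year:04d}"
            f"_{volume:03d}_{issue:03d}.jp2")
```

With these assumptions, `build_bta_name(30786193, "MA", 1966, 38, 8)` reproduces the example name `bta_30786193_MA_1966_038_008.jp2`.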


MarcXML file names consist of two components: [HOLLIS_ID].xml2

...