Creating PDS Object Batches


Overview

PDS Document objects are page-turned objects that can be delivered by the DRS Page Delivery Service (PDS). An object can comprise multiple files, each representing a page image, the text of a page, or (optionally) the page text together with its layout (an ALTO file).

All PDS Document objects are automatically assigned public delivery service URNs, which resolve to Page Delivery Service (PDS) URLs. By default, individual files within a PDS Document object are not assigned public delivery service URNs. IDS delivery URNs may be requested for page images with the role "deliverable" by checking this option in the optional metadata panel of the object or object template. Page-level and section-level PDS URNs may be requested after deposit using the Web Admin structure editor.

Accepted Formats

PDS Document objects can contain files in the following formats:

Page Image

  • JPEG 2000 JP2 image files (file extension: jp2)
  • JPEG File Interchange Format image files (file extension: jpg)
  • Graphics Interchange Format (GIF) image files (file extension: gif)
  • Tagged Image File Format (TIFF) image files (file extension: tif or tiff)

Page Text

  • Plain UTF-8 encoded text (file extension: txt)

Page Layout

  • Extensible Markup Language (XML) Files (file extension: xml)

The file extensions noted above are mandatory.
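
As a rough illustration of the lists above, the following sketch maps a file extension to its category. The names `CATEGORY_BY_EXTENSION` and `classify_page_file` are hypothetical and are not part of Batch Builder. The comparison is case-sensitive, which is an assumption: this topic does not say whether uppercase extensions are accepted.

```python
from pathlib import PurePath

# Extension-to-category table copied from the Accepted Formats lists.
CATEGORY_BY_EXTENSION = {
    "jp2": "page image",
    "jpg": "page image",
    "gif": "page image",
    "tif": "page image",
    "tiff": "page image",
    "txt": "page text",
    "xml": "page layout",
}


def classify_page_file(file_name):
    """Return the category for file_name, or None if its extension is not listed.

    Assumption: extensions are compared exactly as listed (lowercase).
    """
    extension = PurePath(file_name).suffix.lstrip(".")
    return CATEGORY_BY_EXTENSION.get(extension)
```

A file whose extension is missing or not in the table returns `None`, which a pre-deposit check could treat as a file to rename or exclude.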

File Name Rules

This topic describes specific file name requirements for page-turned object batches. See Section 3 Naming Rules for Objects, Files and Directories for general Batch Builder requirements.

The file-naming scheme for page image, page text and page layout files in a page-turned object is:

     {fileBaseName}{separator}{sequenceNumber}.{extension}

where:

{fileBaseName} is usually a locally meaningful name, for example a name that associates the digital file with an analog counterpart. This may be an accession number, an Aleph or Alma ID, or another curatorially significant name. Valid characters for the {fileBaseName} are letters, digits, periods ('.'), underscores ('_'), and hyphens ('-').

{separator} is a double underscore ('__') used to separate the {fileBaseName} from the page sequence number.

{sequenceNumber} is the numeric value that represents the sequence number of the page within the page-turned document. A {sequenceNumber} may contain only the digits 0123456789.

The sequence number can include leading zeros; for example, the third page can be written as 3, 03 or 000000003. The page sequence number indicates a page's relative position within a sequence of pages, regardless of any numbering that appears on the page itself.

.{extension} is one of the valid file extensions listed under Accepted Formats at the beginning of this topic.
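
The naming scheme above can be checked with a regular expression. The following is a minimal sketch, not Batch Builder's own validation logic. The names `PAGE_FILE_PATTERN`, `parse_page_file_name` and `sort_by_sequence` are hypothetical, and "letters" is assumed to mean ASCII letters. Because underscores are also valid in the base name, the sketch splits at the last double underscore before the sequence number, which is likewise an assumption.

```python
import re

PAGE_FILE_PATTERN = re.compile(
    r"(?P<base>[A-Za-z0-9._-]+)"  # fileBaseName: letters, digits, '.', '_', '-'
    r"__"                          # separator: double underscore
    r"(?P<seq>[0-9]+)"             # sequenceNumber: digits only, leading zeros allowed
    r"\.(?P<ext>jp2|jpg|gif|tif|tiff|txt|xml)"  # one of the accepted extensions
)


def parse_page_file_name(file_name):
    """Return (base_name, sequence_number, extension), or None if the name does not fit."""
    match = PAGE_FILE_PATTERN.fullmatch(file_name)
    if match is None:
        return None
    # int() discards leading zeros, so "3", "03" and "000000003" all become 3.
    return match["base"], int(match["seq"]), match["ext"]


def sort_by_sequence(file_names):
    """Order matching names by numeric sequence number; non-matching names are skipped."""
    keyed = []
    for name in file_names:
        parsed = parse_page_file_name(name)
        if parsed is not None:
            keyed.append((parsed[1], name))
    return [name for _, name in sorted(keyed)]
```

Sorting on the integer value rather than on the raw string keeps page 2 ahead of page 10 even when the files were named without zero padding.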

By default, Batch Builder assumes that the fileBaseName plus the separator and the page sequence number is the file OwnerSuppliedName. For example, for a file named 12345__01.tif, Batch Builder will assume that 12345__01 is the file OwnerSuppliedName. This default can be changed in the View-Options menu. See Section 1.2 Setting Options.
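
To make the default concrete, the sketch below derives that value from a file name by dropping only the final extension. The function name `default_file_owner_supplied_name` is hypothetical; it illustrates the default described above and is not Batch Builder code.

```python
from pathlib import PurePath


def default_file_owner_supplied_name(file_name):
    """Return the file name without its final extension.

    Only the last suffix is removed, so periods inside the base name survive
    and the separator and sequence number stay part of the result.
    """
    return PurePath(file_name).stem
```

For instance, `default_file_owner_supplied_name("12345__01.tif")` yields `12345__01`, matching the example in the paragraph above.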

Alternatively, instead of relying on the file-naming scheme, the file OwnerSuppliedName, the page sequence number and the object OwnerSuppliedName can be supplied in a special mapping file called "mapping.txt". More about mapping files…
