Selecting an appropriate image file format & specification [1]

Machine printed black and white text documents

  Source material characteristics:

  • The documents include no meaningful color content. The documents do not include color graphics, or color or black and white reproductions of photographs.
  • The document text is, in all cases, highly legible. The text is black or very dark printed on white or a very light colored paper.
  • The texts do not include handwritten comments (e.g., pencil annotations).

  Preferred image file formats (in preference order):

  1. TIFF format, bitonal (black and white); Group 4 compression. One image file per page-image.
  2. JPEG 2000 Part 1, Core Coding, Lossless Compression format or JPEG 2000 Part 1, Core Coding, Lossy Compression format, 8-bits per RGB channel. One image file per page-image.

  Preferred image capture resolutions (in preference order):

  1. 600 pixels per inch (ppi), relative to the size of the physical item depicted.
  2. 400 ppi
  3. 300 ppi

Printed or handwritten documents with color content

  Source material characteristics:

  • Items with meaningful color content, or printed continuous tone images
  • Discolored documents where the page background has darkened and the contrast between printed or handwritten information and page background is low, e.g.: documents that include light pencil annotations.

  Preferred image file formats (in preference order):

  1. Uncompressed, 8-bits per RGB channel TIFF images. One image file per item view.
  2. JPEG 2000 Part 1, Core Coding, Lossless Compression format, 8-bits per RGB channel. One image file per item view.
  3. Color JPEG images saved at a “high” or “highest” quality setting. One image per item view.

  Preferred image capture resolutions (in preference order):

  1. 400 ppi
  2. 300 ppi

Works of art, color photographs

  Source material characteristics:

  • original works of art
  • printed reproductions
  • photographic films
  • lantern slides
  • glass plate negatives
  • color and toned monochromatic photographic prints

  Preferred image file formats (in preference order):

  1. Uncompressed, 8-bits per RGB channel TIFF images. One image file per item view.
  2. JPEG 2000 Part 1, Core Coding, Lossless Compression format, 8-bits per RGB channel. One image file per item view.
  3. Color JPEG images saved at a “high” or “highest” quality setting. One image per item view.

  Preferred image capture resolutions (in preference order):

  Reflective (e.g., prints)
  1. 600 ppi
  2. 400 ppi
  3. 300 ppi

  Transmissive (e.g., film)
  1. 6000 pixels in long dimension
  2. 4000 pixels or greater in long dimension

Monochromatic, black and white photographs, or continuous tone black and white images

  Preferred image capture resolutions (in preference order):

  Reflective (e.g., prints)
  1. 600 ppi
  2. 400 ppi
  3. 300 ppi

  Transmissive (e.g., film)
  1. 6000 pixels in long dimension
  2. 4000 pixels or greater in long dimension



NOTE: ALL COLOR AND CONTINUOUS TONE BLACK & WHITE IMAGES should include embedded ICC display profiles (e.g., sRGB, eciRGB, AdobeRGB, sGray). [1] [2]
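
The capture resolutions above are stated in pixels per inch relative to the physical item, so the resulting pixel dimensions depend on the size of the original. The following is a minimal Python sketch of that arithmetic. The function names and the 8.5 x 11 inch page are illustrative assumptions, not part of the guidelines.

```python
def capture_size(width_in: float, height_in: float, ppi: int) -> tuple[int, int]:
    """Pixel dimensions needed to capture an item of the given physical size.

    width_in and height_in are the physical dimensions in inches;
    ppi is the capture resolution in pixels per inch.
    """
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_rgb_bytes(width_px: int, height_px: int) -> int:
    """Approximate pixel-data size of an uncompressed image at 8 bits per RGB channel.

    Three channels at one byte each; file headers and metadata are ignored.
    """
    return width_px * height_px * 3


if __name__ == "__main__":
    # Hypothetical 8.5 x 11 inch page at each resolution listed above.
    for ppi in (600, 400, 300):
        width_px, height_px = capture_size(8.5, 11, ppi)
        size = uncompressed_rgb_bytes(width_px, height_px)
        print(f"{ppi} ppi: {width_px} x {height_px} pixels, about {size} bytes uncompressed")
```

For film and other transmissive originals the targets above are given directly as pixels in the long dimension, so no conversion from physical size is needed.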


Optical Character Recognition (OCR) and keyed text files

For digital objects that include page-images and searchable text, the Harvard Digital Repository Service (DRS) requires that deposits include one UTF-8 encoded plain text file for each corresponding page-image file. The text file can be produced by OCR software or keyed manually. Optionally, an ALTO layout XML file for each image may also be included.

For example, a 10-page document deposited to DRS would include 10 image files and 10 identically named (except for the file suffix) plain text files.

    ├── 013814337
        ├── 013814337_0001.tif
        ├── 013814337_0001.txt
        ├── 013814337_0002.tif
        ├── 013814337_0002.txt
        ├── 013814337_0003.tif
        ├── 013814337_0003.txt
        ├── 013814337_0004.tif
        ├── 013814337_0004.txt
        ├── 013814337_0005.tif
        ├── 013814337_0005.txt
        ├── 013814337_0006.tif
        ├── 013814337_0006.txt
        ├── 013814337_0007.tif
        ├── 013814337_0007.txt
        ├── 013814337_0008.tif
        ├── 013814337_0008.txt
        ├── 013814337_0009.tif
        ├── 013814337_0009.txt
        ├── 013814337_0010.tif
        ├── 013814337_0010.txt
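
Before deposit, the pairing described above can be checked mechanically. The following Python sketch is one possible pre-deposit check, not a DRS tool: the function name and the set of image suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: image suffixes to look for; adjust to the formats actually used.
IMAGE_SUFFIXES = {".tif", ".tiff", ".jp2", ".jpg"}


def check_text_pairs(directory):
    """Return a list of problems found in one item directory.

    For every page-image file, an identically named .txt file must exist
    and its contents must decode as UTF-8.
    """
    problems = []
    for image in sorted(Path(directory).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        text = image.with_suffix(".txt")
        if not text.is_file():
            problems.append(f"{image.name}: missing {text.name}")
            continue
        try:
            text.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"{text.name}: not valid UTF-8")
    return problems
```

Calling `check_text_pairs("013814337")` on a directory laid out as in the listing above would return an empty list when every image has a valid text file.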

Naming and organizing files at time of scanning

Before or after scanning documents, decide how to organize the information so that it can be easily navigated in digital form. Documents have their own organizational structure (individual titles, volumes, issues, chapters, etc.). These meaningful structural components of the scanned documents need to be reflected in how the sequentially numbered page-images are arranged within named directories.

...

Each document directory should be named with the item ID (lowercase characters, no spaces).

    [002208174] ← this is a directory name: [ITEM_ID]
        |
        | ---- 002208174_0001.jpg
        | ---- 002208174_0002.jpg
        | ---- 002208174_0003.jpg
        | ---- 002208174_0004.jpg
        | ---- 002208174_0005.jpg
        | ...
        | ---- 002208174_0099.jpg
    [007984492]
        |
        | ---- 007984492_0001.jpg
        | ---- 007984492_0002.jpg
        | ---- 007984492_0003.jpg
        | ---- 007984492_0004.jpg
        | ---- 007984492_0005.jpg
        | ...
        | ---- 007984492_0099.jpg
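
The file names in the listing above follow the pattern item ID, underscore, four-digit zero-padded sequence number, file suffix. A minimal Python sketch of that pattern follows. The function name is hypothetical, and the four-digit padding is inferred from the examples rather than stated as a rule.

```python
def page_filename(item_id: str, sequence: int, suffix: str = ".jpg") -> str:
    """Build a page-image file name such as 002208174_0001.jpg.

    Assumption: sequence numbers are zero-padded to four digits,
    as in the example listing.
    """
    return f"{item_id}_{sequence:04d}{suffix}"


if __name__ == "__main__":
    # Names for the first five pages of the example item.
    for sequence in range(1, 6):
        print(page_filename("002208174", sequence))
```

Zero-padding keeps the files in page order when they are sorted by name.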

Multi-volume example

    [ITEM_ID] ← this is the parent directory for the title.
        |
        | ---- [ITEM_ID]_[VOLUME_ID] ← this is the directory for the volume.
        |          |
        |          | ---- 000652831_v001_0001.jpg
        |          | ---- 000652831_v001_0002.jpg
        |          | ---- 000652831_v001_0003.jpg
        |          | ---- 000652831_v001_0004.jpg
        |          | ---- 000652831_v001_0005.jpg
        |          | ...
        |          | ---- 000652831_v001_0099.jpg
        |
        | ---- [000652831_v002]
                   |
                   | ---- 000652831_v002_0001.jpg
                   | ---- 000652831_v002_0002.jpg
                   | ---- 000652831_v002_0003.jpg
                   | ---- 000652831_v002_0004.jpg
                   | ---- 000652831_v002_0005.jpg
                   | ...
                   | ---- 000652831_v002_0099.jpg
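
The single-volume and multi-volume layouts can both be derived from the file name itself. The Python sketch below parses a page-image name and returns the directory it would be expected to sit in. The regular expression, including the `v` plus three digits volume form, is an assumption inferred from the examples above, and the function names are hypothetical.

```python
import re

# Assumed pattern: ITEMID[_vNNN]_NNNN.suffix, all lowercase, no spaces.
PAGE_NAME = re.compile(
    r"^(?P<item>[0-9a-z]+)"
    r"(?:_(?P<volume>v\d{3}))?"
    r"_(?P<page>\d{4})"
    r"\.(?P<suffix>[a-z0-9]+)$"
)


def parse_page_name(name):
    """Split a page-image file name into its parts, or return None if it does not match."""
    match = PAGE_NAME.match(name)
    return match.groupdict() if match else None


def expected_directory(name):
    """Return the relative directory a page-image is expected in, or None if the name is invalid."""
    parts = parse_page_name(name)
    if parts is None:
        return None
    if parts["volume"] is None:
        return parts["item"]
    return f"{parts['item']}/{parts['item']}_{parts['volume']}"
```

Under these assumptions, `expected_directory("000652831_v001_0001.jpg")` gives `000652831/000652831_v001`, and a name containing spaces or uppercase characters gives `None`.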


...

  1. Technical Guidelines for Digitizing Cultural Heritage Materials
  2. Article: What is embedded color profile information?