...
Image File format: TIFF / 8-bit grayscale / uncompressed / embedded sGray ICC profile / 300 -- 400 ppi relative to the size of the analog content represented on film.
OCR File formats:
- ALTO, UTF-8 encoded
- Plain text, , UTF-8 encoded
Directory- and file-naming
directories:
[document id]
- image files:
[document id]_####.tif
- ocr files:
- ALTO
[document id]_####.xml
- Plain text
[document id]_####.txt
- ALTO
- pdf file:
[document id]
.pdf
Directory / file structure
Example: └── 990009660700203941 ├── 990009660700203941_0001.tif ├── 990009660700203941_0001.xml ├── 990009660700203941_0002.tif ├── 990009660700203941_0002.xml ├── 990009660700203941_0003.tif ├── 990009660700203941_0003.xml ├── 990009660700203941_0004.tif ├── 990009660700203941_0004.xml ├── 990009660700203941_0005.tif ├── 990009660700203941_0005.xml ├── [...] ├── 990009660700203941_9999.tif ├── 990009660700203941_9999.xml
|__ 990009660700203941.pdf
Returning data and films to Harvard
...