Word Processing Formats
Harvard Library collections include documents and other “office-like” material that will be preserved in the Library’s preservation and access repository, the Digital Repository Service (DRS). As a first step towards supporting this material in the DRS, the Library contracted Paul Wheatley Consulting Ltd. in late 2014 to assist with an analysis of word processing formats. The analysis aimed to produce:
- Recommended word processing formats to accept and prefer for the DRS
- Recommended technical metadata schema(s) to use for files in word processing formats
- DRS content models for these objects
- Recommendations for enhancing Harvard Library’s FITS tool to better support these objects
The driving principles of this work were to:
- Provide interoperability with the existing metadata schemas and workflows of the DRS
- Provide sufficient metadata for long-term preservation of word processing objects
- Adhere to existing standards where possible
- Propose simpler models over more complex ones where possible
The analysis covered three areas: formats, metadata, and tools. After Paul Wheatley completed his analysis, Harvard Library carried out additional tool testing. The deliverables from this work are included on this page.
Format Analysis
Document | Description | Authors |
Format matrix - Word Processing Files | This spreadsheet compares word processing formats against preservation criteria. The column headings are shaded according to each criterion's importance to Harvard Library (red: very important, orange: somewhat important, yellow: somewhat unimportant, green: unimportant). The cells are shaded to show how each format rates against the criterion (green: good, yellow: neutral, red: bad). | Paul Wheatley |
Format profile - Apple iWork Pages v04.docx; Format profile - ePUB v04.docx; Format profile - Microsoft Office Binary Word Document v04.docx | These documents are brief profiles of the word processing formats that were under consideration for acceptance in the DRS. Along with descriptive information about each format, they include a summary of risks and potential strategies for mitigating those risks. | Paul Wheatley |
Metadata Analysis
Document | Description | Authors |
Metadata approach | This document explains the rationale for recommending particular metadata fields to be captured for word processing documents for Harvard Library's preservation use case. | Paul Wheatley |
Metadata summary | This spreadsheet lists the metadata fields recommended for capture and indicates, for each field, whether it would be an addition to DocumentMD, the preservation rationale for capturing it, and whether Tika supports extracting it. | Paul Wheatley |
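To give a feel for what capturing document metadata involves, the following is a minimal sketch only. It does not use Tika or FITS and does not reproduce the recommended field list. It assumes an Office Open XML package (such as a .docx file), whose core properties are stored in the part `docProps/core.xml`, and reads a few of them with the Python standard library. The field selection in `FIELDS` and the helper names `read_core_properties` and `make_sample_docx` are hypothetical, chosen for illustration. The sketch does not apply to the binary Word format or to the other formats profiled above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OOXML core-properties part (docProps/core.xml).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical field selection for illustration; not the recommended list.
FIELDS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(docx_bytes):
    """Return a dict of selected core properties; missing fields map to None."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        try:
            xml_data = package.read("docProps/core.xml")
        except KeyError:
            # The package has no core-properties part at all.
            return {name: None for name in FIELDS}
    root = ET.fromstring(xml_data)
    return {name: root.findtext(path, namespaces=NS) for name, path in FIELDS.items()}


def make_sample_docx():
    """Build an in-memory package holding only core.xml (not a complete .docx)."""
    core = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<cp:coreProperties"
        ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
        ' xmlns:dcterms="http://purl.org/dc/terms/">'
        "<dc:title>Sample report</dc:title>"
        "<dc:creator>Example Author</dc:creator>"
        "<dcterms:created>2014-11-01T09:00:00Z</dcterms:created>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core)
    return buffer.getvalue()


if __name__ == "__main__":
    for name, value in read_core_properties(make_sample_docx()).items():
        print(f"{name}: {value}")
```

Fields that are absent from the package come back as `None`, which makes gaps in the embedded metadata visible rather than silently skipped.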
Tool Analysis and Testing
Document | Description | Authors |
Tool assessment | This document provides an overview of the tool assessment activities that were carried out, along with specific recommendations on how to proceed. | Paul Wheatley |
Other Resources
Document | Description | Authors |
Preservation approach | This document provides a summary of risks, a categorization of preservation formats, thoughts on preservation strategy, and recommendations for further research and testing. | Paul Wheatley |
Copyright © 2024 The President and Fellows of Harvard College