Background
Most Harvard schools use the ProQuest ETD Administration tool to manage Electronic Thesis and Dissertation submission. After a school accepts and approves an ETD, the actual files created by the student need to be deposited into the DRS for preservation.
DRS Content Models and Packages from ProQuest
DRS uses a 'Content Model' approach to provide the best preservation support for a limited set of file types. Any type of file, however, can be deposited into DRS using a specific Content Model called an Opaque Object. Files in an Opaque Object are given bit-level preservation, but delivery options are limited. The DRS Content Guide explains which file formats are appropriate for each supported Content Model.
ETDs from ProQuest can have a variety of file types in each submission package. For a given ETD submission package, each file with a currently supported DRS Content Model will be deposited in a separate DRS Object and assigned a 'Role'. All files in a submission package with formats that are not currently supported by a DRS Content Model will be put into one Opaque Object. The table below illustrates how file types from a submission would be assigned to a Content Model.
Examples:
File type | Content Model |
---|---|
PDF | Document |
text/xml | Text |
JP2, JPEG | Still Image |
mp3 | Audio |
MS Word Document | Document |
CAD, PowerPoint file | Opaque |
For example, a ProQuest ETD could have the following files, which would be deposited in the corresponding Content Model:
- Thesis in PDF format => Document
- mets.xml file => Text
- License files in PDF format => Document
- Supplementary files, such as thesis appendices, data sets, videos, etc., in a variety of formats => Opaque
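The assignment rule described above can be sketched in Python. This is an illustrative sketch only, not the DRS implementation: the function names `content_model_for` and `plan_objects` are hypothetical, and apart from `application/pdf` and `text/xml` the MIME type strings are assumed stand-ins for the formats named in the table. The DRS Content Guide remains the authority on which formats each Content Model supports.

```python
# Assumed MIME types for the formats listed in the table above.
CONTENT_MODEL_BY_MIME = {
    "application/pdf": "Document",
    "application/msword": "Document",
    "text/xml": "Text",
    "image/jp2": "Still Image",
    "image/jpeg": "Still Image",
    "audio/mpeg": "Audio",
}


def content_model_for(mime_type):
    """Return the Content Model for a MIME type, falling back to Opaque."""
    key = (mime_type or "").strip().lower()
    return CONTENT_MODEL_BY_MIME.get(key, "Opaque")


def plan_objects(files):
    """Group (filename, mime_type) pairs into planned DRS objects.

    Each file with a supported Content Model gets its own object;
    all remaining files share a single Opaque object.
    """
    objects = []
    opaque_files = []
    for filename, mime_type in files:
        model = content_model_for(mime_type)
        if model == "Opaque":
            opaque_files.append(filename)
        else:
            objects.append((model, [filename]))
    if opaque_files:
        objects.append(("Opaque", opaque_files))
    return objects
```

Unrecognized or missing MIME types fall through to Opaque, which mirrors the rule that any file can be deposited as an Opaque Object.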
DRS Roles and Relationships
DRS objects and files can be assigned 'Roles' to help categorize material as well as facilitate relationships between DRS objects. The Adding Relationships section of the BatchBuilder Guide has information on DRS relationships. The THESIS and THESIS_SUPPLEMENT Roles are used specifically for ETDs.
File to DRS Role mapping
File from ETD | DRS Object Role | Relationship from THESIS object |
---|---|---|
Thesis PDF | THESIS | - |
Supplementary files | THESIS_SUPPLEMENT | HAS_SUPPLEMENT |
License files | LICENSE | HAS_LICENSE (In Rights section of Thesis Object) |
mets.xml | DOCUMENTATION | HAS_DOCUMENTATION |
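As a rough illustration of the mapping table above, the relationships recorded on the THESIS object can be derived from the Roles of the other objects in the same submission. The function name `thesis_relationships` and the object names in the usage note are hypothetical; how BatchBuilder actually records relationships is described in its guide, not here.

```python
# Relationship recorded on the THESIS object for each related Role.
RELATIONSHIP_BY_ROLE = {
    "THESIS_SUPPLEMENT": "HAS_SUPPLEMENT",
    "LICENSE": "HAS_LICENSE",
    "DOCUMENTATION": "HAS_DOCUMENTATION",
}


def thesis_relationships(objects):
    """Return (thesis_name, relationship, target_name) triples.

    `objects` is a list of (object_name, role) pairs for one ETD.
    Exactly one object is expected to carry the THESIS role.
    """
    thesis_names = [name for name, role in objects if role == "THESIS"]
    if len(thesis_names) != 1:
        raise ValueError("expected exactly one THESIS object")
    thesis = thesis_names[0]
    return [
        (thesis, RELATIONSHIP_BY_ROLE[role], name)
        for name, role in objects
        if role in RELATIONSHIP_BY_ROLE
    ]
```

For example, passing `[("obj_thesis", "THESIS"), ("obj_license", "LICENSE")]` yields a single `HAS_LICENSE` triple pointing from `obj_thesis` to `obj_license`.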
DRS Batches, Objects, and Content Models in BatchBuilder
BatchBuilder is the main tool used to organize content for deposit into DRS. A 'batch' is the group of files and directories that is sent to DRS for deposit. Each batch has at least one 'object' of some content model, along with a descriptor file for each object that holds technical, administrative, and preservation metadata for the object and any of its files. In order to be deposited, each batch also needs a file, batch.xml, that has information about all the objects in the batch. A batch can include objects of different content models, but usually all the objects in a batch share the same content model.
Preparing a batch
Examine the 'fileSec' part of the mets.xml file to identify the files in the ETD. Each file is described by a '<file>' element inside a '<fileGrp>' element.
    <fileSec>
      <fileGrp ID="etdadmin-mets-fgrp-1" USE="CONTENT">
        <file GROUPID="etdadmin-mets-file-group" ID="etdadmin-mets-file-2132021" MIMETYPE="application/pdf" ADMID="amd_primary" SEQ="1">
          <FLocat LOCTYPE="URL" xlink:href="thesis_pdfa_allisonhyatt.pdf"/>
        </file>
        <file GROUPID="etdadmin-mets-file-group" ID="etdadmin-mets-file-2132069" MIMETYPE="application/pdf" ADMID="amd_supplemental_1" SEQ="1">
          <FLocat LOCTYPE="URL" xlink:href="appendices_pdfa_allisonhyatt.pdf"/>
        </file>
      </fileGrp>
      <fileGrp ID="etdadmin-mets-fgrp-2" USE="LICENSE">
        <file GROUPID="etdadmin-mets-file-group" ID="etdadmin-mets-file-2046147" MIMETYPE="application/pdf" ADMID="amd_license_2046147">
          <FLocat LOCTYPE="URL" xlink:href="setup_2E592954-F85C-11EA-ABB1-E61AE629DA94.pdf"/>
        </file>
      </fileGrp>
    </fileSec>
Variables extracted from fileSec above:
Variable | Source | Example | Purpose |
---|---|---|---|
Filename | FLocat xlink:href | thesis_pdfa_allisonhyatt.pdf | Identifies file from directory that needs to be deposited |
Mime-type | file MIMETYPE | application/pdf | Determines which Content Model will be needed |
USE type | fileGrp USE | CONTENT | Part of ROLE determination |
ADMID | file ADMID | amd_primary | Part of ROLE determination |
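A minimal sketch of pulling these four variables out of a fileSec with Python's standard `xml.etree.ElementTree` module follows. The names `extract_files` and `local_name` are hypothetical. The sample wraps a shortened fileSec in a root element that declares the standard XLink namespace, which is an assumption made so the snippet parses on its own; a real mets.xml may also place its elements in the METS namespace, so the code matches on local tag names only.

```python
import xml.etree.ElementTree as ET

# Attribute key ElementTree uses for xlink:href (standard XLink namespace).
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Shortened sample; the <mets> wrapper and namespace declaration are assumed.
SAMPLE = """<mets xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp ID="etdadmin-mets-fgrp-1" USE="CONTENT">
      <file ID="etdadmin-mets-file-2132021" MIMETYPE="application/pdf" ADMID="amd_primary" SEQ="1">
        <FLocat LOCTYPE="URL" xlink:href="thesis_pdfa_allisonhyatt.pdf"/>
      </file>
    </fileGrp>
    <fileGrp ID="etdadmin-mets-fgrp-2" USE="LICENSE">
      <file ID="etdadmin-mets-file-2046147" MIMETYPE="application/pdf" ADMID="amd_license_2046147">
        <FLocat LOCTYPE="URL" xlink:href="setup_2E592954-F85C-11EA-ABB1-E61AE629DA94.pdf"/>
      </file>
    </fileGrp>
  </fileSec>
</mets>"""


def local_name(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_files(mets_xml):
    """Return one dict per <file> element found inside a <fileGrp>."""
    root = ET.fromstring(mets_xml)
    records = []
    for group in root.iter():
        if local_name(group.tag) != "fileGrp":
            continue
        for file_el in group:
            if local_name(file_el.tag) != "file":
                continue
            flocat = next(
                (c for c in file_el if local_name(c.tag) == "FLocat"), None
            )
            if flocat is None:
                continue  # no file location recorded; skip this entry
            records.append({
                "filename": flocat.get(XLINK_HREF),
                "mime_type": file_el.get("MIMETYPE"),
                "use": group.get("USE"),
                "admid": file_el.get("ADMID"),
            })
    return records
```

Calling `extract_files(SAMPLE)` returns one record per file, each carrying the filename, Mime-type, USE type, and ADMID needed for the later Content Model and Role decisions.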
- Create Object OSNs based on ProQuest ID, school code and Role
- Create File OSNs based on ProQuest ID and Role
- Create mapping.txt file to associate files with appropriate Object OSN and File OSN
- Get DASH URN from MARCXML if present
- Get Alma MMSID by querying with the ProQuest ETD ID
Create an object for each file
- Content Model based on file Mime-type
- Role based on USE type and ADMID
  - Primary PDF thesis gets ROLE=THESIS
  - Other files in 'CONTENT' group get ROLE=THESIS_SUPPLEMENT
  - Files in 'LICENSE' group get ROLE=LICENSE
  - mets.xml file gets ROLE=DOCUMENTATION
- File with ROLE=THESIS gets MODS descriptive metadata
Add relationships and Harvard Metadata links to THESIS Object
- HAS_SUPPLEMENT
- HAS_LICENSE
- HAS_DOCUMENTATION
- HOLLIS link based on MMSID
- DASH link based on DASH ID
Files in ETD Submission directory
Examples of values assigned to a filename
Filename | Mime-type | ADMID | USE Type |
---|---|---|---|
thesis_pdfa_allisonhyatt.pdf | application/pdf | amd_primary | CONTENT |
appendices_pdfa_allisonhyatt.pdf | application/pdf | amd_supplemental_1 | CONTENT |
setup_2E592954-F85C-11EA-ABB1-E61AE629DA94.pdf | application/pdf | amd_license_2046147 | LICENSE |
ROLE assignments
USE Type | ADMID (pattern) | ROLE |
---|---|---|
CONTENT | amd_primary | THESIS |
CONTENT | amd_supplemental_\d+ | THESIS_SUPPLEMENT |
LICENSE | amd_license_\d+ | LICENSE |
N/A | mets.xml | DOCUMENTATION |
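The ROLE assignments in the table can be expressed as a small lookup function. This is a sketch under stated assumptions rather than the production logic: `assign_role` is a hypothetical name, the ADMID patterns are read as "prefix followed by one or more digits", and any combination not listed in the table raises an error instead of being guessed.

```python
import re

# ADMID patterns from the table, read as prefix plus one or more digits.
SUPPLEMENT_ADMID = re.compile(r"amd_supplemental_\d+")
LICENSE_ADMID = re.compile(r"amd_license_\d+")


def assign_role(filename, use_type=None, admid=None):
    """Return the DRS Role for a file from its USE type and ADMID."""
    # mets.xml is not listed in its own fileSec, so it is matched by name.
    if filename == "mets.xml":
        return "DOCUMENTATION"
    if use_type == "CONTENT" and admid == "amd_primary":
        return "THESIS"
    if use_type == "CONTENT" and admid and SUPPLEMENT_ADMID.fullmatch(admid):
        return "THESIS_SUPPLEMENT"
    if use_type == "LICENSE" and admid and LICENSE_ADMID.fullmatch(admid):
        return "LICENSE"
    raise ValueError(
        f"no Role rule for {filename!r} (USE={use_type!r}, ADMID={admid!r})"
    )
```

Raising on unmatched input is a deliberate choice for this sketch: a file that fits none of the rows is surfaced for review rather than silently deposited with the wrong Role.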