Background

Most Harvard schools use the ProQuest ETD Administration tool to manage Electronic Thesis and Dissertation submission. After a school accepts and approves an ETD, the actual files created by the student need to be deposited into the DRS for preservation.

DRS Content Models and Packages from ProQuest

DRS uses a 'Content Model' approach to provide the best preservation support for a limited set of file types. Any type of file, however, can be deposited into DRS using a specific Content Model, called an Opaque Object.  Files in an Opaque Object are given bit level preservation, but there are limited delivery options.  The DRS Content Guide explains which file formats are appropriate for each supported Content Model.

ETDs from ProQuest can have a variety of file types in each submission package. For a given ETD submission package, each file with a currently supported DRS Content Model will be deposited in a separate DRS Object and assigned a 'Role'. All files in a submission package with formats that are not currently supported by a DRS Content Model will be put into one Opaque Object. The table below illustrates how file types from a submission would be assigned to a Content Model.


Examples:

File type            | Content Model
PDF                  | Document
text/xml             | Text
JP2, JPEG            | Still Image
mp3                  | Audio
MS Word Document     | Document
CAD, PowerPoint file | Opaque


For example, a ProQuest ETD could have the following files, each deposited in the corresponding Content Model:

  • Thesis in PDF format => Document
  • mets.xml file => Text
  • License files in PDF format => Document
  • Supplementary files, such as thesis appendices, data sets, videos, etc., in formats without a supported Content Model => Opaque
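The assignment described above can be sketched in Python. This is a minimal illustration, not the production deposit code: the MIME types in the lookup table are assumptions (only text/xml is named on this page), and the function names are hypothetical.

```python
# Assumed MIME types for the file types in the table above. Only text/xml
# appears in the source table; the others are illustrative guesses.
CONTENT_MODEL_BY_MIME = {
    "application/pdf": "Document",
    "application/msword": "Document",
    "text/xml": "Text",
    "image/jp2": "Still Image",
    "image/jpeg": "Still Image",
    "audio/mpeg": "Audio",
}


def content_model_for(mime_type):
    """Return the supported Content Model, or 'Opaque' when none matches."""
    return CONTENT_MODEL_BY_MIME.get(mime_type, "Opaque")


def group_into_objects(files):
    """Group (filename, mime_type) pairs into (content_model, [filenames]).

    Each supported file becomes its own object; all unsupported files
    share a single Opaque object, appended last.
    """
    objects = []
    opaque_files = []
    for filename, mime_type in files:
        model = content_model_for(mime_type)
        if model == "Opaque":
            opaque_files.append(filename)
        else:
            objects.append((model, [filename]))
    if opaque_files:
        objects.append(("Opaque", opaque_files))
    return objects
```

With this sketch, a package containing a PDF thesis, a mets.xml file, and two unsupported supplementary files would produce three objects: one Document, one Text, and one Opaque object holding both unsupported files.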

DRS Roles and Relationships

DRS objects and files can be assigned 'Roles' to help categorize material as well as facilitate relationships between DRS objects. The Adding Relationships section of the BatchBuilder Guide has information on DRS relationships.  The THESIS and THESIS_SUPPLEMENT Roles are used specifically for ETDs.

File to DRS Role mapping

File from ETD       | DRS Object Role   | Relationship from THESIS object
Thesis PDF          | THESIS            | -
Supplementary files | THESIS_SUPPLEMENT | HAS_SUPPLEMENT
License files       | LICENSE           | HAS_LICENSE (in Rights section of Thesis Object)
mets.xml            | DOCUMENTATION     | HAS_DOCUMENTATION

DRS Batches, Objects, and Content Models in BatchBuilder

BatchBuilder is the main tool used to organize content for deposit into DRS. A 'batch' is the group of files and directories that is sent to DRS for deposit. Each batch has at least one 'object' of some content model, along with a descriptor file for each object that holds technical, administrative, and preservation metadata for the object and any of its files. In order to be deposited, each batch also needs a file, batch.xml, that has information about all the objects in the batch. A batch can include objects of different content models, but usually all the objects in a batch share the same content model.

Information needed for each object

  • Object Owner Supplied Name (OSN) - Unique within the DRS Owner Code
  • File Owner Supplied Name (OSN) for each file in the object - doesn't need to be unique in DRS

Rules for OSNs and filenames are in the BatchBuilder User Guide.


Rule for creating Object OSNs using variable definitions from the Alma MARCXML Template and fileSec information

ETD_[SCHOOL_CODE]_[DEGREE_DATE_VALUE]_PQ_[PROQUEST_IDENTIFIER_VALUE]_[USE_ROLE]


Rule for creating File OSNs using variable definitions from the Alma MARCXML Template and fileSec information

ETD_PQ_[PROQUEST_IDENTIFIER_VALUE]_[USE_ROLE]


Key:

SCHOOL_CODE - dropbox, e.g. gsas, dce, college

DEGREE_DATE_VALUE - thesis.degree qualifier="date"

PROQUEST_IDENTIFIER_VALUE - dc.identifier qualifier="other"

USE_ROLE - Thesis, Supplement[1, 2, 3, ...], License[1, 2, 3, ...], or Documentation[1, 2, 3, ...] (outlined below)
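The two OSN rules above can be sketched as simple string templates. This is an illustration under assumptions: the helper names are hypothetical, the numeric suffix is assumed to be appended directly to the role name (e.g. Supplement2), and the sample values are invented. It does not check the result against the OSN rules in the BatchBuilder User Guide.

```python
NUMBERED_ROLES = ("Supplement", "License", "Documentation")


def use_role(kind, index=None):
    """Build the USE_ROLE segment, e.g. 'Thesis' or 'Supplement2'.

    Assumption: the sequence number is appended with no separator.
    """
    if kind == "Thesis":
        return "Thesis"
    if kind in NUMBERED_ROLES:
        if index is None or index < 1:
            raise ValueError(f"{kind} needs a sequence number of 1 or more")
        return f"{kind}{index}"
    raise ValueError(f"unknown USE_ROLE kind: {kind}")


def object_osn(school_code, degree_date, proquest_id, role):
    """ETD_[SCHOOL_CODE]_[DEGREE_DATE_VALUE]_PQ_[PROQUEST_IDENTIFIER_VALUE]_[USE_ROLE]"""
    return f"ETD_{school_code}_{degree_date}_PQ_{proquest_id}_{role}"


def file_osn(proquest_id, role):
    """ETD_PQ_[PROQUEST_IDENTIFIER_VALUE]_[USE_ROLE]"""
    return f"ETD_PQ_{proquest_id}_{role}"


# Hypothetical values, for illustration only.
print(object_osn("gsas", "2021-05", "12345", use_role("Thesis")))
print(file_osn("12345", use_role("Supplement", 2)))
```

Including the school code and degree date only in the Object OSN matches the rules above, where the Object OSN must be unique within the DRS Owner Code and the File OSN does not need to be unique.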

Gathering information to create ETD Objects

In order to create objects for the files in an ETD submission, we need to get the following values for each file:

  • Filename
  • Mime-type
  • USE type
  • ADMID

The information for all the files comes from the 'fileSec' part of the mets.xml file, except for the mets.xml file itself.

Example of files in an ETD Submission directory


FileSec of mets.xml
  <fileSec>
    <fileGrp ID="etdadmin-mets-fgrp-1" USE="CONTENT">
      <file GROUPID="etdadmin-mets-file-group" ID="etdadmin-mets-file-2132021" MIMETYPE="application/pdf" ADMID="amd_primary" SEQ="1">
        <FLocat LOCTYPE="URL" xlink:href="thesis_pdfa_allisonhyatt.pdf"/>
      </file>
      <file GROUPID="etdadmin-mets-file-group" ID="etdadmin-mets-file-2132069" MIMETYPE="application/pdf" ADMID="amd_supplemental_1" SEQ="1">
        <FLocat LOCTYPE="URL" xlink:href="appendices_pdfa_allisonhyatt.pdf"/>
      </file>
    </fileGrp>
    <fileGrp ID="etdadmin-mets-fgrp-2" USE="LICENSE">
      <file GROUPID="etdadmin-mets-file-group" ID="etdadmin-mets-file-2046147" MIMETYPE="application/pdf" ADMID="amd_license_2046147">
        <FLocat LOCTYPE="URL" xlink:href="setup_2E592954-F85C-11EA-ABB1-E61AE629DA94.pdf"/>
      </file>
    </fileGrp>
  </fileSec>
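Reading these values out of the fileSec can be sketched with Python's standard xml.etree.ElementTree. The sample below is a trimmed copy of the fileSec above, wrapped in a root element that declares the xlink prefix so the fragment parses on its own; that wrapper and the function names are assumptions. Tags are compared by local name so the sketch works whether or not the elements carry a METS namespace.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Trimmed sample; the <mets> wrapper is added here only to declare xlink.
SAMPLE_METS = """<mets xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp ID="etdadmin-mets-fgrp-1" USE="CONTENT">
      <file MIMETYPE="application/pdf" ADMID="amd_primary" SEQ="1">
        <FLocat LOCTYPE="URL" xlink:href="thesis_pdfa_allisonhyatt.pdf"/>
      </file>
      <file MIMETYPE="application/pdf" ADMID="amd_supplemental_1" SEQ="1">
        <FLocat LOCTYPE="URL" xlink:href="appendices_pdfa_allisonhyatt.pdf"/>
      </file>
    </fileGrp>
    <fileGrp ID="etdadmin-mets-fgrp-2" USE="LICENSE">
      <file MIMETYPE="application/pdf" ADMID="amd_license_2046147">
        <FLocat LOCTYPE="URL" xlink:href="setup_2E592954-F85C-11EA-ABB1-E61AE629DA94.pdf"/>
      </file>
    </fileGrp>
  </fileSec>
</mets>"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def read_file_sec(mets_xml):
    """Return one dict per <file>: filename, mime_type, use, admid."""
    root = ET.fromstring(mets_xml)
    records = []
    for group in root.iter():
        if local_name(group.tag) != "fileGrp":
            continue
        for file_el in group:
            if local_name(file_el.tag) != "file":
                continue
            flocat = next(
                (c for c in file_el if local_name(c.tag) == "FLocat"), None
            )
            records.append({
                "filename": flocat.get(XLINK_HREF) if flocat is not None else None,
                "mime_type": file_el.get("MIMETYPE"),
                "use": group.get("USE"),      # USE lives on the fileGrp
                "admid": file_el.get("ADMID"),
            })
    return records


for record in read_file_sec(SAMPLE_METS):
    print(record)
```

The mets.xml file itself does not appear in these records, so it has to be added to the file list separately.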


Rule for Object ROLE assignments

USE Type | ADMID               | Object ROLE
CONTENT  | amd_primary         | THESIS
CONTENT  | amd_supplemental_\d | THESIS_SUPPLEMENT
LICENSE  | amd_license_\d+     | LICENSE
N/A      | mets.xml            | DOCUMENTATION
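The rule above can be sketched as a small lookup using regular expressions. The function name is hypothetical, and the sketch assumes the numeric suffix on both the supplemental and license ADMID values may have more than one digit.

```python
import re

# (USE type, ADMID pattern, Object ROLE), checked in order.
# Assumption: numeric suffixes may be one or more digits.
ROLE_RULES = [
    ("CONTENT", re.compile(r"amd_primary"), "THESIS"),
    ("CONTENT", re.compile(r"amd_supplemental_\d+"), "THESIS_SUPPLEMENT"),
    ("LICENSE", re.compile(r"amd_license_\d+"), "LICENSE"),
]


def assign_role(filename, use=None, admid=None):
    """Return the Object ROLE for one file in the submission directory."""
    # mets.xml is not listed in its own fileSec, so it is matched by name.
    if filename == "mets.xml":
        return "DOCUMENTATION"
    for rule_use, pattern, role in ROLE_RULES:
        if use == rule_use and admid and pattern.fullmatch(admid):
            return role
    raise ValueError(f"no ROLE rule for {filename!r} (USE={use}, ADMID={admid})")
```

Raising an error for unmatched files is a design choice of this sketch: it surfaces unexpected USE or ADMID values instead of silently assigning a default role.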



Values assigned to each file in the submission directory

Filename                                       | Mime-type       | ADMID               | USE Type | Object ROLE
thesis_pdfa_allisonhyatt.pdf                   | application/pdf | amd_primary         | CONTENT  | THESIS
appendices_pdfa_allisonhyatt.pdf               | application/pdf | amd_supplemental_1  | CONTENT  | THESIS_SUPPLEMENT
setup_2E592954-F85C-11EA-ABB1-E61AE629DA94.pdf | application/pdf | amd_license_2046147 | LICENSE  | LICENSE
mets.xml                                       | text/xml        | N/A                 | N/A      | DOCUMENTATION

Building the objects

Create a Content Model specific object for each file that matches a supported Content Model, and one Opaque Object for all the non-supported file formats.

  • Content Model based on file Mime-type
  • Role based on USE type and ADMID
    • Primary PDF thesis gets ROLE=THESIS
    • Objects for supplemental files in the CONTENT group get ROLE=THESIS_SUPPLEMENT
    • Objects for files in the LICENSE group get ROLE=LICENSE
    • Object that has the mets.xml file gets ROLE=DOCUMENTATION
  • Object with ROLE=THESIS gets:
    • MODS descriptive metadata from Alma using MMSID
    • HAS_SUPPLEMENT relationship to all objects with the THESIS_SUPPLEMENT Role
    • HAS_LICENSE relationship to all objects with the LICENSE Role
    • HAS_DOCUMENTATION relationship to all objects with the DOCUMENTATION Role
    • HOLLIS link based on MMSID in Harvard Metadata link
    • DASH link based on DASH ID in Harvard Metadata link
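The relationships listed above can be derived from the object roles alone. The following sketch assumes each object is represented as a dict with hypothetical 'osn' and 'role' keys and returns (source, relationship, target) tuples. It does not write descriptor files, and it does not model the placement of HAS_LICENSE in the Rights section of the Thesis object.

```python
# Relationship from the THESIS object to objects with each Role.
RELATIONSHIP_BY_ROLE = {
    "THESIS_SUPPLEMENT": "HAS_SUPPLEMENT",
    "LICENSE": "HAS_LICENSE",
    "DOCUMENTATION": "HAS_DOCUMENTATION",
}


def thesis_relationships(objects):
    """Return (thesis_osn, relationship, target_osn) for one ETD's objects."""
    thesis = [o for o in objects if o["role"] == "THESIS"]
    if len(thesis) != 1:
        raise ValueError("expected exactly one object with ROLE=THESIS")
    source = thesis[0]["osn"]
    return [
        (source, RELATIONSHIP_BY_ROLE[o["role"]], o["osn"])
        for o in objects
        if o["role"] in RELATIONSHIP_BY_ROLE
    ]


# Hypothetical OSNs, for illustration only.
etd_objects = [
    {"osn": "ETD_gsas_2021-05_PQ_12345_Thesis", "role": "THESIS"},
    {"osn": "ETD_gsas_2021-05_PQ_12345_Supplement1", "role": "THESIS_SUPPLEMENT"},
    {"osn": "ETD_gsas_2021-05_PQ_12345_License1", "role": "LICENSE"},
    {"osn": "ETD_gsas_2021-05_PQ_12345_Documentation1", "role": "DOCUMENTATION"},
]
for relationship in thesis_relationships(etd_objects):
    print(relationship)
```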





