...

DRS uses a 'Content Model' approach to provide the best preservation support for a limited set of file types. Any type of file, however, can be deposited into DRS using a specific Content Model called an Opaque Object. Files in an Opaque Object are given bit-level preservation, but delivery options are limited. The DRS Content Guide explains which file formats are appropriate for each supported Content Model.

ETDs from ProQuest can have a variety of file types in each submission package. For a given ETD submission package, each file with a currently supported DRS Content Model will be deposited in a separate DRS Object and assigned a 'Role'. All files in a submission package with formats that are not currently supported by a DRS Content Model will be put into one Opaque Object. The example in the table below illustrates how file types from a submission would be assigned to a Content Model.

...

Rule for creating Object and File OSNs using variable definitions from the Alma MARCXML Template and fileSec information

Object OSN: ETD_[OBJECT_ROLE]_[SCHOOL_CODE]_[DEGREE_DATE_VALUE]_PQ_[PROQUEST_IDENTIFIER_VALUE]

The File OSN will be the same as the Object OSN with an additional 'sequence' number appended. Only Opaque Objects will need File OSNs with sequence numbers higher than 1, since only Opaque Objects can have more than one file.

File OSN: [OBJECT_OSN]_[SEQUENCE_NUMBER]
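
The two OSN rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sample school code, degree date, and ProQuest identifier are made-up values whose real formats are defined by the Alma MARCXML Template, not by this sketch.

```python
def object_osn(role: str, school_code: str, degree_date: str, proquest_id: str) -> str:
    """Build an Object OSN: ETD_[ROLE]_[SCHOOL]_[DEGREE_DATE]_PQ_[PROQUEST_ID]."""
    return f"ETD_{role}_{school_code}_{degree_date}_PQ_{proquest_id}"


def file_osn(obj_osn: str, sequence: int) -> str:
    """Build a File OSN by appending the sequence number to the Object OSN."""
    if sequence < 1:
        raise ValueError("sequence numbers start at 1")
    return f"{obj_osn}_{sequence}"


# Hypothetical values, for illustration only.
thesis = object_osn("THESIS", "GSAS", "2022", "12345678")
print(thesis)
print(file_osn(thesis, 1))
```

Keeping the File OSN as a pure function of the Object OSN means an Opaque Object with several files only needs to vary the sequence argument.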

Example OSNs for an ETD from GSAS in 2022:

...

Rule for creating Harvard Metadata Link value for each DRS Object (Thesis and all supplementary objects)

Harvard Metadata link Type=Local: value=PQ-[PROQUEST_IDENTIFIER_VALUE]

Example:
PQ-12345678

Harvard Metadata link Type=Alma: value=[Alma MMSID]

Harvard Metadata link Type=DASH: value=[DASH ID]
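
As a rough sketch, the three link rules above could be assembled per object like this. The function name and the dictionary layout are assumptions made for illustration; they are not the BatchBuilder or DRS data format, and the identifiers in the usage lines are made up.

```python
from typing import Dict, List, Optional


def harvard_metadata_links(
    proquest_id: str, mmsid: str, dash_id: Optional[str] = None
) -> List[Dict[str, str]]:
    """Return the Harvard Metadata links for one DRS object."""
    links = [
        {"type": "Local", "value": f"PQ-{proquest_id}"},  # ProQuest identifier
        {"type": "Alma", "value": mmsid},                 # Alma MMSID
    ]
    # A DASH link is only added when a DASH ID is available.
    if dash_id:
        links.append({"type": "DASH", "value": dash_id})
    return links


# Hypothetical identifiers, for illustration only.
for link in harvard_metadata_links("12345678", "990000000000000000"):
    print(link["type"], link["value"])
```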


More rules for OSNs and filenames are in the BatchBuilder User Guide.

Gathering information to create ETD Objects

...

  • Content Model based on file Mime-type
  • Role based on Use category and AMDID
    • Primary PDF thesis gets ROLE=THESIS
    • Objects for files in CONTENT group get ROLE=THESIS_SUPPLEMENT
    • Objects for files in ‘LICENSE’ group get ROLE=LICENSE
    • Object that has the mets.xml file gets ROLE=DOCUMENTATION
  • Object with ROLE=THESIS gets:
    • MODS descriptive metadata from Alma using MMSID
    • Has_supplement relationship to all objects with THESIS_SUPPLEMENT Roles
    • Has_license relationship to all objects with LICENSE Role
    • Has_documentation relationship to all objects with DOCUMENTATION Roles
    • HOLLIS link based on MMSID in Harvard Metadata link
    • DASH link based on DASH ID in Harvard Metadata link
  • Object with any ROLE gets:
    • Harvard Metadata link Type=Local with label ProQuestID; value=PQ-[PROQUEST_IDENTIFIER_VALUE]
    • Harvard Metadata link Type=Alma - using MMSID
    • Harvard Metadata link Type=DASH - using DASH ID
  • File in any Content Model (except Text*):
    • Role=ARCHIVAL_MASTER

*Text Content Model doesn't support the ARCHIVAL_MASTER role for files
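
The role and relationship rules listed above can be sketched as follows. Several things here are assumptions: how the primary PDF and the mets.xml object are detected is passed in as plain flags (the source derives this from the Use category and AMDID), the Content Model name "TEXT" is a placeholder string, and the function names are hypothetical. The source does not say which role Text files receive, so the sketch returns None for them.

```python
from typing import Dict, List, Optional, Tuple


def object_role(group: str, is_primary_pdf: bool = False, has_mets: bool = False) -> str:
    """Pick the object ROLE from the file group and two caller-supplied flags."""
    if has_mets:
        return "DOCUMENTATION"
    if group == "LICENSE":
        return "LICENSE"
    if group == "CONTENT":
        return "THESIS" if is_primary_pdf else "THESIS_SUPPLEMENT"
    raise ValueError(f"unrecognized group: {group!r}")


def file_role(content_model: str) -> Optional[str]:
    """Files get ARCHIVAL_MASTER unless the Content Model is Text."""
    return None if content_model.upper() == "TEXT" else "ARCHIVAL_MASTER"


# Relationship the THESIS object holds toward objects with each role.
RELATIONSHIP_BY_ROLE = {
    "THESIS_SUPPLEMENT": "Has_supplement",
    "LICENSE": "Has_license",
    "DOCUMENTATION": "Has_documentation",
}


def thesis_relationships(objects: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """List (relationship, target OSN) pairs for the THESIS object."""
    return [
        (RELATIONSHIP_BY_ROLE[obj["role"]], obj["osn"])
        for obj in objects
        if obj["role"] in RELATIONSHIP_BY_ROLE
    ]
```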

ETD deposits into DRS - data validation

  • Each batch should have the objects for only one ETD
  • Each batch should have one and only one Thesis, i.e. a Document Object with ROLE=THESIS
  • Each object in an ETD batch should have Harvard Metadata Link values for the ProQuest ID (Type=Local) and Alma ID (Type=Alma)
  • If there is a DASH ID, it should be added to Harvard Metadata Links with Type=DASH
  • The Thesis document object should have MODS metadata from HOLLIS using the Alma MMSID
  • The Thesis document object should have a defined relationship with all the other objects in the ETD batch
  • Embargo information should be recorded in the Thesis and Supplements
  • All the files listed in the fileSec of the mets.xml should match a file in the zip package
    • Filenames should be sanitized before creating DRS batches
  • Each file in the zip package, except the mets.xml, should have an entry in the fileSec of the mets.xml
  • Each file in the zip package should be in one and only one object in the ETD batch
  • The ProQuest ID for any object in a new ETD batch should not be in a previously deposited DRS object.
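
A few of the checks above lend themselves to a simple pre-deposit script. The sketch below covers only the single-Thesis rule, the required metadata links, and the fileSec/zip/object file matching. The batch layout (a list of dictionaries with "osn", "role", "links", and "files" keys) and the function name are assumptions made for illustration. Checks that need outside systems, such as the MODS lookup, embargo recording, and the search for previously deposited ProQuest IDs, are left out.

```python
from collections import Counter


def validate_batch(objects, filesec_names, zip_names):
    """Return a list of problems found; an empty list means these checks passed."""
    errors = []

    # One and only one object with ROLE=THESIS.
    thesis_count = sum(1 for obj in objects if obj["role"] == "THESIS")
    if thesis_count != 1:
        errors.append(f"expected exactly one THESIS object, found {thesis_count}")

    # Every object needs a Local (ProQuest ID) link and an Alma link.
    for obj in objects:
        link_types = {link["type"].lower() for link in obj["links"]}
        for required in ("local", "alma"):
            if required not in link_types:
                errors.append(f"{obj['osn']}: missing {required} metadata link")

    # fileSec entries must match files in the zip package, and vice versa
    # (mets.xml itself is not expected in the fileSec).
    for name in sorted(set(filesec_names) - set(zip_names)):
        errors.append(f"{name}: listed in fileSec but not in zip package")
    for name in sorted(set(zip_names) - set(filesec_names) - {"mets.xml"}):
        errors.append(f"{name}: in zip package but not listed in fileSec")

    # Each zip file must land in one and only one object.
    placements = Counter(name for obj in objects for name in obj["files"])
    for name in sorted(set(zip_names)):
        if placements[name] != 1:
            errors.append(f"{name}: placed in {placements[name]} objects, expected 1")

    return errors
```

Returning a list of messages rather than raising on the first failure lets a reviewer see every problem in a batch at once.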