Website Archiving Metadata

For web archiving/crawling instructions, see: Web Archiving with Archive-It

Archive-It Metadata

Metadata in Archive-It is added at the seed level:

  1. Click the seed you want to edit from your respective Collection
  2. Click the "Metadata" tab
  3. Click "Edit" and fill out the fields below

When re-crawling a site, be sure to update the date range in the Archive-It metadata, and both the extent and the date range in the MARC/Holdings record.

Title (title of the website)  Tips:

    • The title may be clear from the page itself. If it is unclear, right-click on the page and select View Page Source. In the code, look under <head> for something that appears to be title information ("siteName", <title>, etc.).
    • You may supply a title prefix if needed for context. For example, if the supplied page name is "Corona Virus Response," but the site you're capturing is the Harvard School of Dental Health's Corona Virus page, supply a title like: Harvard School of Dental Health: Corona Virus Response.
    • Replace any special characters with more standard characters; for example, replace "|" with ":".
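
The title tips above can be sketched as a small helper. This is only an illustration, not part of the Archive-It workflow: the names TitleParser and suggest_title are hypothetical, it reads only the <title> element (not "siteName" or other metadata), and it treats "|" as the only special character to replace.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def suggest_title(page_source, prefix=""):
    """Suggest a seed title from page source, with an optional context prefix."""
    parser = TitleParser()
    parser.feed(page_source)
    parser.close()
    # Replace "|" with ":" as in the tip above (assumption: no other
    # special characters are handled here).
    parts = [part.strip() for part in parser.title.split("|")]
    title = ": ".join(part for part in parts if part)
    return f"{prefix}: {title}" if prefix else title


print(suggest_title("<title>Corona Virus Response</title>",
                    prefix="Harvard School of Dental Health"))
```

The result is a suggestion only; check it against the page itself before entering it in the Title field.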

Creator

    • Generally follow the bib cataloging approach: use the most appropriately specific creator for the site/seed level you're capturing.
    • For example, the bib 110 for the NEPRC is: Harvard Medical School. New England Primate Research Center
    • Do not use terminal punctuation

Subjects (map from finding aid/MARC record)

    • Use LC headings; if LC doesn't give adequate topical coverage or specificity, supplement with MeSH headings
    • The creator (110) should be a subject
    • For high-level seeds, keep headings basic and general

Description (similar to MARC 520/545, shorten if needed) – Amber to add some guidance on this note

Publisher (if applicable)

    • Institutional websites: The President and Fellows of Harvard College.

Contributor: Center for the History of Medicine (Francis A. Countway Library of Medicine)

Date (ideally active dates of website – first published through date of capture)

    • If you don't know the date the website was first published, use the date of first capture. For example, if you first crawled the website in 2019 and re-crawl it on a yearly basis, enter the date as 2019-.
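
As a rough illustration of the date rule above, the sketch below formats the Date value. The function name archive_date is hypothetical. Only the open-ended form (2019-) comes from the guidance; the closed-range and single-year forms are assumptions.

```python
def archive_date(first_year, last_year=None):
    """Format the Date field for a seed.

    first_year: year first published, or year of first capture if unknown.
    last_year:  leave as None while the site is still being crawled.
    """
    if last_year is None:
        # Open-ended range, as in the example above.
        return f"{first_year}-"
    if last_year < first_year:
        raise ValueError("last_year must not precede first_year")
    if last_year == first_year:
        # Assumption: a single year is entered on its own.
        return str(first_year)
    # Assumption: a closed range is entered as YYYY-YYYY.
    return f"{first_year}-{last_year}"


print(archive_date(2019))  # prints 2019-
```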

Identifier (Collection call number; for archives Record Group number and Series number)

    • Institutional website example: RG M-NE01, Series 00675

Source (for archives: department/office name; for manuscripts, collection full citation if part of a broader collection/not stand-alone)

Rights (standard rights statement)

    • The Harvard Medical Library does not hold copyright on all the materials in the collection. Requests for permission to publish material from the collection should be directed to Public Services (chm@hms.harvard.edu). Researchers who obtain permission to publish from Public Services are responsible for identifying and contacting the persons or organizations that hold copyright.

Related Archival Materials (Skip this - does not currently work)

(Custom field) SeedID (ID number can be found by clicking on the seed in Archive-It - it is the string of digits at the end of the URL in the browser bar)

    • This mainly serves as a way of having a direct link to the seed in the public-facing Archive-It.

Example:  https://archive-it.org/collections/4908?fc=meta_MMS%3A990131266760203941
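
If you need to pull SeedIDs out of many URLs at once, a sketch like the following may help. It is not an Archive-It tool: the function name seed_id_from_url and the example.org URL are hypothetical, and it assumes the SeedID is the run of digits at the end of the URL path, ignoring any query string.

```python
import re
from urllib.parse import urlsplit


def seed_id_from_url(url):
    """Return the digits at the end of the URL path, or None if there are none."""
    path = urlsplit(url.strip()).path
    match = re.search(r"(\d+)/?$", path)
    return match.group(1) if match else None


# Hypothetical URL, for illustration only.
print(seed_id_from_url("https://example.org/collections/4908/seeds/1234567"))
```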


See also: Creating a Seed Group and adding new seeds

Cataloging for Archived Websites

Apply our standard cataloging instructions, with the following archived website-specific modifications/additions:

Bibliographic record: 

    • 245 10 $$a [title of website, plus the qualifier "archived website" in square brackets]
      • Example: 245 10 $$a Harvard Medical School [archived website], $$f [Date/date range of capture]
        • Note that the above title guidance applies only if the entire resource you're describing is an archived site. If the site is just one part of a larger manuscript collection/record group, supply the collection title here instead (no qualifier needed).
      • If the entire resource you're describing is an archived site, for the inclusive dates, use the website's extant dates. If you don't know the site dates, use the Archive-It crawl date(s) as your 245 date.
    • 300 _ _  $$a 7.98 $$f gigabytes ($$a 1 $$f archived website).
      • If the collection you're cataloging includes both born-digital records on network storage and an archived website, enter those as two separate 300s. 
    • 655 _7 $$a Web archives. $$2 aat
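
The 245 and 300 patterns above can be sketched as simple string templates, for example to draft fields before keying them into the record. The function names field_245 and field_300 are hypothetical, and the plural "archived websites" for more than one site is an assumption not stated in the guidance.

```python
def field_245(title, dates):
    """Draft a 245 for a resource that consists entirely of an archived site."""
    return f"245 10 $$a {title} [archived website], $$f {dates}"


def field_300(gigabytes, sites=1):
    """Draft a 300 giving the extent in gigabytes and the number of archived sites."""
    # Assumption: pluralize the unit when more than one site is described.
    unit = "archived website" if sites == 1 else "archived websites"
    return f"300 _ _ $$a {gigabytes} $$f gigabytes ($$a {sites} $$f {unit})."


print(field_245("Harvard Medical School", "2019-"))
print(field_300(7.98))
```

These helpers cover only the case where the whole resource is an archived site; for a site within a larger collection, the 245 takes the collection title as described above.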

Holdings record:

    • 852 8 _  $$b MED $$c NET $$h [Call number]
      • NOTE: you only need to create one holdings record for all accessions. In $$h, add all of the accession numbers.
    • 856 40 $$3 Archived websites: [245 title, date/date range of capture] $$u [Archive-It URL**]
      • Make sure to update the dates every time you add a new accession to the 852 $$h field.

** The URL we use to point to Archive-It sites is generated by clicking on the Group Name in the public-facing Archive-It site.
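
A similar template can draft the 856 in the holdings record. The function name field_856 and the example.org URL are hypothetical; use the URL generated from the Group Name as described in the note above.

```python
def field_856(title, dates, url):
    """Draft an 856 linking to the archived site."""
    return f"856 40 $$3 Archived websites: {title}, {dates} $$u {url}"


# Placeholder URL, for illustration only.
print(field_856("Harvard Medical School", "2019-", "https://example.org/group-url"))
```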

Example catalog record: https://id.lib.harvard.edu/alma/99155742015203941/catalog


Copyright © 2024 The President and Fellows of Harvard College