Data extraction and analysis
Isham's patrons occasionally wish to extract data from library sources for digital humanities projects. This documentation covers how to extract data, as well as possibilities for studying data sets.
How to extract a finding aid with tags
While it is possible to download a version of a finding aid from HOLLIS for Archival Discovery, all of the fields and all of the tags are available only in ArchivesSpace. To download a finding aid for a patron, take the following steps.
- Locate the finding aid in ArchivesSpace.
- Click Edit.
- Click Export, then choose Download EAD. Tick every box.
- Once you have the text document, save it as WordXML.
- Open the document in Notepad and save it there to scrub the formatting.
- Save the scrubbed document as WordXML again.
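Before passing the file to a patron, it can help to confirm which tags actually survived the export. The following is a minimal sketch in Python, not part of the ArchivesSpace workflow itself. The sample document is a made-up stand-in for an exported finding aid, and the sketch assumes the export is well-formed XML; it strips whatever namespace is present (EAD 2002 files typically declare urn:isbn:1-931666-22-9).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for an exported finding aid; a real export is much larger.
SAMPLE_EAD = """<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection">
    <did><unittitle>Sample collection</unittitle></did>
    <dsc>
      <c level="file"><did><unittitle>Tape 1</unittitle><unitdate>1979</unitdate></did></c>
      <c level="file"><did><unittitle>Tape 2</unittitle><unitdate>1981</unitdate></did></c>
    </dsc>
  </archdesc>
</ead>"""


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def count_tags(root):
    """Count how often each element name occurs under (and including) root."""
    counts = Counter()
    for elem in root.iter():
        counts[local_name(elem.tag)] += 1
    return counts


# For a file on disk, use ET.parse("your-file.xml").getroot() instead (path is hypothetical).
root = ET.fromstring(SAMPLE_EAD)
tag_counts = count_tags(root)
for name, n in tag_counts.most_common():
    print(name, n)
```

A tag that you expected but that shows a count of zero suggests an export box was left unticked.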
Analyzing a finding aid in OpenRefine
OpenRefine is one tool for analyzing a finding aid's contents. To load a finding aid into it, follow these steps.
- Download OpenRefine.
- Run OpenRefine in Chrome. Note: the program may try to open in Internet Explorer, but it does not work in that browser.
- Create a project by selecting your new WordXML document: click Choose Files, select the .xml file, click Next.
- Identify an item: in the preview, click the XML element that represents one record.
- Tick the option to parse the cell text into numbers.
- Click Create Project.
- Choose Export.
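For patrons who prefer a script, the "identify an item" step above can be approximated outside OpenRefine. This is a rough sketch only: it assumes each <did> element in the finding aid stands for one record and pulls out its <unittitle> and <unitdate>. The sample XML and the choice of columns are illustrative assumptions, not a description of what OpenRefine does internally.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an exported finding aid.
SAMPLE_EAD = """<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection">
    <did><unittitle>Sample collection</unittitle></did>
    <dsc>
      <c level="file"><did><unittitle>Tape 1</unittitle><unitdate>1979</unitdate></did></c>
      <c level="file"><did><unittitle>Tape 2</unittitle><unitdate>1981</unitdate></did></c>
    </dsc>
  </archdesc>
</ead>"""

COLUMNS = ["unittitle", "unitdate"]  # assumed columns of interest


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def child_text(parent, name):
    """Return the text of the first direct child called `name`, or ''."""
    for child in parent:
        if local_name(child.tag) == name:
            return "".join(child.itertext()).strip()
    return ""


def did_rows(root):
    """Build one row (a dict) per <did> element, in document order."""
    return [
        {col: child_text(elem, col) for col in COLUMNS}
        for elem in root.iter()
        if local_name(elem.tag) == "did"
    ]


rows = did_rows(ET.fromstring(SAMPLE_EAD))

# Write the rows as tab-separated text, which Excel can open.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
tsv_text = buffer.getvalue()
print(tsv_text)
```

Records without a date simply get an empty cell, which keeps the columns aligned.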
OpenRefine tips
- Click on drop-down menus in columns to choose text facets.
- Sort by count.
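- Use Cluster and edit to show similar values; you can merge them from there.
- Click the "x choices" link in a facet (x is the number of distinct values) to pull out a tab-separated list of values with their counts, then save that as a TSV file. From there, you can open your data in Excel.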
- Undo/Redo for version history.
- Split a multi-valued column on ";" to separate out, for instance, the bands in the Artie Freedman collection, then Reconcile the resulting values. (See VIAF reconciliation service for more on this.)
- Note that OpenRefine is good for batch-editing, but not for value-by-value editing.
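The split-then-facet tips above can be illustrated with a few lines of Python. This is a sketch of the idea, not OpenRefine's own implementation: it splits each cell on ";", trims whitespace, and counts the values, which roughly corresponds to a text facet sorted by count. The band names are placeholders, not data from the collection.

```python
from collections import Counter


def split_values(cell, sep=";"):
    """Split a multi-valued cell on `sep`, trimming spaces and dropping empties."""
    return [part.strip() for part in cell.split(sep) if part.strip()]


def text_facet(cells, sep=";"):
    """Return (value, count) pairs, most frequent first."""
    counts = Counter()
    for cell in cells:
        counts.update(split_values(cell, sep))
    return counts.most_common()


# Placeholder cells standing in for a multi-valued "bands" column.
sample_cells = ["Band A; Band B", "Band B", "Band C;Band B ; Band A"]

facet = text_facet(sample_cells)

# Tab-separated output, similar in spirit to what a facet's "choices" link provides.
facet_tsv = "\n".join(f"{value}\t{count}" for value, count in facet)
print(facet_tsv)
```

Note that this only catches exact matches after trimming; near-duplicates such as spelling variants still need clustering or manual review.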
Wikidata
To construct SPARQL queries against your data, the data must first be mapped onto Wikidata. Even if your entities (names, places, works) already exist in Wikidata, they need to be well described there. (If they don't exist, you'll need to create them.)
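As a starting point for checking whether an entity already exists, here is a minimal sketch in Python that builds a SPARQL query looking up Wikidata items by their exact English label. The label "Example Band" is a placeholder, and the function and variable names are illustrative. The sketch only builds the query text and a request URL for the public Wikidata Query Service; it does not send anything over the network.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_label_query(label, lang="en", limit=10):
    """Build a SPARQL query for items whose label in `lang` is exactly `label`."""
    # Escape backslashes and double quotes so the label is a valid SPARQL string.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item ?itemDescription WHERE {\n"
        f'  ?item rdfs:label "{escaped}"@{lang} .\n'
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}" . }}\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


# "Example Band" is a placeholder label, not a known Wikidata entity.
query = build_label_query("Example Band", limit=5)
query_url = WIKIDATA_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})

print(query)
print(query_url)
```

The query text can also be pasted directly into the Wikidata Query Service web interface. An empty result suggests the entity may need to be created, or that it exists under a different label.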
Copyright © 2024 The President and Fellows of Harvard College