ASpace API workshop
- Julie Wetherill
- Dave Mayo
Welcome to the ASpace API Workshop wiki page.
Workshop sessions
Session 1 (Intro, Setup, Python practice) Tue. 6/9, 10-12noon
Presenter's notebook: .ipynb | .pdf | .html
Session 2 (Working with the ASpace API, part 1) Wed. 6/17, 10-12noon
Presenter's notebook: .ipynb | .pdf | .html
Session #2 recording (1:49:00)
Session 3 (Working with the ASpace API, part 2) Wed. 6/24, 1-3pm
Presenter's notebook: .ipynb | .pdf | .html
Session #3 recording (01:39:27)
Session 4 (API clinic: open discussion & picking of Dave’s brain) Tue. 6/30, 10-12noon
Presenter's notebook: .ipynb | .pdf | .html
Session #4 recording (01:46:15)
Text Blocks for Session 2
ASnake Initialization Code
    # code here MUST be run before subsequent examples will work

    # here we are importing the ASnakeClient class from the asnake.client module
    from asnake.client import ASnakeClient

    # here we are creating our ASnake client, which we will use to make API requests
    client = ASnakeClient()
Software Agent Template
    {
        "jsonmodel_type": "agent_software",
        "names": [
            {
                "jsonmodel_type": "name_software",
                "software_name": "Dave's Script",
                "sort_name": "Dave's Script",
                "version": "1.0",
                "isDisplayName": True,
                "rules": "local"
            }
        ],
        "title": "Dave's Script v1.0"
    }
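The template above is a Python dict (note the Python-style True), so it has to be serialized to JSON before it reaches the API. The sketch below is an illustration under assumptions, not the presenter's code: create_software_agent is a hypothetical helper, the agents/software endpoint path and the json= keyword are assumed to match your ASpace and ArchivesSnake versions, and FakeClient stands in for a real client so the example runs without a server.

```python
import json

# The template from above, as a Python dict (Python's True, not JSON's true)
agent = {
    "jsonmodel_type": "agent_software",
    "names": [
        {
            "jsonmodel_type": "name_software",
            "software_name": "Dave's Script",
            "sort_name": "Dave's Script",
            "version": "1.0",
            "isDisplayName": True,
            "rules": "local",
        }
    ],
    "title": "Dave's Script v1.0",
}


def create_software_agent(client, agent_record):
    """Hypothetical helper: POST the record and return whatever the client returns.

    Assumes client.post() accepts a json= keyword, as requests-style clients do.
    """
    return client.post("agents/software", json=agent_record)


class FakeClient:
    """Stand-in for a real client; it only records what would have been sent."""

    def __init__(self):
        self.calls = []

    def post(self, path, json=None):
        self.calls.append((path, json))
        return {"status": "Created (fake)"}


fake = FakeClient()
result = create_software_agent(fake, agent)

# json.dumps shows the serialized form: True is written as true
payload = json.dumps(agent)
print(payload)
```

With a real client you would pass the client object from the initialization code instead of FakeClient(), and inspect the response before relying on it.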
Pre-work for session #2: Intro to the API and ArchivesSnake video
Below are two recordings of Dave Mayo's presentation "Introduction to the ArchivesSpace API and ArchivesSnake", given at the ArchivesSpace Online Forum in 2020. If you can, watch one of them before session #2. If you want just the presentation, select the MP4 option; if you also want the Q&A session that followed, choose the YouTube option.
44:10 (Presentation only; MP4)
59:32 (Presentation followed by Q&A; YouTube)
Pre-workshop software setup & video
See the Google Doc.
Command line basics
Action | Win | Mac |
---|---|---|
Open the console | Win+R > type cmd (or powershell) > Enter/OK | Finder > Applications > Utilities > Terminal |
List files and directories | dir | ls |
Create a directory | mkdir | mkdir |
Move to directory | cd | cd |
Go back to previous/parent directory | cd.. | cd .. |
Print current directory | cd (cmd) or pwd (PowerShell) | pwd |
Find home directory | echo %HOMEPATH% (cmd) or echo $HOME (PowerShell) | echo ~ |
Command history | Up arrow | Up arrow |
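Since the workshop itself is in Python, it may help to see rough Python equivalents of the commands in the table. This is a minimal sketch using only the standard library; the directory name workshop_files is made up, and everything happens inside a temporary directory so nothing on your machine is changed.

```python
import os
import tempfile
from pathlib import Path

# Print current directory (cd on Windows cmd, pwd on Mac)
start_dir = Path.cwd()
print(start_dir)

# Find home directory (echo %HOMEPATH% on Windows, echo ~ on Mac)
home = Path.home()
print(home)

# Work inside a throwaway directory that is deleted at the end
with tempfile.TemporaryDirectory() as scratch:
    # Create a directory (mkdir)
    new_dir = Path(scratch) / "workshop_files"
    new_dir.mkdir()

    # Move to directory (cd)
    os.chdir(new_dir)
    moved_to = Path.cwd()

    # Go back to parent directory (cd ..)
    os.chdir("..")

    # List files and directories (dir on Windows, ls on Mac)
    listing = sorted(p.name for p in Path.cwd().iterdir())
    print(listing)

    # Return to where we started before the scratch directory is removed
    os.chdir(start_dir)
```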
Useful Python Libraries to look at next
argparse
https://docs.python.org/3/howto/argparse.html - tutorial (start here)
https://docs.python.org/3/library/argparse.html - full documentation
Handling command line arguments is a really important and valuable step toward improving the utility and reusability of your scripts. argparse is built into Python, and will:
- let you define command line arguments for your scripts, and have them converted into data your program can use
- automatically take these arguments (and attached description) to give your script a --help argument that will print a description of what your script does and how to use it
Here's a small example script:
    #!/usr/bin/env python3
    from argparse import ArgumentParser

    parser = ArgumentParser(description='Description of what my script does')
    parser.add_argument('input_csv', nargs='?', default='input.csv', help='CSV file to be read!')
    parser.add_argument('output_csv', nargs='?', default='report.csv', help='CSV file report gets written to!')

    args = parser.parse_args()

    print(args.input_csv)
    print(args.output_csv)
If you save this in a file called test_argparse.py and run it with the --help argument:
python3 test_argparse.py --help
You will get the following output:
    usage: test_argparse.py [-h] [input_csv] [output_csv]

    Description of what my script does

    positional arguments:
      input_csv   CSV file to be read!
      output_csv  CSV file report gets written to!

    optional arguments:
      -h, --help  show this help message and exit
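To illustrate the "converted into data your program can use" point, here is a small variation on the script above. The --limit and --dry-run arguments are hypothetical additions for illustration, and the list passed to parse_args() stands in for what a user would type after the script name.

```python
from argparse import ArgumentParser

parser = ArgumentParser(description='Description of what my script does')
parser.add_argument('input_csv', nargs='?', default='input.csv',
                    help='CSV file to be read!')
# hypothetical optional argument: type=int converts the command line string to a number
parser.add_argument('--limit', type=int, default=10,
                    help='maximum number of rows to process')
# hypothetical on/off switch: False unless --dry-run is given
parser.add_argument('--dry-run', action='store_true',
                    help='report what would happen without changing anything')

# equivalent to running: python3 test_argparse.py records.csv --limit 25 --dry-run
args = parser.parse_args(['records.csv', '--limit', '25', '--dry-run'])

print(args.input_csv)   # records.csv
print(args.limit + 1)   # 26, because limit is an int rather than the string '25'
print(args.dry_run)     # True (the dash in --dry-run becomes an underscore)

# with no arguments at all, the defaults are used
defaults = parser.parse_args([])
print(defaults.input_csv, defaults.limit, defaults.dry_run)
```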
openpyxl
https://openpyxl.readthedocs.io/en/stable/
There are several libraries in Python for working with Excel. This is the one I have found to be the most useful and least frustrating. In particular, it seems to have reliable access to the "raw" value input into cells, which has let me work around Excel date-handling issues.
sqlite3
https://pynative.com/python-sqlite/ - tutorial
https://docs.python.org/3/library/sqlite3.html - documentation
sqlite3 is Python's built-in module for SQLite, an SQL database engine that stores an entire database in a single file. If you're familiar with SQL, it gives you some of the benefits of storing data in a database without needing a server set up. It's best thought of as an intermediate stage between spreadsheets and "full" databases: it's simpler to work with than a database server, doesn't need any IT support, and still lets you do joins and queries.
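A minimal sketch of what that looks like in practice follows. The table names, column names, and rows are all made up for illustration; ':memory:' keeps the database in RAM, and passing a filename instead would store it in a single file on disk.

```python
import sqlite3

# ':memory:' is a throwaway in-RAM database; use a filename such as 'report.db' to keep it
conn = sqlite3.connect(':memory:')

conn.execute('CREATE TABLE repositories (id INTEGER PRIMARY KEY, code TEXT)')
conn.execute('CREATE TABLE resources (id INTEGER PRIMARY KEY, repo_id INTEGER, title TEXT)')

# made-up sample rows; the ? placeholders let sqlite3 handle quoting safely
conn.executemany('INSERT INTO repositories VALUES (?, ?)',
                 [(1, 'AAA'), (2, 'BBB')])
conn.executemany('INSERT INTO resources VALUES (?, ?, ?)',
                 [(1, 1, 'Sample papers'),
                  (2, 1, 'Sample records'),
                  (3, 2, 'Sample letters')])
conn.commit()

# a join plus a count: how many resources does each repository have?
rows = conn.execute('''
    SELECT repositories.code, COUNT(resources.id)
    FROM repositories
    JOIN resources ON resources.repo_id = repositories.id
    GROUP BY repositories.code
    ORDER BY repositories.code
''').fetchall()

for code, count in rows:
    print(code, count)

conn.close()
```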
Concepts in Python to self-study
iterator protocol - how do for loops work?
Python has something called "the iterator protocol", which underlies every object you can loop over. It's very useful to understand how it works and how it's applied; in ASpace, a lot of scripting work involves iterating over search results.
An extremely sketchy overview:
- if a Python object has the special __iter__() method defined on it, it is an "iterable"
- the __iter__() method should return an iterator, a Python object with a special __next__() method
- the __next__() method can be called multiple times, and will return each item in the iterable until there are none left
- once there are none left, __next__() raises a StopIteration exception
for loops use this internally! It's also how skipping the header line in the CSV example worked: the built-in function next() takes an iterator and calls its __next__() method.
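The overview above can be sketched in a few lines. Countdown and CountdownIterator are hypothetical classes written only for this illustration, and the list of strings at the end stands in for the lines of a CSV file.

```python
class Countdown:
    """Hypothetical iterable: counts down from start to 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # an iterable hands out a fresh iterator each time it is asked
        return CountdownIterator(self.start)


class CountdownIterator:
    """The iterator: remembers where we are and produces the next item."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # by convention, iterators return themselves from __iter__()
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that there are no items left
        value = self.current
        self.current -= 1
        return value


# roughly what a for loop does behind the scenes
iterator = iter(Countdown(3))
manual = []
while True:
    try:
        manual.append(next(iterator))
    except StopIteration:
        break

# the same thing written as an ordinary for loop
looped = []
for number in Countdown(3):
    looped.append(number)

print(manual, looped)

# skipping a header line: next() consumes the first item, the loop gets the rest
lines = iter(['title,date', 'Sample papers,1900', 'Sample letters,1950'])
header = next(lines)
data = list(lines)
print(header, data)
```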
https://www.pythonlikeyoumeanit.com/Module2_EssentialsOfPython/Iterables.html - complete but somewhat dense tutorial
ASpace Sandbox information
SB staff mode:
https://arstaff-sb.lib.harvard.edu
SB PUI:
https://hollisarchives-sb.lib.harvard.edu
Base url for the SB API:
https://arstaff-sb.lib.harvard.edu:8443
See the Sandbox Info Google Doc for more info.
Repository codes
Repository ID | Code |
---|---|
2 | ATK |
14 | MED |
26 | MCZ |
27 | MUS |
28 | ORC |
29 | TOZ |
30 | URI |
31 | WID |
32 | WOL |
33 | HSI |
34 | ORA |
35 | VIT |
36 | GUT |
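The pairs of repository ID and code listed above can also be looked up through the API. The sketch below assumes that a GET request to the repositories endpoint returns a list of records that each carry a uri such as /repositories/2 and a repo_code field; repository_codes is a hypothetical helper, and FakeClient and FakeResponse are stand-ins that let the example run without connecting to the sandbox.

```python
def repository_codes(client):
    """Hypothetical helper: build a {repository id: repo_code} dict.

    Assumes client.get('repositories') returns a response whose .json()
    is a list of dicts with 'uri' and 'repo_code' keys.
    """
    repos = client.get('repositories').json()
    codes = {}
    for repo in repos:
        # the id is the last path segment of the uri, e.g. '/repositories/2' -> 2
        repo_id = int(repo['uri'].rsplit('/', 1)[-1])
        codes[repo_id] = repo['repo_code']
    return codes


class FakeResponse:
    """Stand-in for a real HTTP response object."""

    def __init__(self, data):
        self._data = data

    def json(self):
        return self._data


class FakeClient:
    """Stand-in for a real client; returns two of the repositories listed above."""

    def get(self, path):
        return FakeResponse([
            {'uri': '/repositories/2', 'repo_code': 'ATK'},
            {'uri': '/repositories/14', 'repo_code': 'MED'},
        ])


codes = repository_codes(FakeClient())
print(codes)
```

Against the sandbox you would pass a real, authorized client instead of FakeClient(), and check the fields in the actual response before relying on them.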