The Harvard Film Archive has many years' worth of recorded audio of events with visiting artists at our Cinematheque.
We need complete transcripts of these recordings to ensure ease of use for all researchers and patrons, so we are looking for help editing AI-generated transcript files. Recordings are usually 20-60 minutes long and include introductions and post-film discussions with audience participation. They consist of lively discussions between our curators and the filmmakers who visit the HFA Cinematheque from around the world.
We are starting with the creation of transcripts for a small set of audio files to test the workflow, and we need your help!
The plan is simple.
- Once you sign up on the Harvard Training Portal, you will be given access to edit the Project Sheet where you can mark your interest by placing your name and email next to one visiting artist event you would like to work on.
- If a name already appears in the “Transcriber Name” field, that set of files is in the process of being assigned to a transcriber.
- Please note that specific language skills are recommended for some events but are not required.
- This sheet will continue to be updated as the project lead readies additional files, so if there is nothing available when you view it, please keep checking!
- Within 24 hours of sign-up, the project lead will give you edit access to the AI-generated transcript.
- Read the Formatting Rules (below) closely before you begin for guidance on your work.
- Start work on your file on your own time. If you have questions, please reach out directly to the project lead via email.
- When you are finished with your files, change the status for your file to “ready for review” in the Project Sheet. The project lead will reach out to you if they have any questions about your work.
- If you enjoyed your first project, sign up for another set of files!
Guidelines and Resources
Formatting
You are starting with an AI-generated transcript created directly from the audio file. It will contain numerous mistakes, so correcting the text is the most important thing for you to help us with.
The final edited transcript should follow the audio exactly, including umms, ahhs, mistakes, restarts, etc. We will eventually timestamp these documents for captioning so an exact transcript at this point is more important than a “clean” version of the document.
A sample transcript for review can be found here.
Formatting Rules
- Flags
- Use [INAUDIBLE] to denote words or sections that cannot be heard
- Use [?flagged word?] to denote words or proper nouns that cannot be confirmed
- Use [UNKNOWN] to denote words or sections that you can hear but have no guess as to what is being said
- Speaker Identification
- Use names if known (full names without titles) such as HADEN GUEST, DAVID PENDLETON, AGNES VARDA.
- Label unidentified speakers as SPEAKER 1:, SPEAKER 2:, etc.
- Label all audience members as AUDIENCE:, even if a name is given.
- Enhanced Audio Description
- Audio descriptions such as [APPLAUSE] and [LAUGHTER] should be included whenever possible
- Timestamps
- Please leave timestamps as-is in your transcript document
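If you would like to spot-check your work against the Formatting Rules before marking it “ready for review”, a small script can help. The following is only a rough sketch and not an official HFA tool. The function names (`classify_annotation`, `check_line`, `speaker_label`) are hypothetical, and the script assumes that annotations appear in square brackets, that any other all-caps bracketed text is an audio description, and that a bracketed run of digits and colons is a timestamp. The actual timestamp format in your file may differ.

```python
import re

# Anything written in square brackets, e.g. [APPLAUSE] or [?Carpenter Center?].
BRACKETED = re.compile(r"\[([^\[\]]*)\]")
# A flagged word or proper noun: [?flagged word?]
FLAGGED_WORD = re.compile(r"\?[^?]+\?")
# All-caps text such as APPLAUSE or LAUGHTER.
ALL_CAPS = re.compile(r"[A-Z][A-Z ]*")
# Assumption: timestamps, if bracketed, are digits with colons or periods.
TIMESTAMP = re.compile(r"[\d:.]+")


def classify_annotation(inner):
    """Classify the text found between one pair of square brackets."""
    if FLAGGED_WORD.fullmatch(inner):
        return "flagged word"
    if inner in ("INAUDIBLE", "UNKNOWN"):
        return "flag"
    if ALL_CAPS.fullmatch(inner):
        return "audio description"
    if TIMESTAMP.fullmatch(inner):
        return "timestamp"
    return "unrecognized"


def check_line(line):
    """Return a list of bracketed annotations that do not match the rules."""
    problems = []
    for match in BRACKETED.finditer(line):
        if classify_annotation(match.group(1)) == "unrecognized":
            problems.append(f"unrecognized annotation: {match.group(0)}")
    return problems


def speaker_label(line):
    """Return the all-caps label before the first colon, or None."""
    head, sep, _ = line.partition(":")
    has_letter = any(ch.isalpha() for ch in head)
    if sep and has_letter and head == head.upper():
        return head
    return None


if __name__ == "__main__":
    sample = "HADEN GUEST: Welcome [APPLAUSE] to the [?Carpenter Center?] [inaudible]"
    print(speaker_label(sample))
    for problem in check_line(sample):
        print(problem)
```

A script like this only catches formatting slips, such as a lowercase [inaudible]. It cannot tell you whether the words match the audio, so careful listening remains the most important part of the work.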
Research
In most cases you will be streaming the audio directly from the event page on the HFA website. There you can find the correct spelling of the visiting artist’s name and the titles of their films, and read a bit about the filmmaker before you begin.
Additionally, doing a bit of background research on the filmmaker and their work (on IMDb, Wikipedia, or through Google and/or journal searches) before you start is recommended but not necessary. It can help to establish context for identifying names, places, etc. in the transcript.
If all else fails, follow the guidelines above to mark inaudible or unknown words and phrases, or reach out to the project lead for assistance!