HFA Visiting Artist Audio Transcription Project
The Harvard Film Archive has many years' worth of recorded audio of events with visiting artists at our Cinematheque.
We need complete transcripts of these recordings to ensure ease of use for all researchers and patrons, so we are looking for help editing AI-generated transcript files. Recordings are usually 20-60 minutes long, including introductions plus post-film discussions with audience participation, and consist of lively discussions between our curators and the filmmakers who visit the HFA Cinematheque from around the world.
We are starting with the creation of transcripts for a small set of audio files to test the workflow, and we need your help!
The plan is simple.
- Once you sign up on the Harvard Training Portal, please email amy_sloper@harvard.edu and you will be given access to edit the Project Sheet where you can mark your interest by placing your name and email next to one visiting artist event you would like to work on.
- If a name already appears in the “Transcriber Name” field, that set of files is in the process of being assigned to a transcriber.
- Please note that specific language skills are recommended for some events but are not required.
- This sheet will continue to be updated as the project lead readies additional files, so if there is nothing available when you view it, please keep checking!
- Within 24 hours of sign-up, the project lead will give you edit access to the computer-generated transcript.
- Read the Formatting Rules (below) closely before you begin for guidance on your work.
- Start work on your file on your own time. If you have questions, please reach out directly to the project lead via email.
- When you are finished with your files, change the status for your file to “ready for review” in the Project Sheet. The project lead will reach out to you if they have any questions about your work.
- If you enjoyed your first project, sign up for another set of files!
Guidelines and Resources
Our biggest goal is to create a standalone, easy-to-read document that can be used for research. All formatting guidelines serve this goal.
You are starting with a computer-generated transcript created directly from the audio file. It will contain numerous mistakes, so correcting the text is the most important thing for you to help us with.
The final edited transcript should follow the audio exactly, except that you should remove umms, ahhs, mistakes, restarts, etc. to create a “clean,” easily readable version of the document.
**Transcripts that include translators and audio in more than one language will have some separate guidelines, which are forthcoming**
A sample transcript for review can be found here.
Formatting Rules for Single Language Transcripts
- Capitalize the first words of sentences and use basic punctuation.
- Italicize the titles of films, books, plays, periodicals, databases, and websites
- Place titles in quotation marks if the source is part of a larger work. Television episodes, essays, chapters, poems, webpages, songs, and speeches are placed in quotation marks.
- Italicize foreign words used within English language sections. These include words that do not appear in Webster’s, etc.
- For example, do not italicize “kimono”, “futon”, or “honcho”.
- Do italicize words such as *taiyozoku*
- This does not apply to foreign words used within the titles of television episodes, essays, chapters, poems, webpages, songs, and speeches that are placed within quotation marks.
- Use italics to signal a speaker's emphasis of a word or phrase. For example:
- "By and large discussion of Hollywood melodrama has revolved around women generally dismissed, originally as trashy romances, weepies, films directed to a female audience, which ultimately then offered an opportunity for applying feminist readings."
- Use of Flags
- Flags appear in place of a word, phrase, or section that you cannot hear, confirm, or understand
- Within a sentence, format flags with a space on either side of the brackets
- Use ALL-CAPS for all flag text inside of brackets
- Use [INAUDIBLE] to denote words or sections that cannot be heard
- Use [?FLAGGED WORD?] to denote words or proper nouns that cannot be confirmed
- Use [UNKNOWN] to denote words or sections that you can hear but have no guess as to what is being said
- Speaker Identification
- Use names if known (full names without titles) such as Haden Guest, David Pendleton, Agnes Varda
- Label unidentified speakers as they appear in the raw transcript: Unknown Speaker
- All audience members (present during the Q&A sections) will be labeled as Audience, even if a name is given
- Enhanced Audio Description
- Audio descriptions such as [APPLAUSE] and [LAUGHTER] should be included whenever possible
- These should be put on their own line, with speaker names removed. For example:
Haden Guest 00:15
Good evening, ladies and gentlemen. My name is Haden Guest. I'm director of the Harvard Film Archive.
[APPLAUSE]
Haden Guest 00:50
I consider Kelly Reichardt to be one of the great American filmmakers, not just of today, but of all time.
- Provide significant information in all caps (and brackets) and avoid editorializing, for example:
- [THE SPEAKER TESTS THE MICROPHONE]
- Timestamps
- Please leave timestamps that occur at the beginning of a speaker's section in your transcript document
- If timestamps occur in the middle of a speaker's segment or sentence, you can remove them. For example:
edit this from the computer-generated document: | into this in your edited transcript: |
---|---|
Unknown Speaker 1 8:45 So this film you're going to see with electronic subtitles I'm sorry, seems Unknown Speaker 8:55 there is Unknown Speaker 8:59 only print available and the other one is without English subtitles like make it almost precious. | Unknown Speaker 8:45 So this film you're going to see with electronic subtitles I'm sorry, seems there is only print available and the other one is without English subtitles like make it almost precious. |
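The timestamp rule above can also be sketched in code. The following is only a minimal illustration, assuming the raw transcript has already been split into (speaker, timestamp, text) segments; the `Segment` class, the `merge_mid_segment_timestamps` function, and the sample "Audience" line are hypothetical and not part of any HFA tooling. It also assumes that consecutive segments with the same label really are the same person, which you still need to confirm by listening, and it does not handle audio descriptions such as [APPLAUSE], which stay on their own line.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str    # speaker label as it appears in the raw transcript
    timestamp: str  # timestamp at the start of the segment, e.g. "8:45"
    text: str       # transcribed words for this segment


def merge_mid_segment_timestamps(segments):
    """Join consecutive segments by the same speaker, keeping the first timestamp."""
    merged = []
    for seg in segments:
        if merged and merged[-1].speaker == seg.speaker:
            # Same speaker is still talking: drop the mid-segment timestamp
            # and append the words to the running segment.
            merged[-1].text = f"{merged[-1].text} {seg.text}".strip()
        else:
            # New speaker: keep the timestamp that opens their section.
            merged.append(Segment(seg.speaker, seg.timestamp, seg.text.strip()))
    return merged


# Hypothetical input modeled on the example table above.
raw = [
    Segment("Unknown Speaker", "8:45", "So this film you're going to see with electronic subtitles I'm sorry, seems"),
    Segment("Unknown Speaker", "8:55", "there is"),
    Segment("Unknown Speaker", "8:59", "only print available"),
    Segment("Audience", "9:10", "Thank you."),
]

for seg in merge_mid_segment_timestamps(raw):
    print(f"{seg.speaker} {seg.timestamp}")
    print(seg.text)
```

A script like this could at most produce a first pass; correcting the words themselves still requires listening to the audio.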
Formatting Rules for Transcripts with Multiple Languages
This most often means an English translator is present during the discussion to translate the native speaker's sections.
- All rules listed for English language transcripts apply
- Clean up the English sections as you would with any transcript
- Italicize foreign words used within English language sections. These include words that do not appear in Webster’s, etc.
- For example, do not italicize “kimono”, “futon”, or “honcho”.
- Do italicize words such as *taiyozoku*
For any foreign language sections, remove the computer-generated text (which is going to be very poorly transcribed) and mark the section with the foreign language in brackets, for example:
edit this from the computer-generated document: | into this in your edited transcript: |
---|---|
Unknown Speaker 8:13 we Unknown Speaker 8:16 know doesn't double remittances Unknown Speaker 8:20 because you can tell is your sister Felicity Unknown Speaker 8:24 feel merged imagine Nielsen era Unknown Speaker 8:28 pauses Unknown Speaker 8:30 So, when I start working on this movie the movie I was thinking to run it in Senegal | Alain Gomis 8:13 [Speaking in French] Translator 8:30 So, when I start working on this movie the movie I was thinking to run it in Senegal. |
- In some cases, the translator is named. In the example above, the translator was not introduced. If the translator is named, you can format this way:
Alain Gomis 8:13
[Speaking in French]
Amy Sloper [translating]
So, when I start working on this movie the movie I was thinking to run it in Senegal.
- In the future, additional work on editing and cleanup may be done on the foreign language sections - we will try to assign this to a speaker/writer of the foreign language.
Research
In most cases you will be streaming the audio directly from the event page on the HFA website. Here you can find the correct spelling of the visiting artist’s name, the titles of their films, and read a bit about the filmmaker before you begin.
Additionally, doing a bit of background research on the filmmaker and their work (on IMDb, Wikipedia, or through Google and/or journal searches) before you start is recommended but not necessary. It can help to establish context for identifying names, places, etc. in the transcript.
If all else fails, follow the guidelines above to mark inaudible or unknown words and phrases, or reach out to the project lead for assistance!
Copyright © 2024 The President and Fellows of Harvard College