
The Harvard Film Archive has many years' worth of recorded audio from events with visiting artists at our Cinematheque.


We need complete transcripts of these recordings to ensure ease of use for all researchers and patrons, so we are looking for help editing AI-generated transcript files. Recordings are usually 20-60 minutes long, including introductions and post-film discussions with audience participation, and consist of lively discussions between our curators and the filmmakers who visit the HFA Cinematheque from around the world.

We are starting with the creation of transcripts for a small set of audio files to test the workflow, and we need your help!

The plan is simple.

  1. Once you sign up on the Harvard Training Portal, please email amy_sloper@harvard.edu. You will be given access to edit the Project Sheet, where you can mark your interest by placing your name and email next to the one visiting artist event you would like to work on.



    1. If a name already appears in the “Transcriber Name” field, that set of files is in the process of being assigned to a transcriber.
    2. Please note that specific language skills are recommended for some events but are not required.
    3. This sheet will continue to be updated as the project lead readies additional files, so if nothing is available when you view it, please keep checking!
  2. Within 24 hours of sign-up, the project lead will give you edit access to the computer-generated transcript.
  3. Read the Formatting Rules (below) closely before you begin for guidance on your work.
  4. Start work on your file on your own time. If you have questions, please reach out directly to the project lead via email.
  5. When you are finished with your files, change the status for your file to “ready for review” in the Project Sheet. The project lead will reach out to you if they have any questions about your work.

  6. If you enjoyed your first project, sign up for another set of files!

Guidelines and Resources

Our biggest goal is to create a standalone, easy-to-read document that can be used for research. All formatting guidelines serve this goal.

You are starting with a computer-generated transcript created directly from the audio file. It will contain numerous mistakes, so correcting the text is the most important thing for you to help us with.

The final edited transcript should follow the audio exactly, except that filler words (umms, ahhs), mistakes, restarts, and the like should be removed to create a “clean,” easily readable version of the document.

**Transcripts that include translators and audio in more than one language will have some separate guidelines, which are forthcoming**

A sample transcript for review can be found here.

Formatting Rules for Single Language Transcripts

  • Italicize film titles
  • Flags
    • Flags appear in place of a word, phrase, or section that you cannot hear, confirm, or understand
    • Within a sentence, format flags with a space on either side of the bracket
    • Use ALL-CAPS for all flag text inside of brackets
    • Use [INAUDIBLE] to denote words or sections that cannot be heard
    • Use [?FLAGGED WORD?] to denote words or proper nouns that cannot be confirmed
    • Use [UNKNOWN] to denote words or sections that you can hear but have no guess as to what is being said
  • Speaker Identification
    • Use names if known (full names without titles) such as Haden Guest, David Pendleton, Agnes Varda
    • Label unidentified speakers as they appear in the raw transcript: Unknown Speaker
    • All audience members (present during the Q&A sections) will be labeled as Audience, even if a name is given
  • Enhanced Audio Description
    • Audio descriptions such as [APPLAUSE] and [LAUGHTER] should be included whenever possible
      • These should be put on their own line, for example:
        • Haden Guest 00:15

          Good evening, ladies and gentlemen. My name is Haden Guest. I'm director of the Harvard Film Archive.

          [APPLAUSE] 00:42

          Haden Guest 00:50

          I consider Kelly Reichardt to be one of the great American filmmakers, not just of today, but of all time.

    • Provide significant information in all caps (and brackets) and avoid editorializing. 
      • Please remove the speaker name for these sections, for example: 

        [THE SPEAKER TESTS THE MICROPHONE] 

  • Timestamps
    • Please leave timestamps that occur at the beginning of a speaker's section in your transcript document
    • If timestamps occur in the middle of a speaker's segment or sentence, you can remove them. For example:
Edit this from the computer-generated document:

Unknown Speaker 1 8:45

So this film you're going to see with electronic subtitles I'm sorry, seems

Unknown Speaker  8:55

there is

Unknown Speaker  8:59

only print available and the other one is without English subtitles like make it almost precious.

into this in your edited transcript:

Unknown Speaker  8:45

So this film you're going to see with electronic subtitles I'm sorry, seems there is only print available and the other one is without English subtitles like make it almost precious.
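For transcribers comfortable with scripting, the timestamp rule can be roughed out in code before the manual pass. The following is only a sketch, not part of the HFA workflow: it assumes each segment starts with a header line of the form "Speaker Name M:SS" followed by the spoken text, and it treats back-to-back segments with an identical speaker label as one interrupted segment. The names `HEADER`, `Segment`, `parse_segments`, and `merge_same_speaker` are hypothetical, and every merge still needs to be checked against the audio.

```python
import re
from dataclasses import dataclass

# Assumed header shape: a speaker label, whitespace, then M:SS or H:MM:SS.
# Limitation: a spoken line that happens to end in a time ("we met at 8:45")
# would be misread as a header, so review the result by hand.
HEADER = re.compile(r"^(?P<speaker>.+?)\s+(?P<time>\d{1,2}:\d{2}(?::\d{2})?)$")


@dataclass
class Segment:
    speaker: str
    time: str
    text: str


def parse_segments(raw: str) -> list[Segment]:
    """Split a raw transcript into segments of speaker, start time, and text."""
    segments: list[Segment] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            segments.append(Segment(match["speaker"], match["time"], ""))
        elif segments:
            last = segments[-1]
            last.text = f"{last.text} {line}".strip()
    return segments


def merge_same_speaker(segments: list[Segment]) -> list[Segment]:
    """Join consecutive segments that share a speaker label.

    The first timestamp is kept; the mid-segment timestamps are dropped.
    """
    merged: list[Segment] = []
    for seg in segments:
        if merged and merged[-1].speaker == seg.speaker:
            merged[-1].text = f"{merged[-1].text} {seg.text}".strip()
        else:
            merged.append(Segment(seg.speaker, seg.time, seg.text))
    return merged


# Sample adapted from the example above, with a consistent speaker label.
RAW = """\
Unknown Speaker  8:45

So this film you're going to see with electronic subtitles I'm sorry, seems

Unknown Speaker  8:55

there is

Unknown Speaker  8:59

only print available
"""

for seg in merge_same_speaker(parse_segments(RAW)):
    print(f"{seg.speaker} {seg.time}\n\n{seg.text}\n")
```

Because an audio description such as [APPLAUSE] carries its own timestamp line, it parses as a separate segment and keeps the speaker sections on either side of it from being merged.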


Formatting Rules for Transcripts with Multiple Languages

This most often means an English translator is present during the discussion to translate the native speaker's sections.

  • All rules listed for English language transcripts apply
  • Clean up the English sections as you would with any transcript
  • For any foreign-language sections, either place the existing text in brackets or mark the section with the name of the language in brackets, e.g. [SPANISH]
  • Additional editing and cleanup will be done on the foreign-language sections; we will try to assign this work to a speaker/writer of that language.
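Flags, audio descriptions, and language markers such as [SPANISH] all share the ALL-CAPS bracket convention, so a quick self-check before marking a file “ready for review” can be scripted. This is a minimal sketch under that assumption only; `BRACKETED` and `find_lowercase_brackets` are hypothetical names, and the check does not cover the spacing rule or whether a flag is the right one for the passage.

```python
import re

# Matches any bracketed span that contains no nested brackets.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def find_lowercase_brackets(text: str) -> list[str]:
    """Return bracketed spans whose contents are not entirely upper case."""
    return [
        match.group(0)
        for match in BRACKETED.finditer(text)
        if match.group(1) != match.group(1).upper()
    ]


sample = "She met [?AGNES VARDA?] in [inaudible] Paris. [SPANISH] [Laughter]"
print(find_lowercase_brackets(sample))
```

Any span the function returns should be retyped in capitals by hand.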

Research

In most cases you will be streaming the audio directly from the event page on the HFA website. Here you can find the correct spelling of the visiting artist’s name, the titles of their films, and read a bit about the filmmaker before you begin.

Additionally, doing a bit of background research on the filmmaker and their work (on IMDb or Wikipedia, or through Google and/or journal searches) before you start is recommended but not necessary. It can help to establish context for identifying names, places, etc. in the transcript.

If all else fails, follow the guidelines above to mark inaudible or unknown words and phrases, or reach out to the project lead for assistance!




