Keywords
large-scale audio corpora, phonetic analysis of formant measurements, DARLA and FAVE processing, challenges in corpus building, time and effort in transcription
Abstract
Large-scale transcribed audio corpora are available, such as the Buckeye Corpus and the Santa Barbara Corpus.
How do these come to be? What’s the on-the-ground process of building such a corpus?
Here we discuss:
- Methods for large-scale transcription
- Early data & analysis resulting from transcription
Large-scale transcription:
- Time to transcribe. Estimated: 10:1; Reality: 13:1
Phonetic Analysis:
- Comparison of formant measurements
- In-house Praat script performed poorly
- DARLA filtered out 53%
- Too early to tell if FAVE modifications were better
Original Publication Citation
Rachel Olsen, Michael Olsen, Joseph A. Stanley & Margaret E. L. Renwick. “Transcribing the Digital Archive of Southern Speech: Methods and Preliminary Analysis.” 84th Meeting of the Southeastern Conference on Linguistics (SECOL84). Charleston, SC. March 8–11, 2017.
BYU ScholarsArchive Citation
Olsen, Rachel Miller; Olsen, Michael L.; Stanley, Joseph A.; and Renwick, Margaret E. L., "Transcribing the Digital Archive of Southern Speech: Methods and Preliminary Analysis" (2017). Faculty Publications. 7986.
https://scholarsarchive.byu.edu/facpub/7986
Document Type
Presentation
Publication Date
2017
Publisher
84th Meeting of the Southeastern Conference on Linguistics
Language
English
College
Humanities
Department
Linguistics
Copyright Use Information
https://lib.byu.edu/about/copyright/