Keywords
source separation, sociophonetics, methods, data processing, vowel formant analysis
Abstract
While recent advances in sociophonetic data processing have made it possible to analyze large datasets and audio not originally intended for linguistic analysis, overlapping speech in recordings with multiple speakers continues to be an issue that results in lost data. We evaluate whether current source separation models produce audio that is clean enough to yield reliable measurements for sociophonetic analysis. We compare formant estimates from a pair of pristine recordings and merged-and-separated versions of those same recordings using the Libri2mix, Whamr16K, and WSJ02mix source separation models. Based on auditory inspection of the separated files, visualization of vowel formant estimates, and statistical analysis, Libri2mix performed best and WSJ02mix performed worst. While the differences in mean formant measurements per vowel were usually small, differences for each observation were larger in unpredictable ways. We are cautiously optimistic about using these tools in sociophonetic analysis, so long as analysis is conducted on vowel means. We conclude with recommendations that researchers can implement when using source separation in sociophonetic research.
Original Publication Citation
Joseph A. Stanley, Lisa Morgan Johnson, & Earl Kjar Brown. “Testing the Effect of Speech Separation on Vowel Formant Estimates.” Linguistics Vanguard. DOI: 10.1515/lingvan-2024-0152.
BYU ScholarsArchive Citation
Stanley, Joseph A.; Johnson, Lisa Morgan; and Brown, Earl Kjar, "Testing the Effect of Speech Separation on Vowel Formant Estimates" (2025). Faculty Publications. 7949.
https://scholarsarchive.byu.edu/facpub/7949
Document Type
Peer-Reviewed Article
Publication Date
2025
Publisher
Linguistics Vanguard
Language
English
College
Humanities
Department
Linguistics
Copyright Use Information
https://lib.byu.edu/about/copyright/