Keywords

source separation, sociophonetics, methods, data processing, vowel formant analysis

Abstract

While recent advances in sociophonetic data processing have made it possible to analyze large datasets and audio not originally intended for linguistic analysis, overlapping speech in recordings with multiple speakers remains a problem that results in lost data. We evaluate whether current source separation models produce audio clean enough to yield reliable measurements for sociophonetic analysis. We compare formant estimates from a pair of pristine recordings with estimates from merged-and-separated versions of those same recordings, using the Libri2mix, Whamr16K, and WSJ02mix source separation models. Based on auditory inspection of the separated files, visualization of vowel formant estimates, and statistical analysis, Libri2mix performed best and WSJ02mix performed worst. While differences in mean formant measurements per vowel were usually small, differences for individual observations were larger and varied in unpredictable ways. We are cautiously optimistic about using these tools in sociophonetic analysis, so long as analysis is conducted on vowel means. We conclude with recommendations that researchers can implement when using source separation in sociophonetic research.
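
The contrast between per-vowel means and individual observations can be illustrated with a minimal sketch. The values below are invented toy F1 measurements, not data from the study, and the function names are hypothetical; the sketch only shows how token-level errors of opposite sign can cancel when averaged within a vowel category.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy F1 values in Hz (invented for illustration, not from the study).
# Each tuple: (vowel label, F1 from the original recording, F1 from the separated audio)
tokens = [
    ("IY", 300.0, 320.0),
    ("IY", 310.0, 290.0),
    ("AA", 700.0, 740.0),
    ("AA", 720.0, 680.0),
]


def per_token_abs_diff(rows):
    """Absolute difference between separated and original F1 for each token."""
    return [abs(sep - orig) for _, orig, sep in rows]


def per_vowel_mean_diff(rows):
    """Difference between the separated and original mean F1 for each vowel."""
    orig_by_vowel = defaultdict(list)
    sep_by_vowel = defaultdict(list)
    for vowel, orig, sep in rows:
        orig_by_vowel[vowel].append(orig)
        sep_by_vowel[vowel].append(sep)
    return {
        vowel: mean(sep_by_vowel[vowel]) - mean(orig_by_vowel[vowel])
        for vowel in orig_by_vowel
    }


token_diffs = per_token_abs_diff(tokens)
vowel_diffs = per_vowel_mean_diff(tokens)

print("Per-token absolute differences (Hz):", token_diffs)
print("Per-vowel mean differences (Hz):", vowel_diffs)
```

In this toy case every token differs by 20 to 40 Hz, yet the per-vowel means are identical, which is the kind of pattern that would make means more trustworthy than single observations.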

Original Publication Citation

Joseph A. Stanley, Lisa Morgan Johnson, & Earl Kjar Brown. “Testing the Effect of Speech Separation on Vowel Formant Estimates.” Linguistics Vanguard. DOI: 10.1515/lingvan-2024-0152.

Document Type

Peer-Reviewed Article

Publication Date

2025

Publisher

Linguistics Vanguard

Language

English

College

Humanities

Department

Linguistics

University Standing at Time of Publication

Assistant Professor

Included in

Linguistics Commons
