Text-to-speech (TTS) systems are ubiquitous. From Siri to Alexa to automated customer service phone menus, real-world listening requires language learners to interact with TTS. Language learners have long reported listening difficulties stemming from genre, text, task, speaker characteristics, and environmental factors. This naturally leads to the question: how do learners perceive TTS in instructional contexts? Because TTS allows control over speaker characteristics (e.g., gender, regional variety, and speed), the variety of materials that could be created makes it an attractive option, especially in contexts in which native speakers are difficult or expensive to find. However, the effectiveness of TTS, namely its intelligibility, expressiveness, and naturalness, may be questioned in instances in which the listening is more empathic than informational. In this study, we examined participants' comprehension of factual details and speaker emotion and collected their opinions of TTS systems for language learning. The study took place in an Intensive English Program (IEP) with an academic focus at a large university in the United States. The participants had ACTFL proficiency levels ranging from Novice High to Advanced Low. They were divided into two groups and, through a counterbalanced design, were given a listening assessment in which half of the listening passages were recorded by voice actors and the other half were generated by a TTS system. After the assessment, the participants completed a survey that asked about their opinions of TTS systems as learning tools. We found no significant relationship between voice delivery (voice actor or TTS) and participants' comprehension of details or of speakers' emotions. Furthermore, more than half of the participants held positive views toward using TTS systems as learning tools; thus, the findings support the use of TTS systems where applicable.



College and Department

Humanities; Linguistics and English Language



Date Submitted


Document Type





Keywords

listening, text-to-speech, material development