Abstract
Multilingual Neural Machine Translation (MNMT) models enhance translation quality for low-resource languages by exploiting cross-lingual similarities during training, a process known as knowledge transfer. This transfer is particularly effective between languages that share lexical or structural features, often enabled by a common orthography. However, languages with strong phonetic and lexical similarities but distinct writing systems see limited benefits, as the absence of a shared orthography hinders knowledge transfer. To address this limitation, we propose an approach based on phonetic information that enhances token-level alignment across scripts by leveraging transliterations. We systematically evaluate several phonetic transcription techniques and strategies for incorporating phonetic information into NMT models. Our results show that using a shared encoder to process orthographic and phonetic inputs separately consistently yields the best performance for Khmer, Thai, and Lao in both directions with English, and that our custom Cognate-Aware Transliteration (CAT) method reliably improves translation quality over the baseline.
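To make the shared-encoder idea concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: a single encoder whose weights are reused to process the orthographic and phonetic views of a sentence separately, with the two encodings fused for the decoder. All module names, sizes, and the concatenation-based fusion are illustrative assumptions, not the thesis's actual implementation.

    import torch
    import torch.nn as nn

    class SharedEncoderNMT(nn.Module):
        """Sketch: one encoder processes orthographic and phonetic
        token streams separately; the outputs are concatenated along
        the sequence axis as memory for a standard NMT decoder.
        Sizes and the fusion strategy are assumptions."""

        def __init__(self, vocab_size=32000, d_model=512, nhead=8, num_layers=6):
            super().__init__()
            # A single embedding/encoder pair is shared across both input views.
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers)

        def forward(self, ortho_ids, phon_ids):
            # Encode each view independently through the *same* weights.
            h_ortho = self.encoder(self.embed(ortho_ids))
            h_phon = self.encoder(self.embed(phon_ids))
            # Fuse so the decoder can attend to both representations.
            return torch.cat([h_ortho, h_phon], dim=1)

    # Usage: two token-id tensors (orthographic and phonetic views of
    # the same batch) yield one fused encoder memory.
    model = SharedEncoderNMT()
    ortho = torch.randint(0, 32000, (2, 10))   # batch of 2, length 10
    phon = torch.randint(0, 32000, (2, 12))    # phonetic view, length 12
    memory = model(ortho, phon)                # shape: (2, 22, 512)

Sharing weights across the two views, rather than training separate encoders, is what lets token-level regularities learned from phonetic transcriptions transfer to the orthographic input and vice versa.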
Degree
MS
College and Department
Computer Science; Computational, Mathematical, and Physical Sciences
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Shurtz, Ammon, "When Scripts Diverge: Strengthening Low-Resource Neural Machine Translation Through Phonetic Cross-Lingual Transfer" (2025). Theses and Dissertations. 11049.
https://scholarsarchive.byu.edu/etd/11049
Date Submitted
2025-11-24
Document Type
Thesis
Keywords
multilingual MT, multilingualism, cross-lingual transfer, multilingual representations, less-resourced languages, software and tools, phonology
Language
English