Abstract

Multilingual Neural Machine Translation (MNMT) models improve translation quality for low-resource languages by exploiting cross-lingual similarities during training, a process known as knowledge transfer. Transfer is particularly effective between languages that share lexical or structural features, often enabled by a common orthography. However, languages with strong phonetic and lexical similarities but distinct writing systems see limited benefit, because the absence of a shared orthography hinders knowledge transfer. To address this limitation, we propose an approach based on phonetic information that improves token-level alignment across scripts by leveraging transliterations. We systematically evaluate several phonetic transcription techniques and strategies for incorporating phonetic information into NMT models. Our results show that using a shared encoder to process orthographic and phonetic inputs separately yields the best performance for Khmer, Thai, and Lao in both directions with English, and that our custom Cognate-Aware Transliteration (CAT) method consistently improves translation quality over the baseline.
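
To make the dual-input design concrete, the PyTorch sketch below shows one way a single shared encoder could process orthographic and phonetic token streams separately. This is an illustrative assumption, not the thesis implementation: the class name, the two embedding tables, the layer sizes, and the concatenation-based fusion are all hypothetical.

import torch
import torch.nn as nn

class DualInputEncoder(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 256,
                 nhead: int = 4, num_layers: int = 2):
        super().__init__()
        # Separate embedding tables: orthographic and phonetic tokens
        # come from different vocabularies in this sketch (a shared
        # vocab_size is used here only for brevity).
        self.orth_embed = nn.Embedding(vocab_size, d_model)
        self.phon_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # One shared encoder stack is reused for both input views,
        # so both scripts are mapped into the same representation space.
        self.shared_encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, orth_ids: torch.Tensor, phon_ids: torch.Tensor):
        # Encode each view independently with the same weights.
        orth_repr = self.shared_encoder(self.orth_embed(orth_ids))
        phon_repr = self.shared_encoder(self.phon_embed(phon_ids))
        # Concatenate along the sequence axis so a downstream decoder
        # could cross-attend to both representations.
        return torch.cat([orth_repr, phon_repr], dim=1)

# Toy usage: batch of 1, five orthographic and five phonetic token ids.
enc = DualInputEncoder(vocab_size=1000)
orth = torch.randint(0, 1000, (1, 5))
phon = torch.randint(0, 1000, (1, 5))
print(enc(orth, phon).shape)  # torch.Size([1, 10, 256])

Because the two views share encoder weights in this sketch, regularities learned from the phonetic stream can transfer to the orthographic stream across scripts; the actual thesis architecture may fuse the two streams differently.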

Degree

MS

College and Department

Computer Science; Computational, Mathematical, and Physical Sciences

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2025-11-24

Document Type

Thesis

Keywords

multilingual MT, multilingualism, cross-lingual transfer, multilingual representations, less-resourced languages, software and tools, phonology

Language

English
