Abstract
We show that a carefully crafted probabilistic morphological analyzer significantly outperforms a reasonable, naive baseline for Syriac. Syriac is an under-resourced Semitic language for which no language tools, such as morphological analyzers, are available. Such tools are widely used to support the annotation of morphologically complex languages. We introduce novel data-driven models for segmentation, dictionary linkage, and morphological tagging and connect them in a joint pipeline, yielding a probabilistic morphological analyzer that requires only labeled data. We evaluate this model with varying amounts of training data and find that with about 34,500 tokens it outperforms the baseline trained on over 99,000 tokens, reaching an accuracy of just over 80%. When trained on all available training data, the joint model achieves 86.47% accuracy, a 29.7% reduction in error rate relative to the baseline.
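The error-rate figure can be checked with a little arithmetic. A minimal Python sketch, assuming the standard definition of relative error reduction, (e_base - e_model) / e_base with e = 1 - accuracy (the abstract does not state the formula); the function names are hypothetical. Inverting the definition with the reported 86.47% and 29.7% implies a full-data baseline accuracy of roughly 80.75%. This is an inferred value, not one the abstract reports, and it is approximate because 29.7% is rounded.

```python
# Assumption: relative error reduction uses the standard definition below;
# function names are hypothetical, chosen for illustration only.

def relative_error_reduction(baseline_acc: float, model_acc: float) -> float:
    """Fraction of the baseline's errors eliminated by the model."""
    baseline_err = 1.0 - baseline_acc
    model_err = 1.0 - model_acc
    return (baseline_err - model_err) / baseline_err


def implied_baseline_accuracy(model_acc: float, reduction: float) -> float:
    """Invert relative_error_reduction: recover the baseline accuracy
    implied by a model accuracy and a relative error reduction."""
    model_err = 1.0 - model_acc
    baseline_err = model_err / (1.0 - reduction)
    return 1.0 - baseline_err


if __name__ == "__main__":
    # Figures reported in the abstract: 86.47% accuracy, 29.7% error reduction.
    baseline = implied_baseline_accuracy(0.8647, 0.297)
    print(f"implied baseline accuracy: {baseline:.4f}")
    # Round-trip check: feeding the implied baseline back should give ~0.297.
    print(f"relative error reduction: {relative_error_reduction(baseline, 0.8647):.3f}")
```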
Degree
MS
College and Department
Physical and Mathematical Sciences; Computer Science
Rights
http://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
McClanahan, Peter J., "A Probabilistic Morphological Analyzer for Syriac" (2010). Theses and Dissertations. 2200.
https://scholarsarchive.byu.edu/etd/2200
Date Submitted
2010-07-08
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd3748
Keywords
segmentation, dictionary linkage, morphological tagging, Syriac, Semitic languages, probabilistic models, joint pipelines
Language
English