Keywords
data-driven, Syriac, segmentation, probabilistic morphological analyzer
Abstract
We define a probabilistic morphological analyzer using a data-driven approach for Syriac in order to facilitate the creation of an annotated corpus. Syriac is an under-resourced Semitic language for which there are no available language tools such as morphological analyzers. We introduce novel probabilistic models for segmentation, dictionary linkage, and morphological tagging and connect them in a pipeline to create a probabilistic morphological analyzer requiring only labeled data. We explore the performance of models with varying amounts of training data and find that with about 34,500 labeled tokens, we can outperform a reasonable baseline trained on over 99,000 tokens and achieve an accuracy of just over 80%. When trained on all available training data, our joint model achieves 86.47% accuracy, a 29.7% reduction in error rate over the baseline.
Original Publication Citation
Peter McClanahan, George Busby, Robbie Haertel, Kristian Heal, Deryle Lonsdale, Kevin Seppi and Eric Ringger (2010). A Probabilistic Morphological Analyzer for Syriac. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), MIT, Massachusetts; Association for Computational Linguistics; pp. 810-820; ISBN 978-1-932432-86-2.
BYU ScholarsArchive Citation
Lonsdale, Deryle W.; McClanahan, Peter J.; Busby, George; Haertel, Robbie A.; Heal, Kristian; Seppi, Kevin; and Ringger, Eric, "A Probabilistic Morphological Analyzer for Syriac" (2010). Faculty Publications. 6855.
https://scholarsarchive.byu.edu/facpub/6855
Document Type
Conference Paper
Publication Date
2010
Publisher
Association for Computational Linguistics
Language
English
College
Humanities
Department
Linguistics
Copyright Status
© 2010 Association for Computational Linguistics
Copyright Use Information
https://lib.byu.edu/about/copyright/