Evaluating machine-assisted annotation in under-resourced settings
Keywords
Annotation, Corpus annotation, Machine assistance, Syriac studies, Bayesian data analysis, User study, Language resource evaluation
Abstract
Machine assistance is vital to managing the cost of corpus annotation projects. Identifying effective forms of machine assistance through principled evaluation is particularly important and challenging in under-resourced domains and highly heterogeneous corpora, as the quality of machine assistance varies. We perform a fine-grained evaluation of two machine-assistance techniques in the context of an under-resourced corpus annotation project. This evaluation requires a carefully controlled user study crafted to test a number of specific hypotheses. We show that human annotators performing morphological analysis of text in a Semitic language perform their task significantly more accurately and quickly when even mediocre pre-annotations are provided. When pre-annotations are at least 70% accurate, annotator speed and accuracy show statistically significant relative improvements of 25–35% and 5–7%, respectively. However, controlled user studies are too costly to be suitable for under-resourced corpus annotation projects. Thus, we also present an alternative analysis methodology that models the data as a combination of latent variables in a Bayesian framework. We show that modeling the effects of interesting confounding factors can generate useful insights. In particular, correction propagation appears to be most effective for our task when implemented with minimal user involvement. More importantly, by explicitly accounting for confounding variables, this approach has the potential to yield fine-grained evaluations using data collected in a natural environment outside of costly controlled user studies.
Original Publication Citation
Paul Felt, Eric K. Ringger, Kevin Seppi, Kristian S. Heal, Robbie A. Haertel, and Deryle Lonsdale (2014). Evaluating machine-assisted annotation in under-resourced settings. Language Resources & Evaluation 48(4):561-599. Springer Science+Business Media (online publication date November 2013).
BYU ScholarsArchive Citation
Lonsdale, Deryle W.; Felt, Paul L.; Ringger, Eric K.; Seppi, Kevin; Heal, Kristian; and Haertel, Robbie A., "Evaluating machine-assisted annotation in under-resourced settings" (2013). Faculty Publications. 6881.
https://scholarsarchive.byu.edu/facpub/6881
Document Type
Peer-Reviewed Article
Publication Date
2013-11
Publisher
Springer Science+Business Media
Language
English
College
Humanities
Department
Linguistics
Copyright Status
© Springer Science+Business Media Dordrecht 2013
Copyright Use Information
https://lib.byu.edu/about/copyright/