Evaluating machine-assisted annotation in under-resourced settings

Keywords

Annotation, Corpus annotation, Machine assistance, Syriac studies, Bayesian data analysis, User study, Language resource evaluation

Abstract

Machine assistance is vital to managing the cost of corpus annotation projects. Identifying effective forms of machine assistance through principled evaluation is particularly important and challenging in under-resourced domains and highly heterogeneous corpora, as the quality of machine assistance varies. We perform a fine-grained evaluation of two machine-assistance techniques in the context of an under-resourced corpus annotation project. This evaluation requires a carefully controlled user study crafted to test a number of specific hypotheses. We show that human annotators performing morphological analysis of text in a Semitic language perform their task significantly more accurately and quickly when even mediocre pre-annotations are provided. When pre-annotations are at least 70% accurate, annotator speed and accuracy show statistically significant relative improvements of 25–35% and 5–7%, respectively. However, controlled user studies are too costly to be suitable for under-resourced corpus annotation projects. Thus, we also present an alternative analysis methodology that models the data as a combination of latent variables in a Bayesian framework. We show that modeling the effects of interesting confounding factors can generate useful insights. In particular, correction propagation appears to be most effective for our task when implemented with minimal user involvement. More importantly, by explicitly accounting for confounding variables, this approach has the potential to yield fine-grained evaluations using data collected in a natural environment outside of costly controlled user studies.

Original Publication Citation

Paul Felt, Eric K. Ringger, Kevin Seppi, Kristian S. Heal, Robbie A. Haertel, and Deryle Lonsdale (2014). Evaluating machine-assisted annotation in under-resourced settings. Language Resources & Evaluation 48(4):561–599. Springer Science+Business Media (online publication date November 2013).

Document Type

Peer-Reviewed Article

Publication Date

2013-11

Publisher

Springer Science+Business Media

Language

English

College

Humanities

Department

Linguistics

University Standing at Time of Publication

Associate Professor
