Faculty Publications

Active Learning for Part-of-Speech Tagging: Accelerating Corpus Annotation

George Busby
Marc Carmen
James Carroll
Robbie Haertel
Deryle W. Lonsdale
Peter McClanahan
Eric K. Ringger
Kevin Seppi, Brigham Young University

Keywords

machine learning, corpus annotation, part-of-speech tagging

Abstract

In the construction of a part-of-speech annotated corpus, we are constrained by a fixed budget. A fully annotated corpus is required, but we can afford to label only a subset. We train a Maximum Entropy Markov Model tagger from a labeled subset and automatically tag the remainder. This paper addresses the question of where to focus our manual tagging efforts in order to deliver an annotation of highest quality. In this context, we find that active learning is always helpful. We focus on Query by Uncertainty (QBU) and Query by Committee (QBC) and report on experiments with several baselines and new variations of QBC and QBU, inspired by weaknesses particular to their use in this application. Experiments on English prose and poetry test these approaches and evaluate their robustness. The results allow us to make recommendations for both types of text and raise questions that will lead to further inquiry.
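As an illustration of the Query by Uncertainty idea named in the abstract, the following minimal sketch ranks unlabeled sentences by how uncertain a tagger is about them and returns the most uncertain ones for manual annotation. The abstract does not say how uncertainty is scored, so the sum of per-token entropies used here is an assumption, and the names `token_entropy`, `sentence_uncertainty`, `select_by_uncertainty`, and the `pool` data are hypothetical.

```python
import math


def token_entropy(dist):
    """Shannon entropy (in nats) of one token's tag distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def sentence_uncertainty(token_dists):
    """Sum of per-token entropies (an assumed scoring rule, not the paper's)."""
    return sum(token_entropy(d) for d in token_dists)


def select_by_uncertainty(pool, batch_size):
    """Return the ids of the batch_size most uncertain sentences.

    pool maps a sentence id to a list of per-token tag distributions,
    each a dict from tag to probability (hypothetical tagger output).
    """
    ranked = sorted(
        pool,
        key=lambda sid: sentence_uncertainty(pool[sid]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical tagger output for two unlabeled sentences.
pool = {
    "s1": [{"DT": 0.99, "NN": 0.01}, {"NN": 0.98, "VB": 0.02}],
    "s2": [{"NN": 0.5, "VB": 0.5}, {"JJ": 0.6, "NN": 0.4}],
}

# The tagger is far less sure about "s2", so it is queried first.
next_to_annotate = select_by_uncertainty(pool, batch_size=1)
```

In an annotation loop under these assumptions, the selected sentences would be labeled by hand, added to the training subset, and the tagger retrained before the next selection round.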

Original Publication Citation

Eric Ringger, Peter McClanahan, Robbie Haertel, George Busby, Marc Carmen, James Carroll, Kevin Seppi, and Deryle Lonsdale. June 2007. "Active Learning for Part-of-Speech Tagging: Accelerating Corpus Annotation." In Proceedings of the ACL 2007 Linguistic Annotation Workshop (LAW 2007). Czech Republic. pp. 101-108.

BYU ScholarsArchive Citation

Busby, George; Carmen, Marc; Carroll, James; Haertel, Robbie; Lonsdale, Deryle W.; McClanahan, Peter; Ringger, Eric K.; and Seppi, Kevin, "Active Learning for Part-of-Speech Tagging: Accelerating Corpus Annotation" (2007). Faculty Publications. 253.
https://scholarsarchive.byu.edu/facpub/253

Document Type

Peer-Reviewed Article

Publication Date

2007-06-01

Permanent URL

http://hdl.lib.byu.edu/1877/2640

Publisher

ACL Press

Language

English

College

Physical and Mathematical Sciences

Department

Computer Science

Copyright Status

Copyright Use Information

http://lib.byu.edu/about/copyright/

Included in

Computer Sciences Commons
