Faculty Publications

Pragmatic Quality Assessment for Automatically Extracted Data

Deryle W. Lonsdale, Brigham Young University
Scott N. Woodfield, Brigham Young University
Stephen W. Liddle, Brigham Young University
Tae Woo Kim, Brigham Young University
David W. Embley, Brigham Young University
Christopher Almquist, Brigham Young University

Keywords

quality data, data cleaning, automated information extraction, declarative constraint specification, automated integrity checking, conceptual-model-based extraction ensemble.

Abstract

Automatically extracted data is rarely “clean” with respect to pragmatic (real-world) constraints, which hinders applications that depend on quality data. We proffer a solution to detecting pragmatic constraint violations that works via a declarative and semantically enabled constraint-violation checker. In conjunction with an ensemble of automated information extractors, the implemented prototype checks both hard and soft constraints: respectively, those that are either satisfied or not and those that are satisfied probabilistically with respect to a threshold. An experimental evaluation shows that the constraint checker identifies semantic errors with high precision and recall and that pragmatic error identification can improve results.

Original Publication Citation

Scott N. Woodfield, Deryle W. Lonsdale, Stephen W. Liddle, Tae Woo Kim and David W. Embley (2016). Pragmatic Quality Assessment for Automatically Extracted Data. In: Isabelle Comyn-Wattiau, Katsumi Tanaka, Il-Yeol Song, Shuichiro Yamamoto, Motoshi Saeki (Eds.), Conceptual Modeling: Proceedings of the 35th International Conference on Conceptual Modeling (ER 2016); Lecture Notes in Computer Science Vol. 9974; Springer International Publishing; pp. 212-220. ISBN 978-3-319-46396-4.

BYU ScholarsArchive Citation

Lonsdale, Deryle W.; Woodfield, Scott N.; Liddle, Stephen W.; Kim, Tae Woo; Embley, David W.; and Almquist, Christopher, "Pragmatic Quality Assessment for Automatically Extracted Data" (2016). Faculty Publications. 6871.
https://scholarsarchive.byu.edu/facpub/6871

Document Type

Conference Paper

Publication Date

2016

Publisher

Springer International Publishing

Language

English

College

Humanities

Department

Linguistics

University Standing at Time of Publication

Associate Professor

Copyright Use Information

https://lib.byu.edu/about/copyright/

Included in

Linguistics Commons
