Keywords
quality data, data cleaning, automated information extraction, declarative constraint specification, automated integrity checking, conceptual-model-based extraction ensemble.
Abstract
Automatically extracted data is rarely “clean” with respect to pragmatic (real-world) constraints, which hinders applications that depend on quality data. We propose a solution for detecting pragmatic constraint violations based on a declarative, semantically enabled constraint-violation checker. Working in conjunction with an ensemble of automated information extractors, the implemented prototype checks both hard constraints, which are either satisfied or not, and soft constraints, which are satisfied probabilistically with respect to a threshold. An experimental evaluation shows that the constraint checker identifies semantic errors with high precision and recall and that pragmatic error identification can improve results.
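The distinction between hard and soft constraints can be illustrated with a minimal sketch. This is not the authors' prototype: the family-history fields, the likelihood table, the 0.05 threshold, and every name below (`HardConstraint`, `SoftConstraint`, `check`) are assumptions made up for illustration. A hard constraint is modeled as a Boolean predicate, and a soft constraint as an estimated likelihood compared against a threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A hypothetical extracted record: field name -> year (illustrative only).
Record = Dict[str, int]


@dataclass
class HardConstraint:
    """Either satisfied or not: the predicate returns True or False."""
    name: str
    holds: Callable[[Record], bool]


@dataclass
class SoftConstraint:
    """Satisfied when the estimated likelihood meets the threshold."""
    name: str
    likelihood: Callable[[Record], float]
    threshold: float


def death_not_before_birth(r: Record) -> bool:
    return r["death_year"] >= r["birth_year"]


def mother_age_likelihood(r: Record) -> float:
    # Made-up likelihoods for a mother's age at a child's birth;
    # a real checker would estimate these from data.
    age = r["child_birth_year"] - r["birth_year"]
    if 15 <= age <= 45:
        return 0.95
    if 12 <= age <= 55:
        return 0.04
    return 0.0


def check(record: Record,
          hard: List[HardConstraint],
          soft: List[SoftConstraint]) -> List[str]:
    """Return the names of all violated constraints, tagged by kind."""
    violations = []
    for c in hard:
        if not c.holds(record):
            violations.append(f"hard:{c.name}")
    for c in soft:
        if c.likelihood(record) < c.threshold:
            violations.append(f"soft:{c.name}")
    return violations


HARD = [HardConstraint("death_not_before_birth", death_not_before_birth)]
SOFT = [SoftConstraint("plausible_mother_age", mother_age_likelihood, 0.05)]

suspect = {"birth_year": 1850, "death_year": 1840, "child_birth_year": 1860}
print(check(suspect, HARD, SOFT))
```

In this sketch the constraints are declared as data (`HARD`, `SOFT`) and kept separate from the checking loop, which loosely mirrors the declarative specification the abstract describes. Flagged records could then be routed for review or re-extraction.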
Original Publication Citation
Scott N. Woodfield, Deryle W. Lonsdale, Stephen W. Liddle, Tae Woo Kim and David W. Embley (2016). Pragmatic Quality Assessment for Automatically Extracted Data. In: Isabelle Comyn-Wattiau, Katsumi Tanaka, Il-Yeol Song, Shuichiro Yamamoto, Motoshi Saeki (Eds.), Conceptual Modeling: Proceedings of the 35th International Conference on Conceptual Modeling (ER 2016); Lecture Notes in Computer Science Vol. 9974; Springer International Publishing; pp. 212-220. ISBN 978-3-319-46396-4.
BYU ScholarsArchive Citation
Lonsdale, Deryle W.; Woodfield, Scott N.; Liddle, Stephen W.; Kim, Tae Woo; Embley, David W.; and Almquist, Christopher, "Pragmatic Quality Assessment for Automatically Extracted Data" (2016). Faculty Publications. 6871.
https://scholarsarchive.byu.edu/facpub/6871
Document Type
Conference Paper
Publication Date
2016
Publisher
Springer International Publishing
Language
English
College
Humanities
Department
Linguistics
Copyright Use Information
https://lib.byu.edu/about/copyright/