Enabling Search for Facts and Implied Facts in Historical Documents
Keywords
fact extraction, implied facts, historical documents
Abstract
Building a database of facts extracted from historical documents to enable database-like query and search would reduce the tedium of gleaning facts of interest from historical documents. We propose a solution in which historical documents themselves constitute the stored database. In our solution, we use information-extraction techniques to produce a conceptualized external annotation of facts found in each document, and we superimpose the conceptualization over the document collection. The annotation process populates the conceptualization, producing a repository of extracted facts, and a reasoner obtains inferred facts from these extracted facts. Our query interface accepts free-form queries and converts them to formal queries over the extracted and inferred facts. Displayed results include, in addition to standard query results, images of original documents with results highlighted, along with reasoning chains for inferred facts grounded in these highlighted facts. Along with giving the implementation status of our proof-of-concept prototype, we present results for extraction accuracy and efficiency and point to current and future work needed to enable a practical solution for the envisioned historical-document database.
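The relationship the abstract describes between extracted facts, inferred facts, and reasoning chains can be illustrated with a minimal sketch. It is not the authors' prototype: the `Fact` class, the `parent_of` and `grandparent_of` predicates, the single inference rule, and the placeholder names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A stored fact. `support` is empty for extracted facts and holds the
    supporting facts (the reasoning chain) for inferred ones."""
    subject: str
    predicate: str
    obj: str
    support: tuple = ()


def infer_grandparents(facts):
    """Hypothetical rule: parent_of(a, b) and parent_of(b, c) imply
    grandparent_of(a, c). Each inferred fact keeps its two premises."""
    parents = [f for f in facts if f.predicate == "parent_of"]
    inferred = []
    for first in parents:
        for second in parents:
            if first.obj == second.subject:
                inferred.append(
                    Fact(first.subject, "grandparent_of", second.obj,
                         support=(first, second))
                )
    return inferred


def query(facts, predicate, obj):
    """Formal query over extracted and inferred facts alike."""
    return [f for f in facts if f.predicate == predicate and f.obj == obj]


# Placeholder facts standing in for annotations extracted from a document.
extracted = [
    Fact("Person A", "parent_of", "Person B"),
    Fact("Person B", "parent_of", "Person C"),
]
all_facts = extracted + infer_grandparents(extracted)

for result in query(all_facts, "grandparent_of", "Person C"):
    print(result.subject, "is a grandparent of", result.obj)
    for premise in result.support:
        print("  grounded in:", premise.subject, premise.predicate, premise.obj)
```

Keeping the premises on each inferred fact is what would let an interface trace an answer back to the extracted facts, and from there to highlighted regions of the source document.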
Original Publication Citation
D.W. Embley, S.W. Liddle, D.W. Lonsdale, S. Machado, T. Packer, J. Park, N. Tate, and A. Zitzelberger (2011). Enabling Search for Facts and Implied Facts in Historical Documents. Proceedings of the International Workshop on Historical Document Imaging and Processing (HIP 2011), Beijing, China, pp. 59-66.
BYU ScholarsArchive Citation
Lonsdale, Deryle W.; Embley, David W.; Machado, Spencer; Packer, Thomas L.; Park, Joseph; Zitzelberger, Andrew J.; Liddle, Stephen W.; and Tate, Nathan, "Enabling Search for Facts and Implied Facts in Historical Documents" (2011). Faculty Publications. 6859.
https://scholarsarchive.byu.edu/facpub/6859
Document Type
Peer-Reviewed Article
Publication Date
2011
Publisher
Historical Document Imaging and Processing
Language
English
College
Humanities
Department
Linguistics
Copyright Status
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. HIP '11, September 16 – September 17, 2011, Beijing, China. Copyright 2011 ACM 978-1-4503-0916-5/11/09 ...$10.00.
Copyright Use Information
https://lib.byu.edu/about/copyright/