Keywords
Textual data, Machine reading, Data extraction, Natural language processing, OntoSoar system
Abstract
Textual data, from manuscripts to publications to website content, contains much of extant human knowledge. Unfortunately, the ability to harvest and effectively use this information beyond simple search and retrieval is greatly hampered by the scale of the “reading” problem: there is too much for any one person to read, and computers are not entirely adept at comprehending all of the information, explicit and implicit, contained in natural language text. Developing greater capability in this area is the focus of ongoing “machine reading” and “reading the web” research initiatives, whose interested parties include businesses, the military, and intelligence-gathering agencies. Our own ongoing work with the Church Family History Department’s vast digitized repository has led us to consider increased participation in this area of research. We propose to unite the efforts of two BYU research labs to create a sophisticated machine reading system. Each lab has concentrated on specific aspects of machine reading: (1) data extraction, integration, and modeling in the BYU Data Extraction Group (DEG) lab, and (2) sophisticated natural language parsing and cognitive modeling in the BYU NL-Soar lab. Both research efforts are mature, having produced many academic and scholarly products, and both have benefited from prior on-campus and off-campus funding. Our project will involve designing, implementing, and evaluating a new system, OntoSoar, which integrates our OntoES system with our Soar-based natural language processing systems. Soar is an agent-based cognitive modeling system that has served as an integration platform for several complex multi-task implementations. Our new reading agent will extract low-level information in a first pass over a text and then carefully re-read the text to find more subtle conceptual relationships. OntoSoar will then compare the content extracted by both passes and merge it into, or use it to supplement, its growing knowledge base. We will evaluate the system against current research datasets.
Original Publication Citation
Deryle W. Lonsdale, David W. Embley, Stephen W. Liddle (2014). An ontology-driven reading agent. ORCA Mentoring Environment Grant (MEG) Final Report, Journal of Undergraduate Research, BYU.
BYU ScholarsArchive Citation
Lonsdale, Deryle W.; Embley, David W.; and Liddle, Stephen W., "An ontology-driven reading agent" (2014). Faculty Publications. 6824.
https://scholarsarchive.byu.edu/facpub/6824
Document Type
Report
Publication Date
2014
Publisher
Brigham Young University
Language
English
College
Humanities
Department
Linguistics and English Language
Copyright Use Information
https://lib.byu.edu/about/copyright/