Abstract

Entity extraction is an important step in document understanding. Higher accuracy entity extraction on fine-grained entities can be achieved by combining the utility of Named Entity Recognition (NER) and Relation Extraction (RE) models. In this paper, a cascading model is proposed that implements NER and Relation extraction. This model utilizes relations between entities to infer context-dependent fine-grain named entities in text corpora. The RE module runs independent of the NER module, which reduces error accumulation from sequential steps. This process improves on the fine-grained NER F1-score of existing state-of-the-art from .4753 to .8563 on our data, albeit on a strictly limited domain. This provides the potential for further applications in historical document processing. These applications will enable automated searching of historical documents, such as those used in economics research and family history.

Degree

MS

College and Department

Physical and Mathematical Sciences; Computer Science

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2023-03-08

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd13103

Keywords

natural language processing, named entity recognition, relation extraction, information extraction

Language

english

Share

COinS