Journal of Undergraduate Research

Keywords

genealogical information, internet, data extraction, genealogy

College

Physical and Mathematical Sciences

Department

Computer Science

Abstract

Data extraction is a rapidly growing area of computer science. It focuses on extracting pertinent data from large stores of knowledge, such as databases or the internet, so that existing data can be used in new ways. One application of data extraction is genealogical research. Various commercial and non-profit groups make genealogical data available online, and hundreds of personal web pages contain family trees. I wanted to enable computers to extract information from these sources. BYU’s Data Extraction Group (DEG) has developed tools for extracting data from web pages in HTML format; these tools can be found at www.deg.byu.edu. I developed an ontology (a scheme for extracting and storing data) and related lexicons that allow these tools to extract genealogical data.
