Journal of Undergraduate Research
Keywords
genealogical information, internet, data extraction, genealogy
College
Physical and Mathematical Sciences
Department
Computer Science
Abstract
Data extraction is a rapidly growing area of computer science. It focuses on extracting pertinent data from large stores of knowledge such as databases or the internet, which allows existing stores of data to be used in new ways. One application of data extraction is genealogical research. Various commercial and non-profit groups make genealogical data available online, and hundreds of personal web pages contain family trees. I wanted to enable computers to extract information from these sources. BYU’s Data Extraction Group (DEG) has developed tools for extracting data from web pages in HTML format; these tools can be found at www.deg.byu.edu. I developed an ontology (a scheme for extracting and storing data) and related lexicons that allow these tools to extract genealogical data.
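As a rough illustration of the ontology-plus-lexicon idea, the following Python sketch pulls a name, birth year, and death year out of a small HTML fragment. It is not the DEG tools or their ontology format, which are not shown here. The lexicon contents, the field patterns, the names GIVEN_NAMES, ONTOLOGY, html_to_text, and extract_person, and the sample HTML are all hypothetical, chosen only to show the general technique.

```python
import re
from html.parser import HTMLParser

# Hypothetical lexicon of given names (assumption: not taken from the DEG lexicons).
GIVEN_NAMES = {"john", "mary", "william", "sarah"}

# Hypothetical stand-in for an ontology: each field maps to a recognizer pattern.
ONTOLOGY = {
    "birth_year": re.compile(r"\b(?:b\.|born)\s*(\d{4})\b", re.IGNORECASE),
    "death_year": re.compile(r"\b(?:d\.|died)\s*(\d{4})\b", re.IGNORECASE),
}

# Two adjacent capitalized words are treated as a candidate "Given Surname" pair.
NAME_PAIR = re.compile(r"\b([A-Z][a-z]+)\s+([A-Z][a-z]+)\b")


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Strip tags and collapse runs of whitespace into single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def extract_person(text):
    """Fill one record using the lexicon for names and the patterns for dates."""
    record = {"name": None, "birth_year": None, "death_year": None}
    for match in NAME_PAIR.finditer(text):
        # Accept the pair only if the first word is a known given name.
        if match.group(1).lower() in GIVEN_NAMES:
            record["name"] = f"{match.group(1)} {match.group(2)}"
            break
    for field, pattern in ONTOLOGY.items():
        found = pattern.search(text)
        if found:
            record[field] = int(found.group(1))
    return record


# Hypothetical sample of the kind of markup a personal family-tree page might contain.
sample_html = "<li><b>John Walker</b> (b. 1820, d. 1891)</li>"
person = extract_person(html_to_text(sample_html))
```

In this sketch the lexicon decides which capitalized word pairs count as person names, while the pattern table decides which fields are stored, so new fields could be added without changing the extraction function.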
Recommended Citation
Walker, Troy and Embley, Dr. David (2014) "Extraction of Genealogical Information from the Internet," Journal of Undergraduate Research: Vol. 2014: Iss. 1, Article 1186.
Available at: https://scholarsarchive.byu.edu/jur/vol2014/iss1/1186