Abstract
This document explains the use of different metrics involved in record linkage. There are two forms of record linkage: deterministic and probabilistic. We focus on probabilistic record linkage as used in merging and updating two databases. Record pairs are compared using character-based and phonetic-based similarity metrics to determine the degree to which they match. Performance measures are then calculated and Receiver Operating Characteristic (ROC) curves are formed. Finally, an economic model is applied that returns the optimal tolerance level two databases should use when deciding whether a record pair is a match, so as to maximize profit.
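The pipeline summarized in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the thesis. It uses normalized Levenshtein similarity as the character-based metric and American Soundex as the phonetic metric. It computes ROC points by sweeping a match threshold over the character scores. It stands in for the economic model with a hypothetical linear payoff (`gain_tp`, `cost_fp`, `cost_fn`). All function names, parameters, and example name pairs are illustrative only and should not be read as the author's method.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1]: 1 - distance / longer length."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)


_SOUNDEX = {ch: d for letters, d in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                     ("l", "4"), ("mn", "5"), ("r", "6")) for ch in letters}


def soundex(name: str) -> str:
    """American Soundex code: first letter plus three digits."""
    name = "".join(ch for ch in name.lower() if ch.isalpha())
    if not name:
        return ""
    result, prev = name[0].upper(), _SOUNDEX.get(name[0], "")
    for ch in name[1:]:
        digit = _SOUNDEX.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (result + "000")[:4]


def roc_points(scores, labels, thresholds):
    """(false positive rate, true positive rate) for each threshold; score >= t means 'match'."""
    pos = max(sum(labels), 1)  # guard against division by zero when a class is absent
    neg = max(len(labels) - sum(labels), 1)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / neg, tp / pos))
    return points


def best_threshold(scores, labels, thresholds, gain_tp=1.0, cost_fp=1.0, cost_fn=1.0):
    """Threshold maximizing a HYPOTHETICAL linear profit, not the thesis's economic model."""
    def profit(t):
        tp = fp = fn = 0
        for s, y in zip(scores, labels):
            if s >= t and y:
                tp += 1
            elif s >= t:
                fp += 1
            elif y:
                fn += 1
        return gain_tp * tp - cost_fp * fp - cost_fn * fn
    return max(thresholds, key=profit)


if __name__ == "__main__":
    # Hypothetical labelled record pairs: (name in database A, name in database B, true match?)
    pairs = [("Smith", "Smyth", True), ("Robert", "Rupert", False),
             ("Jonson", "Johnson", True), ("Miller", "Muller", False)]
    scores = [similarity(a.lower(), b.lower()) for a, b, _ in pairs]
    labels = [y for _, _, y in pairs]
    for (a, b, _), s in zip(pairs, scores):
        print(a, b, round(s, 2), soundex(a) == soundex(b))
    grid = [i / 10 for i in range(11)]
    print(roc_points(scores, labels, grid))
    print(best_threshold(scores, labels, grid, gain_tp=2.0))
```

Note that Robert and Rupert share the Soundex code R163 even though they are not the same name. This is one reason a sketch like this scores pairs with more than one metric before choosing a threshold.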
Degree
MS
College and Department
Physical and Mathematical Sciences; Mathematics
Rights
http://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Larsen, Stasha Ann Bown, "Record Linkage" (2013). Theses and Dissertations. 3833.
https://scholarsarchive.byu.edu/etd/3833
Date Submitted
2013-12-11
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd6625
Keywords
Probabilistic record linkage, Character-based similarity metrics, Phonetic-based similarity metrics, ROC curves
Language
English