This document explains the metrics used in record linkage. Record linkage takes two forms: deterministic and probabilistic. We focus on probabilistic record linkage as used to merge and update two databases. Record pairs are compared using character-based and phonetic-based similarity metrics to determine how closely they match. Performance measures are then calculated and Receiver Operating Characteristic (ROC) curves are constructed. Finally, an economic model is applied that returns the optimal tolerance level the two databases should use to declare a record pair a match in order to maximize profit.
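As a rough illustration of the comparison and evaluation steps summarized above, the following minimal sketch scores record pairs with one character-based metric and computes a single ROC point for a given tolerance level. It is not the implementation used in this work. The choice of Levenshtein edit distance, the normalization by the longer string, the function names (`levenshtein`, `similarity`, `roc_point`), and the sample name pairs are all assumptions made for illustration only. Phonetic-based metrics and the economic model are not shown.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: fewest single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical (an assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def roc_point(pairs: List[Tuple[str, str, bool]], tolerance: float) -> Tuple[float, float]:
    """Return (false positive rate, true positive rate) at one tolerance level.

    Each pair is (field_a, field_b, is_true_match). A pair is declared a match
    when its similarity is at least `tolerance`.
    """
    tp = fp = fn = tn = 0
    for a, b, truth in pairs:
        predicted = similarity(a.lower(), b.lower()) >= tolerance
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fpr, tpr


# Hypothetical labeled pairs, invented for this sketch.
SAMPLE_PAIRS = [
    ("Smith", "Smyth", True),
    ("Johnson", "Jonson", True),
    ("Miller", "Millar", False),
    ("Clark", "Young", False),
]

if __name__ == "__main__":
    # Sweeping the tolerance traces out the points of an ROC curve.
    for tol in (0.0, 0.5, 0.8, 0.9, 1.0):
        print(tol, roc_point(SAMPLE_PAIRS, tol))
```

Lowering the tolerance declares more pairs to be matches, which raises both the true positive rate and the false positive rate. Plotting the resulting points over a range of tolerance levels gives the ROC curve from which a tolerance level can be chosen.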
College and Department
Physical and Mathematical Sciences; Mathematics
BYU ScholarsArchive Citation
Larsen, Stasha Ann Bown, "Record Linkage" (2013). Theses and Dissertations. 3833.
Keywords
Probabilistic record linkage, Character-based similarity metrics, Phonetic-based similarity metrics, ROC curves