Abstract

This document explains the metrics used in record linkage. Record linkage takes two forms: deterministic and probabilistic. We focus on probabilistic record linkage as used to merge and update two databases. Record pairs are compared using character-based and phonetic-based similarity metrics to determine how closely they match. Performance measures are then calculated and Receiver Operating Characteristic (ROC) curves are constructed. Finally, an economic model is applied that returns the optimal tolerance level the two databases should use to declare a record pair a match in order to maximize profit.
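
The comparison and ROC steps summarized above can be illustrated with a minimal sketch. It assumes Levenshtein edit distance as the character-based metric, normalized by the longer string's length; the abstract does not say which metrics the thesis uses. The function names, the toy record pairs, and the tolerance grid are all hypothetical and chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 means identical, 0.0 means nothing shared."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def roc_points(pairs, thresholds):
    """pairs: (field_a, field_b, is_true_match) triples.

    Returns one (threshold, false_positive_rate, true_positive_rate)
    tuple per threshold; a pair is declared a match when its
    similarity is at least the threshold.
    """
    scored = [(similarity(a, b), match) for a, b, match in pairs]
    pos = sum(1 for _, match in scored if match)
    neg = len(scored) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, match in scored if s >= t and match)
        fp = sum(1 for s, match in scored if s >= t and not match)
        points.append((t, fp / neg if neg else 0.0, tp / pos if pos else 0.0))
    return points


# Hypothetical labeled record pairs (surname fields only).
toy_pairs = [
    ("smith", "smyth", True),
    ("johnson", "jonson", True),
    ("brown", "braun", True),
    ("smith", "jones", False),
    ("clark", "lewis", False),
]

for t, fpr, tpr in roc_points(toy_pairs, [0.0, 0.5, 0.8, 1.01]):
    print(f"tolerance={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each printed row is one point on a ROC curve. A phonetic-based metric could be substituted for `similarity` without changing `roc_points`.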

Degree

MS

College and Department

Physical and Mathematical Sciences; Mathematics

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2013-12-11

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd6625

Keywords

Probabilistic record linkage, Character-based similarity metrics, Phonetic-based similarity metrics, ROC curves

Language

English

Included in

Mathematics Commons
