Record linkage is the process of combining information about a single individual from two or more records. Probabilistic record linkage assigns a weight to each field that is compared. The decision of whether two records should be linked is then determined by the sum of the weights, or “Score”, over all fields compared. Using methods similar to those behind the most powerful test of a simple null hypothesis against a simple alternative, an optimal record linkage decision rule can be established to minimize the number of unlinked records when the probabilities of false positive and false negative errors are specified. Computing the weights needed for probabilistic record linkage requires linking a “training” subset of records by hand. This is not practical in many settings, as hand matching requires a considerable time investment. In 1989, Matthew A. Jaro demonstrated how the Expectation-Maximization, or EM, algorithm could be used to compute the needed weights when each field has a Binomial (agree or disagree) matching outcome. This project applies the EM algorithm to calculate weights for head-of-household records from the 1910 and 1920 Censuses for Ascension Parish, Louisiana, and for Church and County Records from Perquimans County, North Carolina. This project also extends Jaro's EM algorithm to a Multinomial framework. The performance of the EM algorithm for calculating weights will be assessed by comparing the computed weights to weights obtained from clerical matching. Simulations will also be conducted to investigate the sensitivity of the algorithm to the total number of record pairs, the number of fields with missing entries, the starting values of the estimated probabilities, and the convergence epsilon value.
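As a rough illustration of the kind of computation described above, the following Python sketch estimates field weights with an EM algorithm for the Binomial (agree or disagree) case. It is not the thesis's implementation. It assumes the standard two-class mixture in which fields agree independently given match status, with m as the probability that a field agrees for a true match and u as the same probability for a non-match. The function names, the starting values, the base-2 logarithm for the weights, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def _clip(x, lo=1e-9, hi=1.0 - 1e-9):
    """Keep a probability strictly inside (0, 1) so ratios and logs stay finite."""
    return min(max(x, lo), hi)


def estimate_m_u(patterns, m0=0.9, u0=0.1, p0=0.1, eps=1e-6, max_iter=500):
    """EM estimates of m, u and the match proportion p from 0/1 agreement patterns.

    patterns: list of equal-length lists, 1 = field agrees, 0 = field disagrees.
    Starting values and eps are illustrative assumptions, not values from the thesis.
    """
    n_fields = len(patterns[0])
    m, u, p = [m0] * n_fields, [u0] * n_fields, p0
    for _ in range(max_iter):
        # E-step: posterior probability that each record pair is a true match.
        g = []
        for row in patterns:
            like_match, like_non = p, 1.0 - p
            for k, agree in enumerate(row):
                like_match *= m[k] if agree else 1.0 - m[k]
                like_non *= u[k] if agree else 1.0 - u[k]
            g.append(like_match / (like_match + like_non))
        # M-step: re-estimate m, u and p as posterior-weighted agreement rates.
        g_sum = sum(g)
        non_sum = len(g) - g_sum
        new_m = [_clip(sum(gi * row[k] for gi, row in zip(g, patterns)) / g_sum)
                 for k in range(n_fields)]
        new_u = [_clip(sum((1.0 - gi) * row[k] for gi, row in zip(g, patterns)) / non_sum)
                 for k in range(n_fields)]
        new_p = _clip(g_sum / len(g))
        # Stop once no parameter moves by more than the convergence epsilon.
        change = max(abs(a - b) for a, b in zip(new_m + new_u + [new_p], m + u + [p]))
        m, u, p = new_m, new_u, new_p
        if change < eps:
            break
    return m, u, p


def field_weights(m, u):
    """Per-field (agreement weight, disagreement weight) as base-2 log ratios."""
    return [(math.log2(mk / uk), math.log2((1.0 - mk) / (1.0 - uk)))
            for mk, uk in zip(m, u)]


def score(pattern, weights):
    """Score of one record pair: the sum of its field weights."""
    return sum(w[0] if agree else w[1] for agree, w in zip(pattern, weights))


def simulate_patterns(n_pairs, m, u, p, seed=0):
    """Hypothetical helper: draw agreement patterns from known m, u and p."""
    rng = random.Random(seed)
    patterns = []
    for _ in range(n_pairs):
        probs = m if rng.random() < p else u
        patterns.append([1 if rng.random() < q else 0 for q in probs])
    return patterns
```

A usage sketch under the same assumptions: generate patterns with `simulate_patterns(2000, [0.95, 0.9, 0.85, 0.9], [0.1, 0.2, 0.05, 0.1], 0.2)`, pass them to `estimate_m_u`, convert the result with `field_weights`, and compare `score` values for different agreement patterns. Varying `n_pairs`, the starting values, and `eps` gives a simple way to explore the sensitivity questions raised above.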
College and Department
Physical and Mathematical Sciences; Statistics
BYU ScholarsArchive Citation
Bauman, G. John, "Computation of Weights for Probabilistic Record Linkage Using the EM Algorithm" (2006). Theses and Dissertations. 746.
record linkage, EM algorithm