A Generalized Fellegi-Sunter Framework for Multiple Record Linkage With Application to Homicide Record Systems
Mauricio Sadinle, Stephen E. Fienberg

TL;DR
This paper introduces a probabilistic framework extending Fellegi-Sunter theory for linking multiple data files without unique identifiers, demonstrated on homicide records and evaluated through simulations.
Contribution
It generalizes the Fellegi-Sunter theory from two datafiles to multiple record linkage, incorporating the transitivity of agreement and fitting matching probabilities with a mixture model via the EM algorithm.
Findings
The method performs well when integrating homicide record systems
It remains effective in simulated measurement-error scenarios
The rule for assigning record K-tuples to matching patterns is proved optimal
Abstract
We present a probabilistic method for linking multiple datafiles. This task is not trivial in the absence of unique identifiers for the individuals recorded. This is a common scenario when linking census data to coverage measurement surveys for census coverage evaluation, and in general when multiple record-systems need to be integrated for posterior analysis. Our method generalizes the Fellegi-Sunter theory for linking records from two datafiles and its modern implementations. The multiple record linkage goal is to classify the record K-tuples coming from K datafiles according to the different matching patterns. Our method incorporates the transitivity of agreement in the computation of the data used to model matching probabilities. We use a mixture model to fit matching probabilities via maximum likelihood using the EM algorithm. We present a method to decide the record K-tuples…
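To make the notion of "matching patterns" concrete, the sketch below assumes that a matching pattern for a record K-tuple is a partition of its K records into groups that refer to the same individual. Under that assumption, transitivity holds by construction: if records 1 and 2 match and records 2 and 3 match, all three fall in one group. The function name `matching_patterns` is hypothetical and is not taken from the paper.

```python
from typing import Iterator, List


def matching_patterns(k: int) -> Iterator[List[List[int]]]:
    """Yield every partition of the file indices 1..k.

    Each partition is one candidate matching pattern: records whose
    file indices share a block are taken to be the same individual.
    """

    def extend(i: int, blocks: List[List[int]]) -> Iterator[List[List[int]]]:
        if i > k:
            # Copy the blocks so later mutation does not alter the result.
            yield [list(b) for b in blocks]
            return
        # Option 1: record i matches the records in an existing block.
        for b in blocks:
            b.append(i)
            yield from extend(i + 1, blocks)
            b.pop()
        # Option 2: record i matches none of the earlier records.
        blocks.append([i])
        yield from extend(i + 1, blocks)
        blocks.pop()

    yield from extend(1, [])


if __name__ == "__main__":
    for pattern in matching_patterns(3):
        print(pattern)
```

For three datafiles this enumerates five patterns, ranging from all three records matching to no two records matching. The count grows quickly with K (15 patterns for K = 4), which gives a sense of why the classification step is harder than in the two-file case.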
