A Flexible Model for Record Linkage
Kayané Robach, Stéphanie L van der Pas, Mark A van de Wiel, Michel H Hof

TL;DR
This paper introduces FlexRL, a flexible, scalable model for record linkage that balances accuracy and computational efficiency, effectively handling registration errors and changing identifiers in large datasets.
Contribution
The paper presents a novel stochastic EM-based approach for record linkage that adapts to data complexity and improves upon existing methods in accuracy and flexibility.
Findings
FlexRL effectively links large datasets with high accuracy.
The model demonstrates robustness to variable quality in linking variables.
An open-source R package implementation is available.
Abstract
Combining data from various sources empowers researchers to explore innovative questions, for example those raised by conducting healthcare monitoring studies. However, the lack of a unique identifier often poses challenges. Record linkage procedures determine whether pairs of observations collected on different occasions belong to the same individual using partially identifying variables (e.g. birth year, postal code). Existing methodologies typically involve a compromise between computational efficiency and accuracy. Traditional approaches simplify this task by condensing information, yet they neglect dependencies among linkage decisions and disregard the one-to-one relationship required to establish coherent links. Modern approaches offer a comprehensive representation of the data generation process, at the expense of computational overhead and reduced flexibility. We propose a…
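To make the contrast in the abstract concrete, the sketch below shows the "traditional" pipeline on a toy example. Each record pair is condensed into a binary agreement pattern over partially identifying variables and scored independently. It then shows how pairwise decisions can violate the one-to-one requirement. This is a minimal illustration under stated assumptions, not FlexRL. The toy records, the variable names, the hand-picked weights (`AGREE_W`, `DISAGREE_W`), the threshold, and the helper functions are all hypothetical. The greedy one-to-one pass is a simple heuristic swapped in for illustration; it is not the paper's stochastic EM procedure. The paper's implementation is an R package, and Python is used here only for a self-contained example.

```python
from itertools import product

# Hypothetical toy files (invented for illustration): no shared unique identifier,
# only partially identifying variables.
file_a = [
    {"id": "a1", "birth_year": 1980, "postal_code": "1012AB", "sex": "F"},
    {"id": "a2", "birth_year": 1975, "postal_code": "3511CD", "sex": "M"},
    {"id": "a3", "birth_year": 1990, "postal_code": "2311EF", "sex": "F"},
]
file_b = [
    {"id": "b1", "birth_year": 1980, "postal_code": "1012AB", "sex": "F"},
    # Assumed to be the same person as a2, whose postal code has changed.
    {"id": "b2", "birth_year": 1975, "postal_code": "3512CD", "sex": "M"},
    {"id": "b3", "birth_year": 1980, "postal_code": "9999ZZ", "sex": "F"},
]
LINKING_VARS = ("birth_year", "postal_code", "sex")

# Hand-picked log-weights (assumed, not estimated): reward agreement, penalise disagreement.
AGREE_W = {"birth_year": 2.0, "postal_code": 3.0, "sex": 0.5}
DISAGREE_W = {"birth_year": -3.0, "postal_code": -1.0, "sex": -2.0}


def comparison_vector(rec_a, rec_b):
    """Condense a record pair into a binary agreement pattern (1 = values agree)."""
    return tuple(int(rec_a[v] == rec_b[v]) for v in LINKING_VARS)


def pair_score(pattern):
    """Sum per-variable weights for an agreement pattern (Fellegi-Sunter style)."""
    return sum(AGREE_W[v] if agree else DISAGREE_W[v]
               for v, agree in zip(LINKING_VARS, pattern))


def one_to_one_greedy(scores, threshold):
    """Accept pairs by descending score, skipping records that are already linked."""
    used_a, used_b, links = set(), set(), []
    for (a, b), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s < threshold:
            break  # all remaining pairs score even lower
        if a in used_a or b in used_b:
            continue  # accepting this pair would break the one-to-one constraint
        links.append((a, b))
        used_a.add(a)
        used_b.add(b)
    return links


# Score every cross-file pair independently.
scores = {
    (ra["id"], rb["id"]): pair_score(comparison_vector(ra, rb))
    for ra, rb in product(file_a, file_b)
}
THRESHOLD = 1.0
independent = [pair for pair, s in scores.items() if s >= THRESHOLD]
links = one_to_one_greedy(scores, THRESHOLD)
print("pairwise decisions:", independent)
print("one-to-one links:  ", links)
```

With these assumed weights, independent thresholding links `a1` to both `b1` and `b3`. That result is incoherent if each individual appears at most once per file. The greedy pass keeps only `("a1", "b1")` and `("a2", "b2")`. The second link survives even though the postal code changed between files. Greedy assignment ignores how decisions interact beyond "first come, first served". Capturing these dependencies among linkage decisions is the gap the abstract attributes to traditional approaches.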
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Distributed systems and fault tolerance
