A hierarchical Bayesian approach to record linkage and population size problems
Andrea Tancredi, Brunero Liseo

TL;DR
This paper introduces a hierarchical Bayesian method for record linkage and population size estimation that models the observed categorical variables directly, without reducing them to 0-1 comparisons, so that uncertainty propagates between parameter estimation and matching in both tasks.
Contribution
It presents a novel hierarchical Bayesian model that jointly addresses record linkage and population size estimation. Unlike current practice, it uses no plug-in estimates, so uncertainty flows both ways between the parameter estimation step and the matching procedure.
Findings
Effective in a real data example and in simulations.
Accounts for uncertainty in both linkage and population estimates.
No reduction of categorical data to binary comparisons.
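To make the last finding concrete, here is a minimal sketch of what a reduction to binary comparisons discards. It is an illustration only, not the paper's model: the function name `binary_comparison` and the toy records (sex, birth year, city) are assumptions introduced for this example.

```python
from typing import List, Sequence


def binary_comparison(a: Sequence[str], b: Sequence[str]) -> List[int]:
    """Reduce a record pair to 0-1 agreement indicators, one per field."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of fields")
    return [int(x == y) for x, y in zip(a, b)]


# Hypothetical records with fields (sex, birth year, city).
rec_a = ("F", "1975", "Rome")
rec_b = ("F", "1975", "Milan")
rec_c = ("M", "1980", "Turin")
rec_d = ("M", "1980", "Naples")

# Both pairs agree on the first two fields and disagree on the third,
# so they collapse to the same comparison vector.
pair_1 = binary_comparison(rec_a, rec_b)
pair_2 = binary_comparison(rec_c, rec_d)
```

After the reduction, the two pairs are indistinguishable even though the underlying categorical values differ. A model built on the observed values can, in principle, still use how common each value is, which is the information the 0-1 representation drops.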
Abstract
We propose and illustrate a hierarchical Bayesian approach for matching statistical records observed on different occasions. We show how this model can be profitably adopted both in record linkage problems and in capture-recapture setups, where the size of a finite population is the real object of interest. There are at least two important differences between the proposed model-based approach and the current practice in record linkage. First, the statistical model is built up on the actually observed categorical variables and no reduction (to 0-1 comparisons) of the available information takes place. Second, the hierarchical structure of the model allows a two-way propagation of the uncertainty between the parameter estimation step and the matching procedure so that no plug-in estimates are used and the correct uncertainty is accounted for both in estimating the population size and in…
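The link between matching and population size in a capture-recapture setup can be sketched with the classical Lincoln-Petersen estimator. This is a stand-in for illustration, not the hierarchical model the abstract describes, and the list sizes and match counts below are made-up numbers.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Classical two-list estimate of population size: n1 * n2 / m.

    n1, n2: number of records on each list; m: number of matched records.
    """
    if m <= 0:
        raise ValueError("at least one match is needed")
    if m > min(n1, n2):
        raise ValueError("matches cannot exceed the size of either list")
    return n1 * n2 / m


# Hypothetical lists of 100 and 80 records. The number of true matches is
# not observed directly; it depends on the linkage step, so we try a few.
estimates = {m: lincoln_petersen(100, 80, m) for m in (35, 40, 45)}
```

A difference of a few matches moves the estimate noticeably, and fewer declared matches give a larger estimated population. Plugging in a single linkage result would hide this variation, which is why propagating linkage uncertainty into the population size estimate matters.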
