TL;DR
This paper introduces a Bayesian graphical model for entity resolution that combines exchangeable priors, a more realistic distortion model, hyperpriors, and a partially collapsed Gibbs sampler for faster inference, and it reports improved accuracy across several data sets.
Contribution
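The paper presents a Bayesian approach to entity resolution that combines exchangeable priors on the linkage structure, a distortion model for categorical attributes that corrects a logical inconsistency in the standard hit-miss model, hyperpriors, and a partially collapsed Gibbs sampler for faster inference.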
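An exchangeable prior on the linkage structure is a distribution over partitions of records into latent entities whose probability depends only on the entity sizes, not on the order of the records. The summary does not name the specific class of random partition models the paper uses, so the sketch below assumes the Ewens distribution (equivalently, the Chinese restaurant process) purely as a familiar example. `sample_crp_linkage`, `ewens_log_prob`, and the concentration parameter `alpha` are illustrative names, not the paper's.

```python
import math
import random
from collections import Counter


def sample_crp_linkage(num_records, alpha, seed=None):
    """Link records to latent entities with a Chinese restaurant process.

    Assumed prior for illustration only (Ewens/CRP); the paper's class of
    random partition models may differ. Returns one entity label per record.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    labels = []
    sizes = []  # sizes[k] = number of records currently linked to entity k
    for _ in range(num_records):
        # An existing entity k has weight sizes[k]; a new entity has weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new latent entity
        else:
            sizes[k] += 1
        labels.append(k)
    return labels


def ewens_log_prob(labels, alpha):
    """Log-probability of a partition under the Ewens distribution.

    It depends only on the entity sizes, which is what makes it exchangeable:
    alpha^K * Gamma(alpha) / Gamma(alpha + n) * prod_k (n_k - 1)!
    """
    n = len(labels)
    sizes = list(Counter(labels).values())
    log_p = len(sizes) * math.log(alpha)
    log_p += sum(math.lgamma(s) for s in sizes)  # lgamma(s) = log((s - 1)!)
    log_p += math.lgamma(alpha) - math.lgamma(alpha + n)
    return log_p
```

Relabeling the records, for example `[0, 0, 1]` versus `[0, 1, 0]`, leaves `ewens_log_prob` unchanged; larger `alpha` favors partitions with more distinct entities.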
Findings
Model outperforms alternatives in accuracy
Consistent performance across diverse scenarios
Higher F1 scores in experiments
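The summary reports higher F1 scores without stating how F1 is computed. Pairwise precision, recall, and F1 over record pairs is a common choice in the record-linkage literature and is assumed here only for illustration. `linked_pairs` and `pairwise_scores` are hypothetical helper names.

```python
from itertools import combinations


def linked_pairs(labels):
    """Set of record-index pairs (i, j), i < j, assigned to the same entity."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_scores(predicted, truth):
    """Pairwise precision, recall, and F1 of a predicted linkage vs. the truth.

    Both arguments hold one entity label per record; only co-membership
    matters, so the label values themselves are arbitrary. Enumerating all
    pairs is O(n^2), which is fine for small examples only.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must cover the same records")
    pred_pairs = linked_pairs(predicted)
    true_pairs = linked_pairs(truth)
    tp = len(pred_pairs & true_pairs)  # pairs linked in both
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, `pairwise_scores([0, 0, 0, 1, 2], [0, 0, 1, 1, 2])` finds one of three predicted links correct and one of two true links recovered, giving precision 1/3, recall 1/2, and F1 of about 0.4.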
Abstract
Entity resolution (record linkage or deduplication) is the process of identifying and linking duplicate records in databases. In this paper, we propose a Bayesian graphical approach for entity resolution that links records to latent entities, where the prior representation on the linkage structure is exchangeable. First, we adopt a flexible and tractable set of priors for the linkage structure, which corresponds to a special class of random partition models. Second, we propose a more realistic distortion model for categorical/discrete record attributes, which corrects a logical inconsistency with the standard hit-miss model. Third, we incorporate hyperpriors to improve flexibility. Fourth, we employ a partially collapsed Gibbs sampler for inferential speedups. Using a selection of private and nonprivate data sets, we investigate the impact of our modeling contributions and compare our…
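The abstract states that the proposed distortion model corrects a logical inconsistency in the standard hit-miss model without spelling it out. A minimal sketch, under the assumption that the inconsistency is that a value flagged as "distorted" can still be drawn equal to the true value: the corrected variant below draws distorted values only from the other categories, renormalizing the attribute distribution `theta`. The function names, the distortion probability `beta`, and the dictionary representation of `theta` are illustrative assumptions, not the paper's notation.

```python
def hit_miss_likelihood(observed, true_value, beta, theta):
    """P(observed | true) under the standard hit-miss model.

    With probability 1 - beta the value is copied exactly; with probability
    beta it is redrawn from theta, which may reproduce the true value.
    """
    p = beta * theta.get(observed, 0.0)
    if observed == true_value:
        p += 1.0 - beta
    return p


def corrected_likelihood(observed, true_value, beta, theta):
    """Assumed correction: a distorted value never equals the true value.

    Distorted values are drawn from theta restricted to the other
    categories and renormalized, so beta is exactly P(observed != true).
    """
    if observed == true_value:
        return 1.0 - beta
    mass_other = 1.0 - theta.get(true_value, 0.0)
    if mass_other <= 0.0:
        raise ValueError("theta puts all its mass on the true value")
    return beta * theta.get(observed, 0.0) / mass_other
```

With `theta = {"a": 0.5, "b": 0.3, "c": 0.2}` and `beta = 0.1`, the hit-miss model gives an observed `"a"` (true `"a"`) probability 0.95, because half of the "distorted" draws land back on the truth; the corrected version gives 0.9, so `beta` keeps its intended meaning as the probability that the recorded value differs from the truth.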
