A Hierarchical Graphical Model for Record Linkage
Pradeep Ravikumar, William Cohen

TL;DR
This paper introduces a hierarchical graphical model framework for unsupervised record linkage, effectively handling unlabeled data and incorporating monotonicity constraints, with competitive results against supervised methods.
Contribution
It proposes a novel hierarchical graphical model approach for unsupervised record linkage and extends existing methods within this framework.
Findings
Unsupervised methods perform competitively with supervised ones.
Incorporating monotonicity constraints improves model robustness.
Bootstrapping with single-field classifiers aids in labeling latent variables.
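The bootstrapping finding can be illustrated with a minimal sketch. It assumes each record pair has one latent "this field matches" variable per field, and that a single-field classifier is simply a similarity threshold. The function names, the thresholds, and the use of `difflib.SequenceMatcher` as the similarity measure are hypothetical choices for illustration, not the authors' implementation.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; SequenceMatcher stands in for any field-level measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def noisy_field_labels(rec_a: dict, rec_b: dict, thresholds: dict) -> dict:
    """Hypothetical single-field classifiers: one similarity threshold per field.

    Returns field -> 0/1, a noisy guess for the latent per-field match variable.
    """
    return {
        field: int(field_similarity(rec_a[field], rec_b[field]) >= t)
        for field, t in thresholds.items()
    }


# Toy records (made up for this sketch).
a = {"name": "William Cohen", "city": "Pittsburgh"}
b = {"name": "William W. Cohen", "city": "Pittsburg"}
c = {"name": "Jane Doe", "city": "Seattle"}
thresholds = {"name": 0.8, "city": 0.8}  # assumed values, not tuned

labels_ab = noisy_field_labels(a, b, thresholds)  # likely co-referent pair
labels_ac = noisy_field_labels(a, c, thresholds)  # clearly different pair
```

In a setup like this, the noisy labels would only serve as a starting point for the latent variables; a full model would then be fit on top of them.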
Abstract
The task of matching co-referent records is known, among other names, as record linkage. For large record-linkage problems, there is often little or no labeled data available, but the unlabeled data shows a reasonably clear structure. For such problems, unsupervised or semi-supervised methods are preferable to supervised methods. In this paper, we describe a hierarchical graphical model framework for the linkage problem in an unsupervised setting. In addition to proposing new methods, we also cast existing unsupervised probabilistic record-linkage methods in this framework. Some of the techniques we propose to minimize overfitting in the above model are of interest in the general graphical model setting. We describe a method for incorporating monotonicity constraints in a graphical model. We also outline a bootstrapping approach of using "single-field" classifiers to noisily label latent…
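One way to picture a monotonicity constraint is sketched below. It assumes the constraint means that a higher field-similarity level should never lower the estimated match probability, and it enforces this by post-processing with the pool-adjacent-violators algorithm (isotonic regression). Both the reading of the constraint and the choice of algorithm are assumptions made for illustration; the abstract does not say how the paper builds the constraint into the model.

```python
def enforce_monotone(probs, weights=None):
    """Pool-adjacent-violators: nearest non-decreasing sequence (weighted least squares).

    probs[i] is an estimated match probability at similarity level i,
    with levels ordered from least to most similar.
    """
    if weights is None:
        weights = [1.0] * len(probs)
    blocks = []  # each block holds [mean, total_weight, number_of_levels]
    for p, w in zip(probs, weights):
        blocks.append([p, w, 1])
        # Merge backwards while the previous block exceeds the current one.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    out = []
    for mean, _, count in blocks:
        out.extend([mean] * count)
    return out


# Made-up estimates: level 2 dips below level 1, violating monotonicity.
raw = [0.05, 0.30, 0.20, 0.90]
smoothed = enforce_monotone(raw)
```

Here the two violating levels are pooled to their average, so the estimates become non-decreasing while the levels that already respected the ordering are left unchanged.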
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Advanced Database Systems and Queries
