A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior
Hal Daumé III, Daniel Marcu

TL;DR
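This paper introduces a Bayesian model for supervised clustering built on the Dirichlet process prior, which defines distributions over the countably infinite sets that arise in the problem. Inference uses Markov chain Monte Carlo (MCMC), and the model is reported to outperform existing methods on various tasks.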
Contribution
The paper presents a novel Bayesian framework for supervised clustering with a Dirichlet process prior and MCMC inference, applicable to tasks such as reference matching, coreference resolution, identity uncertainty, and record linkage.
Findings
Outperforms existing algorithms on real-world datasets
Defines distributions over the countably infinite sets that arise in supervised clustering
Provides MCMC inference algorithms for both conjugate and non-conjugate priors
Abstract
We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the Dirichlet process prior, which enables us to define distributions over the countably infinite sets that naturally arise in this problem. We add supervision to our model by positing the existence of a set of unobserved random variables (we call these "reference types") that are generic across all clusters. Inference in our framework, which requires integrating over infinitely many parameters, is solved using Markov chain Monte Carlo techniques. We present algorithms for both conjugate and non-conjugate priors. We present a simple, but general, parameterization of our model based on a Gaussian assumption. We evaluate this model on one…
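To make the role of the Dirichlet process prior more concrete, the sketch below draws a random partition from the Chinese restaurant process, a standard sequential view of the partitions a Dirichlet process induces. It is not the paper's model: it omits the "reference types", the Gaussian parameterization, and MCMC inference entirely. The function name `sample_crp_partition` and the concentration parameter `alpha` are assumptions introduced here for illustration.

```python
import random
from collections import Counter


def sample_crp_partition(n_items, alpha, seed=None):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    Hypothetical helper for illustration; alpha (> 0) is the concentration
    parameter, which controls how readily new clusters are opened.
    """
    rng = random.Random(seed)
    assignments = []    # cluster label of each item, in arrival order
    cluster_sizes = []  # cluster_sizes[k] = number of items in cluster k

    for _ in range(n_items):
        # An existing cluster is chosen with weight equal to its size;
        # a brand-new cluster is chosen with weight alpha.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)  # open a new cluster
        else:
            cluster_sizes[k] += 1    # join an existing cluster
        assignments.append(k)
    return assignments


if __name__ == "__main__":
    labels = sample_crp_partition(20, alpha=1.0, seed=0)
    print(labels)
    print(Counter(labels))
```

The number of clusters is not fixed in advance: each item can always open a new cluster, which is why the prior suits problems where the number of underlying entities is unknown. Larger values of `alpha` tend to produce more clusters.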
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Data Management and Algorithms
