An Alternative Prior Process for Nonparametric Bayesian Clustering
Hanna M. Wallach, Shane T. Jensen, Lee Dicker, Katherine A. Heller

TL;DR
This paper introduces the uniform process as an alternative prior for nonparametric Bayesian clustering. Unlike the Dirichlet and Pitman-Yor processes, it does not have the 'rich-get-richer' property, at the cost of exchangeability. The paper evaluates it with asymptotic analysis, simulations, and a real document clustering task.
Contribution
It proposes the uniform process as a new prior for clustering, analyzing its properties and demonstrating its advantages over traditional processes in specific applications.
Findings
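The uniform process is not exchangeable (partitions depend on the ordering of variables), but it avoids the 'rich-get-richer' property where that property is undesirable.
New asymptotic and simulation-based results characterize the clustering behavior of the uniform process and are compared with known results for the Dirichlet and Pitman-Yor processes.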
Empirical results show improved performance on document clustering.
Abstract
Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly-used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit "rich-get-richer" characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering -- the uniform process -- for applications where the "rich-get-richer" property is undesirable. We also explore the cost of this process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. We compare performance on a real document clustering task, demonstrating the practical…
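The predictive probabilities mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard sequential forms: for the Dirichlet process an existing cluster of size N_k is chosen with probability N_k / (N + θ); for the Pitman-Yor process with discount d, (N_k − d) / (N + θ); and for the uniform process, 1 / (K + θ) regardless of cluster size, where N is the number of items seen so far and K the number of clusters. The function names `predictive_probs` and `sample_partition`, and the parameter names `theta` and `discount`, are illustrative choices, not taken from the paper.

```python
import random


def predictive_probs(counts, theta, process="dirichlet", discount=0.0):
    """Return (probabilities of joining each existing cluster, probability of a new cluster).

    counts: list of current cluster sizes. theta: concentration parameter (> 0).
    discount: Pitman-Yor discount in [0, 1); ignored by the other processes.
    """
    n = sum(counts)   # items assigned so far
    k = len(counts)   # clusters created so far
    if process == "dirichlet":
        # Larger clusters attract new items in proportion to their size.
        existing = [c / (n + theta) for c in counts]
        new = theta / (n + theta)
    elif process == "pitman_yor":
        # Size-proportional, with a discount that favours opening new clusters.
        existing = [(c - discount) / (n + theta) for c in counts]
        new = (theta + k * discount) / (n + theta)
    elif process == "uniform":
        # Every existing cluster is equally likely, whatever its size.
        existing = [1.0 / (k + theta) for _ in counts]
        new = theta / (k + theta)
    else:
        raise ValueError(f"unknown process: {process}")
    return existing, new


def sample_partition(n_items, theta, process, discount=0.0, seed=0):
    """Draw cluster sizes by assigning items one at a time from the predictive rule."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_items):
        existing, new = predictive_probs(counts, theta, process, discount)
        weights = existing + [new]  # last slot means "open a new cluster"
        idx = rng.choices(range(len(weights)), weights=weights)[0]
        if idx == len(counts):
            counts.append(1)
        else:
            counts[idx] += 1
    return counts


if __name__ == "__main__":
    for name in ("dirichlet", "pitman_yor", "uniform"):
        sizes = sample_partition(1000, theta=1.0, process=name, discount=0.5)
        print(name, "clusters:", len(sizes), "largest:", max(sizes))
```

With cluster sizes `[3, 1]` and `theta=1.0`, the Dirichlet rule gives the larger cluster three times the probability of the smaller one, while the uniform rule gives both the same probability. Because the uniform rule depends on the number of clusters rather than their sizes, the probability of a partition depends on the order in which items arrive, which is the loss of exchangeability the abstract describes.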
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Algorithms and Data Compression
