The Bayesian Sorting Hat: A Decision-Theoretic Approach to Size-Constrained Clustering
Justin D. Silverman, Rachel K. Silverman

TL;DR
This paper introduces a Bayesian decision-theoretic method for size-constrained clustering that handles externally imposed size constraints, demonstrated on survey data and a Harry Potter-themed team assignment problem.
Contribution
It reformulates size-constrained clustering as a decision problem with a novel loss function. Unlike prior methods, this suits cases where the size constraints reflect an external limitation rather than a feature of the data-generating process.
Findings
Demonstrates size-constrained clustering on simulated and real categorical survey data
Successfully assigns teams in a Harry Potter scavenger hunt
Better suited than existing methods to scenarios with external size limitations
Abstract
Size-constrained clustering (SCC) refers to the dual problem of using observations to determine latent cluster structure while at the same time assigning observations to the unknown clusters subject to an analyst-defined constraint on cluster sizes. While several approaches have been proposed, SCC remains a difficult problem due to the combinatorial dependency between observations introduced by the size constraints. Here we reformulate SCC as a decision problem and introduce a novel loss function to capture various types of size constraints. As opposed to prior work, our approach is uniquely suited to situations in which size constraints reflect an external limitation or desire rather than an internal feature of the data generation process. To demonstrate our approach, we develop a Bayesian mixture model for clustering respondents using both simulated and real categorical survey data.…
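The decision-theoretic framing in the abstract can be illustrated with a toy sketch. The paper's actual loss function is not given here, so the example below swaps in an assumption: a plain 0-1 misassignment loss combined with a hard cap on cluster size. The posterior membership probabilities in `post` are made up for illustration (in practice they might come from a fitted mixture model), the names `expected_loss` and `bayes_action` are hypothetical, and the brute-force search is only feasible for a handful of observations.

```python
from itertools import product

def expected_loss(assignment, post):
    # Posterior expected 0-1 loss: for each item, the probability
    # that it does NOT belong to the cluster it was assigned to.
    return sum(1.0 - post[i][k] for i, k in enumerate(assignment))

def bayes_action(post, max_size):
    # Enumerate every assignment, discard those that violate the
    # (assumed) hard size cap, and keep the one with the lowest
    # posterior expected loss. Exponential in the number of items.
    n_clusters = len(post[0])
    best, best_loss = None, float("inf")
    for a in product(range(n_clusters), repeat=len(post)):
        if any(a.count(k) > max_size for k in range(n_clusters)):
            continue
        loss = expected_loss(a, post)
        if loss < best_loss:
            best, best_loss = a, loss
    return best, best_loss

# Hypothetical posterior membership probabilities (rows: items,
# columns: clusters). Illustrative numbers, not from the paper.
post = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.6, 0.4],
    [0.3, 0.7],
]

constrained, constrained_loss = bayes_action(post, max_size=2)
unconstrained, unconstrained_loss = bayes_action(post, max_size=len(post))
print(constrained, unconstrained)
```

With the cap of two items per cluster, the third item is moved to its less probable cluster, so the constrained decision accepts a higher expected loss than the unconstrained one. This shows how an external constraint can shape the assignment without changing the model of the data.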
Taxonomy
Topics: Bayesian Methods and Mixture Models · Census and Population Estimation · Data Management and Algorithms
