TL;DR
This paper demonstrates how the deterministic information bottleneck (DIB) can be used for geometric clustering, providing a unified information-theoretic framework that generalizes classic algorithms like k-means and GMMs.
Contribution
It introduces a novel method to perform geometric clustering using DIB and a new approach for selecting the optimal number of clusters based on information preservation tradeoffs.
Findings
DIB with model selection recovers true cluster labels in simple problems.
Clustering with DIB generalizes k-means and GMMs.
The method effectively identifies the number of clusters based on information tradeoffs.
Abstract
The information bottleneck (IB) approach to clustering takes a joint distribution P(X,Y) and maps the data X to cluster labels T which retain maximal information about Y (Tishby et al., 1999). This objective results in an algorithm that clusters data points based upon the similarity of their conditional distributions P(Y|X). This is in contrast to classic "geometric clustering" algorithms such as k-means and Gaussian mixture models (GMMs), which take a set of observed data points and cluster them based upon their geometric (typically Euclidean) distance from one another. Here, we show how to use the deterministic information bottleneck (DIB) (Strouse and Schwab, 2017), a variant of IB, to perform geometric clustering, by choosing cluster labels that preserve information about data point location on a…
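For orientation, the two objectives named in the abstract can be written out as below. This is a sketch in the notation commonly used in the cited works, not text taken from this page: the trade-off parameter \beta, the encoder q(t|x), and the cluster-conditional q(y|t) are symbols assumed here for illustration.

```latex
% IB (Tishby et al., 1999): compress X into T while retaining information about Y.
\min_{q(t \mid x)} \; L_{\mathrm{IB}} = I(X;T) - \beta \, I(T;Y)

% DIB (Strouse and Schwab, 2017): the compression term I(X;T) is replaced by the entropy H(T).
\min_{q(t \mid x)} \; L_{\mathrm{DIB}} = H(T) - \beta \, I(T;Y)

% The DIB optimum is a deterministic (hard) assignment of each x to one cluster:
t^{*}(x) = \arg\max_{t} \Big[ \log q(t) - \beta \, D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, q(y \mid t) \big) \Big]
```

Because the DIB assignment is hard rather than probabilistic, it is the natural variant to compare against hard-assignment algorithms such as k-means.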
