Untangling Gaussian Mixtures
Eva Fluck, Sandra Kiefer, Christoph Standke

TL;DR
This paper develops a formal framework using tangle theory to identify and analyze clusters in data sets modeled as Gaussian mixtures, linking graph connectivity with data clustering.
Contribution
It introduces a quantitative theory of tangles in Gaussian mixture data, providing conditions for their existence and a criterion for cluster separability.
Findings
Tangles can be used to formalize cluster detection in Gaussian mixture data.
Explicit conditions for the asymptotic existence of tangles are provided.
A formal criterion for cluster separability based on tangle theory is established.
Abstract
Tangles were originally introduced as a concept to formalize regions of high connectivity in graphs. In recent years, they have also been discovered as a link between structural graph theory and data science: when interpreting similarity in data sets as connectivity between points, finding clusters in the data essentially amounts to finding tangles in the underlying graphs. This paper further explores the potential of tangles in data sets as a means for a formal study of clusters. Real-world data often follow a normal distribution. Accounting for this, we develop a quantitative theory of tangles in data sets drawn from Gaussian mixtures. To this end, we equip the data with a graph structure that models similarity between the points and allows us to apply tangle theory to the data. We provide explicit conditions under which tangles associated with the marginal Gaussian distributions…
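The abstract describes equipping Gaussian-mixture data with a graph that models similarity between points. The following is a minimal illustrative sketch of that idea, not the authors' construction: it assumes a two-component isotropic mixture in the plane and a simple distance-threshold graph, and the names `sample_mixture`, `similarity_graph`, `cut_size`, and the parameter `radius` are all hypothetical. It shows only that, for well-separated components, few or no edges cross between the components while many edges lie within each, which is the kind of high-connectivity region that tangles are meant to capture.

```python
import math
import random


def sample_mixture(means, sigma, n_per_component, seed=0):
    """Draw n_per_component 2-D points from each isotropic Gaussian component."""
    rng = random.Random(seed)
    points, labels = [], []
    for k, (mx, my) in enumerate(means):
        for _ in range(n_per_component):
            points.append((rng.gauss(mx, sigma), rng.gauss(my, sigma)))
            labels.append(k)
    return points, labels


def similarity_graph(points, radius):
    """Assumed similarity model: join two points if their distance is at most radius."""
    edges = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                edges.add((i, j))
    return edges


def cut_size(edges, labels):
    """Count edges whose endpoints were drawn from different components."""
    return sum(1 for i, j in edges if labels[i] != labels[j])


# Two well-separated components (means 10 apart, standard deviation 1).
points, labels = sample_mixture(
    means=[(0.0, 0.0), (10.0, 0.0)], sigma=1.0, n_per_component=50
)
edges = similarity_graph(points, radius=2.0)
crossing = cut_size(edges, labels)
within = len(edges) - crossing
print(f"edges within components: {within}, edges across components: {crossing}")
```

Moving the two means closer together or increasing `radius` raises the number of crossing edges, which gives an informal feel for why a separability criterion has to depend on the mixture parameters.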
Taxonomy
Topics: Bayesian Methods and Mixture Models · Metaheuristic Optimization Algorithms Research
