Clustering, factor discovery and optimal transport
Hongkang Yang, Esteban G. Tabak

TL;DR
This paper formulates clustering and latent factor discovery as a Wasserstein barycenter problem from optimal transport, extends the existing theory from rigid translations to affine transport maps, and derives robust algorithms for both discrete and continuous latent variables.
Contribution
It develops non-parametric clustering algorithms that generalize k-means and principal curves by leveraging Wasserstein barycenters and affine transformations.
Findings
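The resulting algorithms include k-means as a special case and show more robust performance.
Tests on both artificial and real-world data sets demonstrate the strength of the algorithms.
A continuous version discovers continuous latent variables and generalizes principal curves.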
Abstract
The clustering problem, and more generally, latent factor discovery (or latent space inference), is formulated in terms of the Wasserstein barycenter problem from optimal transport. The objective proposed is the maximization of the variability attributable to class, further characterized as the minimization of the variance of the Wasserstein barycenter. Existing theory, which constrains the transport maps to rigid translations, is extended to affine transformations. The resulting non-parametric clustering algorithms include k-means as a special case and exhibit more robust performance. A continuous version of these algorithms discovers continuous latent variables and generalizes principal curves. The strength of these algorithms is demonstrated by tests on both artificial and real-world data sets.
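The link between the barycenter objective and k-means can be illustrated with a minimal sketch, assuming the transport maps are restricted to rigid translations. Under that assumption, each class is shifted so that its mean lands on the global mean, and the variance of the resulting barycenter is the pooled within-class variance, which is the k-means objective divided by the sample count. The function names and the synthetic two-cluster data below are hypothetical and are not taken from the paper.

```python
import numpy as np

def total_variance(x):
    """Mean squared distance of the samples to their global mean."""
    x = np.asarray(x, dtype=float)
    return float(((x - x.mean(axis=0)) ** 2).sum(axis=1).mean())

def between_class_variance(x, labels):
    """Variability attributable to class: weighted spread of the class means."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    global_mean = x.mean(axis=0)
    value = 0.0
    for k in np.unique(labels):
        mask = labels == k
        weight = mask.mean()  # fraction of samples in class k
        value += weight * ((x[mask].mean(axis=0) - global_mean) ** 2).sum()
    return float(value)

def barycenter_variance(x, labels):
    """Variance of the barycenter under translation-only maps (an assumption)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    global_mean = x.mean(axis=0)
    y = np.empty_like(x)
    for k in np.unique(labels):
        mask = labels == k
        # Rigid translation: move the class mean onto the global mean.
        y[mask] = x[mask] - x[mask].mean(axis=0) + global_mean
    return float(((y - global_mean) ** 2).sum(axis=1).mean())

# Hypothetical data: two well-separated Gaussian clusters in the plane.
rng = np.random.default_rng(0)
x = np.concatenate([
    rng.normal(loc=[-3.0, 0.0], size=(100, 2)),
    rng.normal(loc=[3.0, 0.0], size=(100, 2)),
])
true_labels = np.repeat([0, 1], 100)
random_labels = rng.integers(0, 2, size=200)

good = barycenter_variance(x, true_labels)
bad = barycenter_variance(x, random_labels)
print(f"barycenter variance, true labels:   {good:.3f}")
print(f"barycenter variance, random labels: {bad:.3f}")
```

Because the total variance splits into the between-class part plus the barycenter variance, minimizing the barycenter variance over label assignments is the same as maximizing the variability attributable to class. The affine extension described in the abstract would also rescale and rotate each class, which this translation-only sketch does not attempt.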
Taxonomy
Topics: Geochemistry and Geologic Mapping · Medical Image Segmentation Techniques · Topological and Geometric Data Analysis
