Robust clustering tools based on optimal transportation
E. del Barrio, J.A. Cuesta-Albertos, C. Matrán, A. Mayo-Íscar

TL;DR
This paper introduces trimmed $k$-barycenters, a robust method for clustering probability distributions in Wasserstein space. Trimming the most discrepant distributions improves stability and robustness, and the method is illustrated on population age distributions and cytometric data.
Contribution
The paper develops a trimmed $k$-barycenter approach for robust clustering of probability distributions in Wasserstein space, and proves that it gives a consistent aggregation of the estimates produced in a parallelized estimation setup.
Findings
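Trimming the most discrepant distributions improves stability and robustness when clustering probability distributions.
The trimmed $k$-barycenter of estimates computed in a parallelized setup is a consistent aggregation of those estimates.
The method is illustrated on simulated and real data, including clustering populations by age distribution and analysis of cytometric data.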
Abstract
A robust clustering method for probabilities in Wasserstein space is introduced. This new "trimmed $k$-barycenters" approach relies on recent results on barycenters in Wasserstein space that allow intensive computation, as required by clustering algorithms. The possibility of trimming the most discrepant distributions results in a gain in stability and robustness, highly convenient in this setting. As a remarkable application we consider a parallelized estimation setup in which each of $m$ units processes a portion of the data, producing an estimate of $k$-features, encoded as probabilities. We prove that the trimmed $k$-barycenter of the estimates produces a consistent aggregation. We illustrate the methodology with simulated and real data examples. These include clustering populations by age distributions and analysis of cytometric data.
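To make the idea of trimmed $k$-barycenters concrete, here is a minimal sketch restricted to distributions on the real line. It is not the authors' algorithm. It relies on a standard one-dimensional fact: the squared 2-Wasserstein distance between two distributions equals the squared $L^2$ distance between their quantile functions, and the barycenter of a group is the distribution whose quantile function is the average of the group's quantile functions. Under that assumption, the problem reduces to trimmed $k$-means on quantile functions. The function names, the quantile grid, the Lloyd-type iteration with random restarts, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def quantile_embedding(samples, grid):
    # Each row is the empirical quantile function of one sample on `grid`.
    return np.stack([np.quantile(s, grid) for s in samples])


def assign(Q, centers, n_keep):
    # Squared W2 distance (approximated on the grid) from every
    # distribution to every center.
    d2 = ((Q[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
    labels = d2.argmin(axis=1)
    nearest = d2.min(axis=1)
    # Keep the n_keep distributions closest to their center; trim the rest.
    keep = np.argsort(nearest)[:n_keep]
    return labels, keep, nearest[keep].mean()


def trimmed_k_barycenters_1d(Q, k, alpha, n_init=20, n_iter=100, seed=0):
    # Lloyd-type heuristic with random restarts (an illustrative choice).
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    n_keep = n - int(np.floor(alpha * n))
    best = None
    for restart in range(n_init):
        centers = Q[rng.choice(n, size=k, replace=False)].copy()
        for step in range(n_iter):
            labels, keep, cost = assign(Q, centers, n_keep)
            new_centers = centers.copy()
            for j in range(k):
                members = keep[labels[keep] == j]
                if members.size:
                    # 1-D barycenter: average the quantile functions.
                    new_centers[j] = Q[members].mean(axis=0)
            converged = np.allclose(new_centers, centers)
            centers = new_centers
            if converged:
                break
        labels, keep, cost = assign(Q, centers, n_keep)
        if best is None or cost < best[0]:
            out = np.full(n, -1)  # -1 marks trimmed distributions
            out[keep] = labels[keep]
            best = (cost, centers, out)
    return best[1], best[2]


# Toy data: two groups of similar distributions plus two outlying ones.
rng = np.random.default_rng(1)
samples = (
    [rng.normal(0.0, 1.0, 200) for _ in range(10)]
    + [rng.normal(5.0, 1.0, 200) for _ in range(10)]
    + [rng.normal(50.0, 1.0, 200) for _ in range(2)]
)
grid = np.linspace(0.01, 0.99, 99)
Q = quantile_embedding(samples, grid)
centers, labels = trimmed_k_barycenters_1d(Q, k=2, alpha=0.1)
print(labels)
```

With a trimming level of `alpha=0.1`, two of the 22 input distributions are discarded, so the two outlying samples can be left out without pulling either barycenter toward them. In dimensions greater than one the barycenter has no such closed form, and a dedicated barycenter solver would be needed in place of the averaging step.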
