Wasserstein $k$-Centers Clustering for Distributional Data
Ryo Okano, Masaaki Imaizumi

TL;DR
This paper introduces a new clustering method for distributional data using Wasserstein geometry, capturing modes of variation and improving clustering accuracy over traditional methods.
Contribution
It develops a Wasserstein-based clustering approach that accounts for means and modes of variation in distributional data, leveraging geodesic PCA.
Findings
The method captures the geodesic modes of variation of each cluster, not only the cluster means.
In simulations, it improves clustering accuracy over traditional methods.
It performs well on real data applications.
Abstract
We develop a novel clustering method for distributional data, where each data point is regarded as a probability distribution on the real line. For distributional data, it has been challenging to develop a clustering method that utilizes modes of variation of the data because the space of probability distributions lacks a vector space structure, preventing the application of existing methods devised for functional data. Our clustering method for distributional data takes account of the differences in both means and modes of variation of clusters, in the spirit of the $k$-centers clustering approach proposed for functional data. Specifically, we consider the space of distributions equipped with the Wasserstein metric and define geodesic modes of variation of distributional data using the notion of geodesic principal component analysis. Then, we utilize geodesic modes of clusters to…
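As background for the Wasserstein metric mentioned in the abstract: for distributions on the real line, the 2-Wasserstein distance equals the $L^2$ distance between quantile functions, and the Wasserstein barycenter has the pointwise-averaged quantile function. The sketch below illustrates only these two standard facts. It is not the authors' clustering method, and the function names (`quantile_repr`, `wasserstein2`, `barycenter`) and the grid size are hypothetical choices made for illustration.

```python
import numpy as np


def quantile_repr(sample, grid):
    """Empirical quantile function of a 1-D sample, evaluated on `grid` in (0, 1)."""
    return np.quantile(np.asarray(sample, dtype=float), grid)


def wasserstein2(q1, q2):
    """Approximate 2-Wasserstein distance between two distributions on the real
    line, given their quantile functions on a common uniform grid over (0, 1)."""
    q1 = np.asarray(q1, dtype=float)
    q2 = np.asarray(q2, dtype=float)
    return float(np.sqrt(np.mean((q1 - q2) ** 2)))


def barycenter(quantiles):
    """Wasserstein barycenter in 1-D: pointwise mean of the quantile functions."""
    return np.mean(np.asarray(quantiles, dtype=float), axis=0)


# Illustrative data (an assumption, not from the paper): two Gaussian samples
# that differ only by a location shift of 3.
grid = (np.arange(200) + 0.5) / 200  # midpoint grid on (0, 1)
rng = np.random.default_rng(0)
a = quantile_repr(rng.normal(0.0, 1.0, 5000), grid)
b = quantile_repr(rng.normal(3.0, 1.0, 5000), grid)

print(wasserstein2(a, b))                    # roughly the size of the shift
print(wasserstein2(barycenter([a, b]), a))   # roughly half the shift
```

Working with quantile functions in this way turns each distribution into an ordinary vector, which is what makes distance and mean computations straightforward in the one-dimensional case.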
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models
