On the use of Wasserstein metric in topological clustering of distributional data
Guénaël Cabanes, Younès Bennani, Rosanna Verde, Antonio Irpino

TL;DR
This paper introduces a clustering method for histogram data using Self-Organizing Maps with the Wasserstein distance, automatically determining the number of clusters based on local data density, validated on synthetic and real datasets.
Contribution
It combines SOM-based dimension reduction with Wasserstein distance for distributional data clustering, and automatically estimates the optimal number of clusters.
Findings
Effective clustering on synthetic data
Successful application to real datasets
Automatic determination of cluster number
Abstract
This paper presents a clustering algorithm for histogram data based on Self-Organizing Map (SOM) learning. It combines dimension reduction by SOM with clustering of the data in the reduced space. Because the data are distributions, a suitable dissimilarity measure between distributions is used: the Wasserstein distance. Moreover, the number of clusters is not fixed in advance but is found automatically from a local data density estimation in the original space. Applications to synthetic and real data sets corroborate the proposed strategy.
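To illustrate why a Wasserstein distance suits histogram data, here is a minimal sketch, not the authors' implementation. It assumes the order-1 Wasserstein distance (the abstract does not say which order the paper uses), histograms defined on the same equal-width bins, and mass concentrated at bin centers. The function name `wasserstein_1d` and the `bin_width` parameter are hypothetical.

```python
from itertools import accumulate


def wasserstein_1d(p, q, bin_width=1.0):
    """Order-1 Wasserstein distance between two histograms.

    Assumption: both histograms share the same equal-width bins and
    their mass sits at the bin centers. Under that assumption the
    distance equals the area between the two cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must be defined on the same bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")

    # Normalize counts so that each histogram sums to 1.
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]

    # Cumulative distribution functions evaluated at each bin.
    cdf_p = accumulate(p)
    cdf_q = accumulate(q)

    # Area between the two CDFs, scaled by the bin width.
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


# Three histograms with all their mass in a single bin.
left = [1, 0, 0]
middle = [0, 1, 0]
right = [0, 0, 1]

d_near = wasserstein_1d(left, middle)
d_far = wasserstein_1d(left, right)
print(d_near, d_far)
```

In this example the distance grows with how far the mass has to move along the bin axis, whereas a bin-by-bin Euclidean distance would rate `middle` and `right` as equally far from `left`. Sensitivity to the ordering of the bins is what makes the measure a natural fit for comparing distributions.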
Taxonomy
Topics: Topological and Geometric Data Analysis · Image Retrieval and Classification Techniques · Medical Image Segmentation Techniques
Methods: Self-Organizing Map
