On the use of Wasserstein metric in topological clustering of distributional data

Guénaël Cabanes, Younès Bennani, Rosanna Verde, Antonio Irpino

arXiv:2109.04301 · cs.LG · September 10, 2021

TL;DR

This paper introduces a clustering method for histogram data using Self-Organizing Maps with the Wasserstein distance, automatically determining the number of clusters based on local data density, validated on synthetic and real datasets.

Contribution

It combines SOM-based dimension reduction with the $L_{2}$ Wasserstein distance to cluster distributional (histogram) data, and it estimates the number of clusters automatically from local data density.
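
For one-dimensional distributions, the $L_{2}$ Wasserstein distance has a standard closed form in terms of quantile functions: $d_W^2(f,g) = \int_0^1 \left(F^{-1}(t) - G^{-1}(t)\right)^2\,dt$. The sketch below illustrates this for histograms and is not the authors' implementation. It assumes that each histogram is given as bin edges plus strictly positive bin weights, that mass is uniform inside each bin (so the quantile function is piecewise linear), and it approximates the integral with a midpoint rule rather than an exact piecewise computation. The function names `quantile_function` and `wasserstein_l2` are hypothetical.

```python
import numpy as np

def quantile_function(edges, weights, t):
    """Evaluate the quantile function of a histogram at probability levels t.

    Assumption: mass is spread uniformly inside each bin, so the CDF is
    piecewise linear and its inverse follows from linear interpolation.
    """
    edges = np.asarray(edges, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if len(edges) != len(weights) + 1:
        raise ValueError("need exactly one more edge than weights")
    if np.any(weights <= 0):
        raise ValueError("this sketch assumes strictly positive bin weights")
    probs = weights / weights.sum()
    cdf = np.concatenate(([0.0], np.cumsum(probs)))
    cdf[-1] = 1.0  # guard against floating-point drift in the cumulative sum
    # Inverting a piecewise-linear CDF: swap the roles of x and y in np.interp.
    return np.interp(t, cdf, edges)

def wasserstein_l2(hist_a, hist_b, n_grid=1000):
    """Approximate L2 Wasserstein distance between two 1-D histograms.

    Each histogram is a pair (edges, weights). The integral over probability
    levels in (0, 1) is approximated with a midpoint rule on n_grid points.
    """
    t = (np.arange(n_grid) + 0.5) / n_grid
    qa = quantile_function(*hist_a, t)
    qb = quantile_function(*hist_b, t)
    return float(np.sqrt(np.mean((qa - qb) ** 2)))
```

For example, two single-bin histograms on [0, 1] and [1, 2] have the same uniform shape shifted by one unit, so `wasserstein_l2(([0.0, 1.0], [1.0]), ([1.0, 2.0], [1.0]))` is approximately 1.0. Working in quantile space is convenient here because the distance then reduces to an ordinary $L_{2}$ distance between functions on (0, 1).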

Findings

01 Effective clustering on synthetic data

02 Successful application to real datasets

03 Automatic determination of the number of clusters

Abstract

This paper presents a clustering algorithm for histogram data based on Self-Organizing Map (SOM) learning. It combines dimension reduction by the SOM with clustering of the data in the reduced space. Given the nature of the data, a suitable dissimilarity measure between distributions is introduced: the $L_{2}$ Wasserstein distance. Moreover, the number of clusters is not fixed in advance but is determined automatically from a local estimate of data density in the original space. Applications to synthetic and real data sets corroborate the proposed strategy.
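
The abstract does not specify the SOM variant or the density estimator, so the following is only a minimal sketch under stated assumptions. It shows how a SOM can be trained with the $L_{2}$ Wasserstein distance and how a local density could be attached to each prototype. Each histogram is assumed to be already converted to its quantile function, sampled on a common grid of probability levels. The squared distance between two rows is then a discretised squared Wasserstein distance. Because convex combinations of 1-D quantile functions are points on the Wasserstein geodesic, the usual SOM update keeps every prototype a valid (non-decreasing) quantile function. The Gaussian-kernel density in `prototype_density` is an assumed, generic estimator, not necessarily the paper's. The step that turns densities into a number of clusters is not reproduced. All function names and parameters (`train_wasserstein_som`, `lr0`, `sigma0`, `bandwidth`) are hypothetical.

```python
import numpy as np

def train_wasserstein_som(Q, grid_shape=(4, 4), n_epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Online SOM on quantile functions sampled at common probability levels.

    Q has shape (n_samples, n_levels); row i is the quantile function of
    histogram i, so the mean squared row difference is a discretised
    squared L2 Wasserstein distance. Assumes 0 < lr0 <= 1.
    """
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    rows, cols = grid_shape
    # Lattice coordinates of the prototypes on the 2-D map.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise prototypes with randomly drawn samples (already valid quantile functions).
    W = Q[rng.choice(n, size=rows * cols, replace=True)].astype(float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    total, step = n_epochs * n, 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            # Best-matching unit under the discretised Wasserstein distance.
            bmu = np.argmin(np.mean((W - Q[i]) ** 2, axis=1))
            lattice_d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-lattice_d2 / (2.0 * sigma ** 2))
            # Convex step towards the sample: a move along the 1-D Wasserstein
            # geodesic, so each row stays a non-decreasing quantile function.
            W += (lr * h)[:, None] * (Q[i] - W)
            step += 1
    return W, coords

def prototype_density(Q, W, bandwidth):
    """Gaussian-kernel density of the data around each prototype (assumed estimator)."""
    d2 = np.mean((Q[:, None, :] - W[None, :, :]) ** 2, axis=2)  # (n_samples, n_prototypes)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).mean(axis=0)

# Synthetic example: two groups of uniform distributions stored as quantile functions.
rng = np.random.default_rng(1)
t = (np.arange(50) + 0.5) / 50
lows = np.concatenate([rng.uniform(0, 1, 30), rng.uniform(5, 6, 30)])
widths = rng.uniform(0.5, 1.5, 60)
Q = lows[:, None] + widths[:, None] * t[None, :]  # quantile of U[a, a + w] is a + w * t
W, coords = train_wasserstein_som(Q, grid_shape=(4, 4), n_epochs=10)
dens = prototype_density(Q, W, bandwidth=0.5)
```

Prototypes with low density relative to their map neighbours are natural candidates for cluster boundaries. How the paper actually uses such densities to fix the number of clusters is not described in the abstract and is left out here.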

Taxonomy

Topics: Topological and Geometric Data Analysis · Image Retrieval and Classification Techniques · Medical Image Segmentation Techniques

Methods: Self-Organizing Map