On clustering uncertain and structured data with Wasserstein barycenters and a geodesic criterion for the number of clusters
G.I. Papayiannis, G.N. Domazakis, D. Drivaliaris, S. Koukoulas, A.E. Tsekrekos, A.N. Yannacopoulos

TL;DR
This paper introduces a clustering method for uncertain and structured data using Wasserstein barycenters and a geodesic criterion, effective in fields with complex data and significant observational errors.
Contribution
It proposes a novel clustering approach based on Wasserstein geometry and introduces a geodesic criterion for determining the optimal number of clusters.
Findings
Effective in clustering uncertain data in simulations
Successfully applied to bond yield curve clustering
Classified satellite image land uses accurately
Abstract
In this work, clustering schemes for uncertain and structured data are considered that rely on the notion of Wasserstein barycenters, accompanied by appropriate clustering indices based on the intrinsic geometry of the Wasserstein space where the clustering task is performed. Clustering approaches of this type are valuable in many fields where the observational/experimental error is significant (e.g. astronomy, biology, remote sensing, etc.) or where the nature of the data is more complex and traditional learning algorithms are not applicable or effective (e.g. network data, interval data, high frequency records, matrix data, etc.). Under this perspective, each observation is identified by an appropriate probability measure and the proposed clustering schemes rely on discrimination criteria that utilize the geometric structure of the space of probability measures through…
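To make the idea of identifying each observation with a probability measure concrete, the following is a minimal sketch, not the authors' implementation. It assumes one-dimensional observations, each represented by an equal-size sample, so that the 2-Wasserstein distance reduces to a distance between sorted samples (empirical quantiles) and the barycenter is the average of those quantiles. The function names (`w2_distance`, `w2_barycenter`, `wasserstein_kmeans`) and the k-means-style loop are illustrative assumptions; the paper's geodesic criterion for the number of clusters is not reproduced here.

```python
import numpy as np


def w2_distance(q1, q2):
    """2-Wasserstein distance between two 1D empirical measures,
    each given as a vector of sorted sample values of equal length."""
    return float(np.sqrt(np.mean((np.asarray(q1) - np.asarray(q2)) ** 2)))


def w2_barycenter(quantiles):
    """Equal-weight 2-Wasserstein barycenter in 1D: the mean of the
    quantile vectors (rows of `quantiles`)."""
    return np.mean(np.asarray(quantiles, dtype=float), axis=0)


def assign(Q, centers):
    """Label each quantile vector with the index of its nearest center
    under the squared 2-Wasserstein distance."""
    d2 = ((Q[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
    return d2.argmin(axis=1)


def wasserstein_kmeans(samples, k, n_iter=50, seed=0):
    """Lloyd-style clustering of 1D empirical measures (illustrative).

    samples: array of shape (n_observations, n_points); each row is a
             sample drawn from one uncertain observation.
    Returns (labels, centers), with centers given as quantile vectors.
    """
    rng = np.random.default_rng(seed)
    # Sorting each row yields that observation's empirical quantiles.
    Q = np.sort(np.asarray(samples, dtype=float), axis=1)
    centers = Q[rng.choice(len(Q), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(Q, centers)
        # Recompute each center as the barycenter of its members;
        # keep the old center if a cluster ends up empty.
        new_centers = np.array([
            w2_barycenter(Q[labels == j]) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(Q, centers), centers
```

For measures on higher-dimensional spaces the barycenter has no such closed form in general and would need a numerical optimal-transport solver, so this sketch should be read only as the simplest instance of the barycenter-based scheme.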
