Multilevel Clustering via Wasserstein Means
Nhat Ho, XuanLong Nguyen, Mikhail Yurochkin, Hung Hai Bui, Viet Huynh, Dinh Phung

TL;DR
This paper introduces a new multilevel clustering method using Wasserstein distances, enabling simultaneous local and global data grouping with proven consistency and scalable algorithms.
Contribution
It presents a novel Wasserstein-based multilevel clustering framework with efficient algorithms and theoretical guarantees, applicable to large hierarchical datasets.
Findings
Demonstrates scalability on synthetic and real data
Establishes consistency of clustering estimates
Offers flexible variants of the clustering problem
Abstract
We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a potentially large hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with Wasserstein distance metrics. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. Consistency properties are established for the estimates of both local and global clusters. Finally, experiment results with both synthetic and real data are presented to demonstrate the flexibility and scalability of the proposed approach.
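The abstract's recipe can be illustrated with a minimal Python sketch. Each group is summarized as a discrete measure, and those measures are then clustered under the Wasserstein metric with Wasserstein barycenters serving as centroids. The sketch assumes 1-D data, where the squared 2-Wasserstein distance and the barycenter have closed forms through quantile functions. It also runs the local and global stages one after the other rather than solving the paper's joint optimization. All function names (local_kmeans, w2_squared, barycenter, wasserstein_kmeans) and the toy data are hypothetical and are not the authors' code.

```python
# Hypothetical two-stage toy sketch (1-D only). The paper optimizes local and
# global clusters jointly; here they are decoupled for illustration.
# A discrete measure is a list of (atom, weight) pairs whose weights sum to 1.


def local_kmeans(points, k, iters=50):
    """Lloyd's k-means in 1-D: a local search for the k-atom measure closest in W2
    to the group's empirical measure. Returns [(center, cluster_fraction), ...]."""
    pts = sorted(points)
    n = len(pts)
    centers = [pts[(2 * j + 1) * n // (2 * k)] for j in range(k)]  # spread-out init
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            groups[nearest].append(p)
        # Empty clusters keep their previous center.
        centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
    return [(c, len(g) / n) for c, g in zip(centers, groups) if g]


def quantile(mu, u):
    """Quantile function F^{-1}(u) of a 1-D discrete measure, for u in (0, 1)."""
    atoms = sorted(mu)
    acc = 0.0
    for x, w in atoms:
        acc += w
        if u <= acc:
            return x
    return atoms[-1][0]  # guards against rounding when u is close to 1


def cut_points(measures):
    """Union of the cumulative-weight breakpoints of several measures, in [0, 1]."""
    cuts = {0.0, 1.0}
    for mu in measures:
        acc = 0.0
        for _, w in sorted(mu):
            acc += w
            cuts.add(min(acc, 1.0))
    return sorted(cuts)


def w2_squared(mu, nu):
    """Squared 2-Wasserstein distance in 1-D: integral of (F^{-1}(u) - G^{-1}(u))^2 du.
    Both quantile functions are constant between cut points, so midpoints suffice."""
    cuts = cut_points([mu, nu])
    return sum((hi - lo) * (quantile(mu, (lo + hi) / 2) - quantile(nu, (lo + hi) / 2)) ** 2
               for lo, hi in zip(cuts, cuts[1:]))


def barycenter(measures):
    """Equal-weight W2 barycenter in 1-D: its quantile function averages the inputs'."""
    cuts = cut_points(measures)
    bary = []
    for lo, hi in zip(cuts, cuts[1:]):
        u = (lo + hi) / 2
        bary.append((sum(quantile(mu, u) for mu in measures) / len(measures), hi - lo))
    return bary


def wasserstein_kmeans(measures, m, iters=20):
    """Lloyd-style clustering of measures under W2, with W2 barycenters as centroids."""
    centroids = [measures[i * len(measures) // m] for i in range(m)]  # spread-out init
    labels = []
    for _ in range(iters):
        labels = [min(range(m), key=lambda c: w2_squared(mu, centroids[c])) for mu in measures]
        for c in range(m):
            members = [mu for mu, lab in zip(measures, labels) if lab == c]
            if members:  # empty clusters keep their previous centroid
                centroids[c] = barycenter(members)
    return labels, centroids


# Toy corpus: six groups; three have modes near {0, 10}, three near {5, 20}.
groups = [
    [0.0, 0.2, 9.8, 10.0], [0.1, 0.3, 10.1, 10.3], [-0.1, 0.1, 9.9, 10.1],
    [5.0, 5.2, 19.8, 20.0], [4.9, 5.1, 20.1, 20.3], [5.1, 5.3, 19.9, 20.1],
]
local_measures = [local_kmeans(g, k=2) for g in groups]      # local clusters per group
labels, centroids = wasserstein_kmeans(local_measures, m=2)  # global clusters of groups
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Alternating between updating each group's local atoms and the global barycenters, instead of running the two stages once in sequence, would move the sketch closer to the joint formulation described in the abstract. Data in more than one dimension would need a general optimal transport solver in place of the quantile formulas.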
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Advanced Clustering Algorithms Research · Automated Road and Building Extraction
