On Efficient Multilevel Clustering via Wasserstein Distances
Viet Huynh, Nhat Ho, Nhan Dam, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui, and Dinh Phung

TL;DR
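This paper introduces a scalable multilevel clustering method based on Wasserstein distances. The method simultaneously partitions the data within each group and discovers clustering patterns among groups in hierarchically structured datasets. It comes with consistency guarantees for the cluster estimates and is evaluated on synthetic and real data.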
Contribution
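It presents a novel joint optimization framework for multilevel clustering over spaces of discrete probability measures endowed with Wasserstein metrics. It also provides fast algorithms that exploit the connection to Wasserstein barycenters, along with consistency guarantees for both local and global cluster estimates.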
Findings
Effective clustering on synthetic and real datasets
Fast optimization algorithms via Wasserstein barycenters
Proven consistency of local and global cluster estimates
Abstract
We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a potentially large hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with Wasserstein distance metrics. We propose several variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. Consistency properties are established for the estimates of both local and global clusters. Finally, experimental results with both synthetic and real data are presented to demonstrate the flexibility and scalability of the proposed approach.
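The abstract's key algorithmic idea is clustering probability measures while updating each cluster center as a Wasserstein barycenter. A small sketch can illustrate it, but it is not the authors' algorithm. It assumes one-dimensional data and treats each group's empirical measure (with the same number of atoms in every group) directly as that group's local measure instead of jointly optimizing local clusters. It then runs only a Lloyd-style global step. All function names (`w2_squared_1d`, `barycenter_1d`, `farthest_point_init`, `cluster_measures`) are hypothetical. The 1-D restriction is what keeps the sketch short: there, optimal transport reduces to sorting and the barycenter reduces to averaging sorted atoms. The general multivariate case needs an iterative barycenter solver, which is not shown here.

```python
import random


def w2_squared_1d(xs, ys):
    """Squared 2-Wasserstein distance between two uniform discrete measures on R.

    With equal atom counts and quadratic cost, the optimal coupling matches
    atoms in sorted order, so no linear program is needed in 1-D.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal numbers of atoms")
    a, b = sorted(xs), sorted(ys)
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def barycenter_1d(measures):
    """Equal-weight W2 barycenter of uniform 1-D measures with n atoms each.

    In 1-D the barycenter's quantile function is the average of the inputs'
    quantile functions, i.e. average the sorted atoms position by position.
    """
    sorted_ms = [sorted(m) for m in measures]
    n = len(sorted_ms[0])
    return [sum(m[i] for m in sorted_ms) / len(sorted_ms) for i in range(n)]


def farthest_point_init(measures, k):
    """Deterministic seeding: start from the first measure, then repeatedly add
    the measure farthest (in W2) from the centers chosen so far."""
    centers = [sorted(measures[0])]
    while len(centers) < k:
        far = max(measures, key=lambda m: min(w2_squared_1d(m, c) for c in centers))
        centers.append(sorted(far))
    return centers


def cluster_measures(measures, k, iters=20):
    """Lloyd-style clustering in Wasserstein space: assign each measure to the
    nearest center in W2, then move each center to its members' barycenter."""
    centers = farthest_point_init(measures, k)
    labels = [0] * len(measures)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: w2_squared_1d(m, centers[c]))
            for m in measures
        ]
        for c in range(k):
            members = [m for m, lab in zip(measures, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = barycenter_1d(members)
    return labels, centers


# Usage: ten groups of 50 points each; five centered near 0, five near 10.
rng = random.Random(1)
groups = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(5)]
groups += [[rng.gauss(10.0, 1.0) for _ in range(50)] for _ in range(5)]
labels, centers = cluster_measures(groups, k=2)
print(labels)
```

Each assignment-plus-barycenter pass cannot increase the total squared W2 distance from groups to their assigned centers. The reason is that the equal-weight barycenter minimizes the sum of squared W2 distances to a cluster's members, which is the same monotonicity argument as in ordinary k-means.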
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Automated Road and Building Extraction · Advanced Clustering Algorithms Research
