Learning by Unsupervised Nonlinear Diffusion
Mauro Maggioni, James M. Murphy

TL;DR
This paper introduces LUND, a clustering algorithm that uses diffusion geometry and density estimation to identify clusters with nonlinear shapes, outperforming spectral and density-based methods.
Contribution
The paper presents a novel diffusion-based clustering method that leverages diffusion time as a scale parameter, with theoretical analysis and empirical validation.
Findings
LUND accurately identifies clusters with nonlinear shapes.
Diffusion time reveals mesoscopic equilibria between clusters.
LUND outperforms spectral and density-based clustering methods.
Abstract
This paper proposes and analyzes a novel clustering algorithm that combines graph-based diffusion geometry with techniques based on density and mode estimation. The proposed method is suitable for data generated from mixtures of distributions with densities that are both multimodal and have nonlinear shapes. A crucial aspect of this algorithm is the use of time of a data-adapted diffusion process as a scale parameter that is different from the local spatial scale parameter used in many clustering algorithms. We prove estimates for the behavior of diffusion distances with respect to this time parameter under a flexible nonparametric data model, identifying a range of times in which the mesoscopic equilibria of the underlying process are revealed, corresponding to a gap between within-cluster and between-cluster diffusion distances. These structures can be missed by the top eigenvectors…
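To make the role of the time parameter concrete, here is a minimal sketch of diffusion distances computed from a random walk on a data graph. It is not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the dense matrix power, and the function name `diffusion_distances` are all assumptions chosen for illustration, and the formula used is the standard diffusion distance weighted by the stationary distribution.

```python
import numpy as np

def diffusion_distances(X, sigma, t):
    """Pairwise diffusion distances at integer time t (illustrative sketch).

    X     : (n, d) array of data points
    sigma : Gaussian kernel bandwidth (assumed kernel choice)
    t     : number of random-walk steps, acting as the scale parameter
    """
    # Pairwise squared Euclidean distances between all points.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel weights define a fully connected weighted graph.
    W = np.exp(-sq / sigma**2)
    deg = W.sum(axis=1)
    # Row-normalize to obtain the Markov transition matrix of the walk.
    P = W / deg[:, None]
    # Stationary distribution of the walk (proportional to degree).
    pi = deg / deg.sum()
    # Transition probabilities after t steps.
    Pt = np.linalg.matrix_power(P, t)
    # D_t(x, y)^2 = sum_u (P^t(x, u) - P^t(y, u))^2 / pi(u),
    # computed as squared Euclidean distances between rescaled rows.
    A = Pt / np.sqrt(pi)[None, :]
    G = A @ A.T
    d = np.diag(G)
    D2 = np.clip(d[:, None] + d[None, :] - 2.0 * G, 0.0, None)
    return np.sqrt(D2)
```

On well-separated groups of points, an intermediate `t` should make within-group distances much smaller than between-group distances, which is the kind of gap the abstract refers to. The dense computation here is only practical for small datasets.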
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Topological and Geometric Data Analysis
Methods: Spectral Clustering
