TL;DR
This paper introduces MELD, a multiscale data model based on nonlinear diffusion, and M-LUND, an efficient clustering algorithm with theoretical guarantees that captures latent multiscale structure in data.
Contribution
The paper presents MELD, a model of multiscale structure in data, and M-LUND, an efficient clustering algorithm that operates across multiple scales and comes with theoretical guarantees.
Findings
M-LUND effectively detects latent multiscale structures in synthetic datasets.
Theoretical guarantees support the algorithm's performance.
M-LUND also performs well on real datasets.
Abstract
Clustering algorithms partition a dataset into groups of similar points. The clustering problem is very general, and different partitions of the same dataset could be considered correct and useful. To fully understand such data, it must be considered at a variety of scales, ranging from coarse to fine. We introduce the Multiscale Environment for Learning by Diffusion (MELD) data model, which is a family of clusterings parameterized by nonlinear diffusion on the dataset. We show that the MELD data model precisely captures latent multiscale structure in data and facilitates its analysis. To efficiently learn the multiscale structure observed in many real datasets, we introduce the Multiscale Learning by Unsupervised Nonlinear Diffusion (M-LUND) clustering algorithm, which is derived from a diffusion process at a range of temporal scales. We provide theoretical guarantees for the…
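To make the idea of a diffusion process observed at a range of temporal scales more concrete, here is a minimal sketch. It is not the authors' M-LUND algorithm: the Gaussian kernel, the bandwidth `sigma`, the threshold `tol`, the toy dataset, and the helper names are all assumptions chosen for illustration. The sketch builds a random walk on the data and counts how many eigenmodes of that walk still persist after `t` steps, which is one simple way to see fine structure at small `t` and coarse structure at large `t`.

```python
import numpy as np


def symmetric_diffusion_operator(X, sigma):
    """Return D^{-1/2} W D^{-1/2}, which shares eigenvalues with the walk P = D^{-1} W."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / sigma**2)  # Gaussian affinities (assumed kernel)
    d = W.sum(axis=1)                 # node degrees
    return W / np.sqrt(np.outer(d, d))


def num_persistent_modes(eigvals, t, tol=0.5):
    """Count eigenvalues whose t-step decay factor lambda**t is still above tol."""
    return int(np.sum(eigvals**t > tol))


# Hypothetical toy data: four tight blobs arranged as two well-separated pairs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
X = np.vstack([c + 0.05 * rng.standard_normal((30, 2)) for c in centers])

S = symmetric_diffusion_operator(X, sigma=0.5)
# Clip to [0, 1] to remove tiny numerical overshoot before taking powers.
eigvals = np.clip(np.linalg.eigvalsh(S), 0.0, 1.0)

# Short diffusion time probes the fine scale, long diffusion time the coarse scale.
counts = {t: num_persistent_modes(eigvals, t) for t in (1, 200)}
print(counts)
```

On this toy data the count should drop from four persistent modes at `t = 1` (the individual blobs) to two at `t = 200` (the two pairs), since the walk mixes within each pair long before it can cross between pairs. A full clustering method would still need a rule for assigning points to clusters at each scale, which this sketch deliberately leaves out.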
Taxonomy
Methods: Diffusion
