A Multiscale Environment for Learning by Diffusion

James M. Murphy; Sam L. Polk

arXiv:2102.00500 · cs.LG · February 2, 2021

TL;DR

This paper introduces MELD, a multiscale data model based on nonlinear diffusion, and M-LUND, an efficient clustering algorithm that captures latent multiscale structures in datasets with theoretical guarantees.

Contribution

The paper presents MELD for modeling multiscale data structure and M-LUND for efficient, theoretically-guaranteed clustering across multiple scales.

Findings

01. M-LUND effectively detects latent multiscale structures in synthetic datasets.

02. Theoretical guarantees support the algorithm's performance.

03. M-LUND demonstrates success on real datasets.

Abstract

Clustering algorithms partition a dataset into groups of similar points. The clustering problem is very general, and different partitions of the same dataset could be considered correct and useful. To fully understand such data, it must be considered at a variety of scales, ranging from coarse to fine. We introduce the Multiscale Environment for Learning by Diffusion (MELD) data model, which is a family of clusterings parameterized by nonlinear diffusion on the dataset. We show that the MELD data model precisely captures latent multiscale structure in data and facilitates its analysis. To efficiently learn the multiscale structure observed in many real datasets, we introduce the Multiscale Learning by Unsupervised Nonlinear Diffusion (M-LUND) clustering algorithm, which is derived from a diffusion process at a range of temporal scales. We provide theoretical guarantees for the…
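
The idea that diffusion time acts as a scale parameter can be illustrated with a small sketch. This is not the M-LUND algorithm and is not taken from the paper or its repository. It only builds a Markov transition matrix on a toy dataset and compares diffusion distances at a short and a long time. The Gaussian kernel, the bandwidth `sigma`, the toy data, and the function names `markov_matrix` and `diffusion_distances` are all assumptions made for illustration.

```python
import numpy as np

def markov_matrix(X, sigma):
    """Row-normalised Gaussian-kernel transition matrix and its stationary distribution.

    The kernel choice and bandwidth are illustrative assumptions.
    """
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / sigma**2)      # symmetric affinities
    degree = W.sum(axis=1)
    P = W / degree[:, None]               # each row sums to 1
    pi = degree / degree.sum()            # stationary distribution of P
    return P, pi

def diffusion_distances(P, pi, t):
    """Pairwise diffusion distances at time t, weighted by 1 / pi."""
    Pt = np.linalg.matrix_power(P, t)
    diff = Pt[:, None, :] - Pt[None, :, :]
    return np.sqrt((diff**2 / pi).sum(axis=-1))

# Hypothetical toy data: four tight groups near x = 0, 1, 10, 11,
# so groups {0, 1} and {10, 11} form two coarse clusters.
rng = np.random.default_rng(0)
n_per = 10
centers = [0.0, 1.0, 10.0, 11.0]
X = np.concatenate([
    np.column_stack([c + 0.02 * rng.standard_normal(n_per),
                     0.02 * rng.standard_normal(n_per)])
    for c in centers
])

P, pi = markov_matrix(X, sigma=0.5)
for t in (1, 512):
    D = diffusion_distances(P, pi, t)
    near = D[0, n_per]        # group at 0 vs. neighbouring group at 1
    far = D[0, 2 * n_per]     # group at 0 vs. distant group at 10
    print(f"t={t}: neighbouring groups {near:.4f}, distant groups {far:.4f}")
```

In this toy setting, the neighbouring groups are well separated at small `t` and become nearly indistinguishable at large `t`, while the distant groups stay far apart. A clustering read off at each time would therefore move from a fine partition to a coarse one.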

Code & Models

Repositories

sampolk/MultiscaleDiffusionClustering (Official)

Taxonomy

Methods: Diffusion