Git: Clustering Based on Graph of Intensity Topology
Zhangyang Gao, Haitao Lin, Cheng Tan, Lirong Wu, Stan Z. Li

TL;DR
GIT is a novel clustering algorithm that combines local intensity-based clustering with a global topological graph approach, effectively handling noise and scale variations while outperforming existing methods on multiple datasets.
Contribution
The paper introduces GIT, a new clustering method that integrates local intensity peaks and global topology, with automatic noise edge removal using Wasserstein distance, advancing clustering robustness and accuracy.
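The local step described above (grouping samples around intensity peaks) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `local_clusters`, the parameter `k`, and the use of inverse mean k-nearest-neighbour distance as the "intensity" are all assumptions made for illustration. Each sample points to the most intense sample in its neighbourhood, and samples that point to themselves act as peaks.

```python
import math


def local_clusters(points, k=2):
    """Assign each point to a local intensity peak (illustrative sketch only).

    Assumption: intensity is approximated by the inverse of the mean distance
    to the k nearest neighbours. Requires len(points) > k.
    """
    n = len(points)
    neighbours, intensity = [], []
    for i in range(n):
        # Brute-force k nearest neighbours of point i.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        neighbours.append([j for _, j in dists])
        mean_dist = sum(d for d, _ in dists) / len(dists)
        intensity.append(1.0 / (mean_dist + 1e-12))

    # Each point points at the most intense point among itself and its
    # neighbours; on ties the point keeps itself, so no cycles can form.
    parent = [
        max([i] + neighbours[i], key=lambda j: intensity[j]) for i in range(n)
    ]

    def peak(i):
        # Follow parent pointers uphill until a peak (self-parent) is reached.
        while parent[i] != i:
            i = parent[i]
        return i

    return [peak(i) for i in range(n)]


if __name__ == "__main__":
    pts = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
    print(local_clusters(pts, k=2))
```

On two well-separated groups of points, each group collapses onto its own peak. A real implementation would use a proper density estimate and an efficient neighbour search.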
Findings
GIT outperforms seven competing algorithms on synthetic and real datasets.
GIT achieves about 10% higher F1-score on MNIST and FashionMNIST.
GIT demonstrates robustness to noise and scale variations.
Abstract
Accuracy, Robustness to noises and scales, Interpretability, Speed, and Easy to use (ARISE) are crucial requirements of a good clustering algorithm. However, achieving these goals simultaneously is challenging, and most advanced approaches only focus on parts of them. Towards an overall consideration of these aspects, we propose a novel clustering algorithm, namely GIT (Clustering Based on Graph of Intensity Topology). GIT considers both local and global data structures: firstly forming local clusters based on intensity peaks of samples, and then estimating the global topological graph (topo-graph) between these local clusters. We use the Wasserstein Distance between the predicted and prior class proportions to automatically cut noisy edges in the topo-graph and merge connected local clusters as final clusters.…
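The edge-cutting idea in the abstract can be sketched as follows. This is a hedged illustration, not the paper's procedure: the function names, the threshold sweep over edge weights, and the choice to compare rank-sorted proportion vectors are assumptions. Given local cluster sizes, weighted topo-graph edges, and prior class proportions, the sketch drops weak edges, merges the connected local clusters, and keeps the threshold whose merged-cluster proportions are closest to the prior under a 1-D Wasserstein distance.

```python
from itertools import accumulate


def connected_components(n_nodes, edges):
    """Union-find over undirected edges; returns one root label per node."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return [find(i) for i in range(n_nodes)]


def proportion_distance(p, q):
    """1-D Wasserstein distance between two proportion vectors.

    Assumption: both vectors are sorted in descending order, zero-padded to
    equal length, and treated as distributions over rank positions 0, 1, 2, ...
    """
    k = max(len(p), len(q))
    p = sorted(p, reverse=True) + [0.0] * (k - len(p))
    q = sorted(q, reverse=True) + [0.0] * (k - len(q))
    cdf_p, cdf_q = accumulate(p), accumulate(q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def cut_and_merge(sizes, weighted_edges, prior):
    """Pick the edge-weight threshold whose merged clusters best match `prior`."""
    total = sum(sizes)
    # Candidate thresholds: every distinct edge weight, plus "cut everything".
    thresholds = sorted({w for _, _, w in weighted_edges}) + [float("inf")]
    best = None
    for t in thresholds:
        kept = [(u, v) for u, v, w in weighted_edges if w >= t]
        labels = connected_components(len(sizes), kept)
        mass = {}
        for node, lab in enumerate(labels):
            mass[lab] = mass.get(lab, 0) + sizes[node]
        predicted = [m / total for m in mass.values()]
        d = proportion_distance(predicted, prior)
        if best is None or d < best[0]:
            best = (d, labels)
    return best[1]


if __name__ == "__main__":
    sizes = [40, 10, 30, 20]                        # samples per local cluster
    edges = [(0, 1, 0.9), (2, 3, 0.8), (1, 2, 0.1)]  # (u, v, weight)
    print(cut_and_merge(sizes, edges, prior=[0.5, 0.5]))
```

In the toy input, the weak edge between local clusters 1 and 2 is the one that gets cut, because removing it yields two merged clusters of equal mass, matching the assumed prior of two balanced classes.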
Taxonomy
Topics: Topological and Geometric Data Analysis · Advanced Clustering Algorithms Research · Anomaly Detection Techniques and Applications
