Spectral Clustering with Imbalanced Data
Jing Qian, Venkatesh Saligrama

TL;DR
This paper introduces a spectral clustering method for imbalanced data. It seeks minimum-cut partitions under minimum size constraints and, in the authors' experiments, outperforms traditional spectral clustering.
Contribution
It proposes a new graph partitioning framework that adaptively modulates node degrees to better detect imbalanced clusters, supported by theoretical analysis and experimental validation.
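To make "adaptively modulates node degrees" concrete, here is a minimal sketch of one way a nearest-neighbor graph could give different nodes different degrees. It is not taken from this summary and should not be read as the paper's exact construction: the function name `modulated_knn_graph`, the density proxy (mean distance to the `k` nearest neighbors), the rank-based rule for the per-node neighbor count, and the parameters `k` and `lam` are all assumptions made for illustration.

```python
import numpy as np

def modulated_knn_graph(X, k=5, lam=0.5):
    """Hypothetical degree-modulated kNN graph (illustrative assumption).

    Each node i is linked to its k_i nearest neighbors, where k_i grows
    with the node's density rank. lam=1.0 gives a plain kNN graph.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances; the diagonal is excluded from neighbor lists.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    order = np.argsort(D, axis=1)  # neighbors sorted nearest first

    # Density proxy: mean distance to the k nearest neighbors (small = dense).
    g = np.take_along_axis(D, order[:, :k], axis=1).mean(axis=1)
    # Rank in [0, 1): fraction of points that are sparser than point i.
    rank = (g[None, :] > g[:, None]).mean(axis=1)

    # Assumed modulation rule: denser points get more neighbors.
    k_i = np.rint(k * (lam + 2.0 * (1.0 - lam) * rank)).astype(int)
    k_i = np.clip(k_i, 1, n - 1)

    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        A[i, order[i, :k_i[i]]] = True
    return A | A.T, k_i  # symmetrized adjacency, per-node neighbor counts

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
A_plain, k_plain = modulated_knn_graph(X, k=5, lam=1.0)
A_mod, k_mod = modulated_knn_graph(X, k=5, lam=0.5)
```

Sweeping `lam` over a grid yields a family of graphs on the same node set, and each graph can be fed to a standard spectral clustering routine to produce a candidate partition.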
Findings
Outperforms traditional spectral clustering on imbalanced datasets
Provides theoretical justification for the adaptive graph construction approach
Demonstrates effectiveness on synthetic and real-world data
Abstract
Spectral clustering is sensitive to how graphs are constructed from data, particularly when proximal and imbalanced clusters are present. We show that Ratio-Cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced data since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced data. Our approach parameterizes a family of graphs, by adaptively modulating node degrees on a fixed node set, to yield a set of parameter-dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach. We demonstrate the superiority of our method through unsupervised and semi-supervised experiments on synthetic…
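The abstract's point that RCut and NCut can favor balanced sizes over small cut values can be checked on a toy graph. The following is a minimal sketch, not the paper's algorithm: the 8-node weighted chain, the helper names (`cut_value`, `rcut`, `ncut`, `min_cut_with_min_size`), and the brute-force search are all assumptions made for illustration. The paper instead optimizes over a parameterized family of graphs.

```python
import numpy as np

def cut_value(W, mask):
    """Total weight of edges crossing between `mask` and its complement."""
    return float(W[mask][:, ~mask].sum())

def rcut(W, mask):
    """Ratio cut: cut value scaled by inverse partition sizes."""
    return cut_value(W, mask) * (1.0 / mask.sum() + 1.0 / (~mask).sum())

def ncut(W, mask):
    """Normalized cut: cut value scaled by inverse partition volumes."""
    deg = W.sum(axis=1)
    return cut_value(W, mask) * (1.0 / deg[mask].sum() + 1.0 / deg[~mask].sum())

def min_cut_with_min_size(W, min_size):
    """Brute-force minimum cut with both sides of size >= min_size.

    Exponential in the number of nodes; only meant for toy graphs.
    """
    n = W.shape[0]
    best_mask, best_cut = None, np.inf
    # The last node is always kept in the complement to skip mirrored partitions.
    for bits in range(1, 2 ** (n - 1)):
        mask = np.array([(bits >> i) & 1 for i in range(n)], dtype=bool)
        if min(mask.sum(), (~mask).sum()) < min_size:
            continue
        c = cut_value(W, mask)
        if c < best_cut:
            best_mask, best_cut = mask, c
    return best_mask, best_cut

# Hypothetical toy graph: a weighted chain 0-1-2-...-7 with a weak link
# after the small cluster {0, 1} and a slightly stronger link at the midpoint.
edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 1.0), (3, 4, 0.6),
         (4, 5, 1.0), (5, 6, 1.0), (6, 7, 1.0)]
W = np.zeros((8, 8))
for i, j, w in edges:
    W[i, j] = W[j, i] = w

small = np.arange(8) < 2      # imbalanced 2/6 split
balanced = np.arange(8) < 4   # balanced 4/4 split

print("cut :", cut_value(W, small), cut_value(W, balanced))
print("RCut:", rcut(W, small), rcut(W, balanced))
print("NCut:", ncut(W, small), ncut(W, balanced))
best_mask, best_cut = min_cut_with_min_size(W, min_size=2)
print("size-constrained min cut:", np.flatnonzero(best_mask), best_cut)
```

On this toy graph, both RCut and NCut score the balanced 4/4 split lower (better) than the 2/6 split, even though the 2/6 split has the smaller cut value (0.5 versus 0.6). The size-constrained minimum cut with a minimum size of 2 recovers the 2/6 split.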
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Face and Expression Recognition
