Eigenvalue-based Incremental Spectral Clustering

Mieczysław A. Kłopotek, Bartłomiej Starosta, Sławomir T. Wierzchoń

arXiv:2308.10999 · cs.LG · August 23, 2023

TL;DR

This paper introduces an incremental spectral clustering method that clusters manageable data subsets and merges them based on eigenvalue spectrum similarity, enabling scalable clustering of large datasets.

Contribution

The paper presents a novel incremental spectral clustering approach that efficiently handles large datasets by splitting, clustering, and merging based on eigenvalue spectra.
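
The first two steps of the split, cluster and merge scheme (splitting the data and clustering each subset) can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation. The random split, the hypothetical function names `split_indices` and `spectral_bisection`, and the use of a two-way Fiedler-vector bisection in place of a full k-way spectral clustering are all assumptions made for the example.

```python
import numpy as np


def split_indices(n_items: int, subset_size: int, seed: int = 0) -> list:
    """Step (1): randomly partition item indices into subsets of at most subset_size items."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    return [order[start:start + subset_size] for start in range(0, n_items, subset_size)]


def spectral_bisection(similarity: np.ndarray) -> np.ndarray:
    """Step (2), simplified to two clusters: split one subset by its Fiedler vector.

    Builds the combinatorial Laplacian L = D - A from a symmetric similarity
    matrix A and labels each node by the sign of the eigenvector belonging to
    the second-smallest eigenvalue of L.
    """
    laplacian = np.diag(similarity.sum(axis=1)) - similarity
    _, eigenvectors = np.linalg.eigh(laplacian)  # columns sorted by ascending eigenvalue
    fiedler = eigenvectors[:, 1]
    return (fiedler > 0).astype(int)


# Example: two groups of three items, strongly similar within and weakly across.
similarity = np.full((6, 6), 0.01)
similarity[:3, :3] = 1.0
similarity[3:, 3:] = 1.0
np.fill_diagonal(similarity, 0.0)
print(spectral_bisection(similarity))  # the two groups receive different labels
```

The motivation for splitting shows up in a rough cost estimate: a dense eigendecomposition takes about cubic time in the number of items, so clustering s subsets of n/s items each costs about s·(n/s)^3 = n^3/s^2 rather than n^3 for the whole set.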

Findings

01. Clusters built from subsets closely match the results of clustering the full dataset.
02. The method reduces computational complexity for large data.
03. The method is effective for spectral clustering of large data samples.

Abstract

Our previous experiments demonstrated that subsets of collections of (short) documents (with several hundred entries) share a common eigenvalue spectrum of the combinatorial Laplacian, once that spectrum is normalized in an appropriate way. Based on this insight, we propose a method of incremental spectral clustering. The method consists of the following steps: (1) split the data into manageable subsets, (2) cluster each of the subsets, and (3) merge clusters from different subsets based on eigenvalue spectrum similarity to form clusters of the entire set. This method can be especially useful for clustering methods whose complexity grows strongly with the size of the data sample, as is the case for typical spectral clustering. Experiments show that clustering the subsets and then merging the results does yield clusters close to those obtained by clustering the entire dataset.
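
The eigenvalue-based merge in step (3) can be illustrated with a hedged sketch. Several details here are assumptions. The abstract does not say how the spectrum is normalized; here the eigenvalues of L = D - A are divided by the number of nodes, which maps a complete graph with edge weight w to 0, w, w, ... regardless of its size. The exact merge rule is also not given; here each cluster of one subset is matched to the cluster of another subset with the nearest spectrum. The names `normalized_spectrum`, `spectrum_distance` and `merge_clusters` are hypothetical.

```python
import numpy as np


def normalized_spectrum(similarity: np.ndarray, n_values: int = 5) -> np.ndarray:
    """Smallest eigenvalues of the combinatorial Laplacian L = D - A, divided by the node count.

    Assumption: dividing by the number of nodes stands in for the unspecified
    normalization, so spectra of clusters of different sizes become comparable.
    """
    laplacian = np.diag(similarity.sum(axis=1)) - similarity
    eigenvalues = np.linalg.eigvalsh(laplacian)  # ascending order
    return eigenvalues[:n_values] / similarity.shape[0]


def spectrum_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two spectra, compared over their common length."""
    k = min(len(a), len(b))
    return float(np.linalg.norm(a[:k] - b[:k]))


def merge_clusters(sim_a, clusters_a, sim_b, clusters_b) -> dict:
    """Step (3), hypothetical rule: map each cluster of subset B to the cluster
    of subset A whose normalized spectrum is closest."""
    spectra_a = [normalized_spectrum(sim_a[np.ix_(c, c)]) for c in clusters_a]
    mapping = {}
    for j, cluster in enumerate(clusters_b):
        spectrum = normalized_spectrum(sim_b[np.ix_(cluster, cluster)])
        distances = [spectrum_distance(spectrum, s) for s in spectra_a]
        mapping[j] = int(np.argmin(distances))
    return mapping


# Example: subset A holds a dense and a sparse 3-member cluster; subset B holds
# clusters of the same two densities with 4 members each, in the opposite order.
sim_a = np.zeros((6, 6))
sim_a[:3, :3] = 1.0
sim_a[3:, 3:] = 0.2
sim_b = np.zeros((8, 8))
sim_b[:4, :4] = 0.2
sim_b[4:, 4:] = 1.0
np.fill_diagonal(sim_a, 0.0)
np.fill_diagonal(sim_b, 0.0)
print(merge_clusters(sim_a, [[0, 1, 2], [3, 4, 5]], sim_b, [[0, 1, 2, 3], [4, 5, 6, 7]]))
# -> {0: 1, 1: 0}
```

In the example, each cluster of B is matched to the cluster of A with the same edge density even though the cluster sizes differ, because the size normalization removes the dependence on the number of members.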

Taxonomy

Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Topological and Geometric Data Analysis