Large Scale Spectral Clustering Using Approximate Commute Time Embedding

Nguyen Lu Dang Khoa; Sanjay Chawla

arXiv:1111.4541·cs.LG·July 2, 2013

TL;DR

This paper introduces a fast, accurate spectral clustering method for large datasets. It computes an approximate commute time embedding with random projection and a linear solver, avoiding both eigen decomposition and sampling.

Contribution

It presents a novel spectral clustering approach that bypasses eigen decomposition and sampling, improving speed and accuracy for large-scale data.

Findings

01. Outperforms existing approximate methods in clustering quality

02. Faster computation on synthetic and real datasets

03. Does not require eigenvector calculation or sampling

Abstract

Spectral clustering is a novel clustering method which can detect complex shapes of data clusters. However, it requires the eigen decomposition of the graph Laplacian matrix, which costs $O(n^{3})$ and thus is not suitable for large scale systems. Recently, many methods have been proposed to reduce the computation time of spectral clustering. These approximate methods usually involve sampling techniques, through which a lot of information in the original data may be lost. In this work, we propose a fast and accurate spectral clustering approach using an approximate commute time embedding, which is similar to the spectral embedding. The method requires neither sampling nor the computation of any eigenvector. Instead, it uses random projection and a linear-time solver to find the approximate embedding. Experiments on several synthetic and real datasets show that…
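The idea of finding an embedding with random projection and a linear solver can be illustrated with a minimal, dependency-free sketch. It assumes the standard effective-resistance construction (each embedding row is $\sqrt{V_G}\, L^{+} B^{T} W^{1/2} q$ for a random sign vector $q$ over the edges), which the abstract does not spell out, and it substitutes plain conjugate gradient for the linear-time solver. The function names, the parameter `k_rp`, and the toy graph are all hypothetical and chosen only for illustration.

```python
import math
import random


def laplacian_matvec(n, edges, x):
    """Return L @ x for a weighted graph Laplacian given as an edge list."""
    y = [0.0] * n
    for u, v, w in edges:
        d = w * (x[u] - x[v])
        y[u] += d
        y[v] -= d
    return y


def solve_laplacian(n, edges, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient for L z = b, where b sums to zero.

    Assumption: this is only a stand-in for the fast solver the
    abstract refers to; it is not the authors' solver.
    """
    z = [0.0] * n
    r = list(b)
    p = list(b)
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs < tol * tol:
            break
        Lp = laplacian_matvec(n, edges, p)
        alpha = rs / sum(pi * li for pi, li in zip(p, Lp))
        z = [zi + alpha * pi for zi, pi in zip(z, p)]
        r = [ri - alpha * li for ri, li in zip(r, Lp)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return z


def approx_commute_embedding(n, edges, k_rp, seed=0):
    """Return one k_rp-dimensional point per node.

    Squared Euclidean distances between points approximate commute times.
    """
    rng = random.Random(seed)
    volume = 2.0 * sum(w for _, _, w in edges)  # sum of weighted degrees
    scale = 1.0 / math.sqrt(k_rp)
    rows = []
    for _ in range(k_rp):
        # y = B^T W^{1/2} q for a random +/-1 vector q over the edges
        y = [0.0] * n
        for u, v, w in edges:
            s = rng.choice((-1.0, 1.0)) * scale * math.sqrt(w)
            y[u] += s
            y[v] -= s
        z = solve_laplacian(n, edges, y)  # z = L^+ y, no eigenvectors needed
        rows.append([math.sqrt(volume) * zi for zi in z])
    return [list(col) for col in zip(*rows)]  # transpose to node-major


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy graph: two unit-weight triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (2, 3, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0)]
emb = approx_commute_embedding(6, edges, k_rp=400)
within = sq_dist(emb[0], emb[1])  # exact commute time: 14 * 2/3
across = sq_dist(emb[0], emb[5])  # exact commute time: 14 * 7/3
```

On this toy graph the two triangles end up far apart in the embedded space, so a standard clustering step such as k-means on the rows of `emb` would be expected to separate them. A realistic implementation would use sparse matrices and a near-linear-time Laplacian solver in place of the pure-Python loops shown here.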
