A Randomized Approach to Efficient Kernel Clustering

Farhad Pourkamali-Anaraki; Stephen Becker

arXiv:1608.07597 · stat.ML · December 5, 2016 · GlobalSIP

TL;DR

This paper introduces a randomized kernel approximation method for efficient kernel clustering that significantly reduces memory usage while maintaining accuracy, making kernel-based clustering more scalable for large datasets.

Contribution

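The paper gives a new analysis of a class of approximate kernel methods and proposes a specific one-pass randomized kernel approximation, followed by standard K-means on the transformed data, that needs less memory than standard kernel K-means and Nystrom-based approximations.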

Findings

01. Analysis and experiments suggest the proposed method clusters accurately.

02. It requires drastically less memory than standard kernel K-means, whose memory scales as the square of the number of data points.

03. It requires significantly less memory than Nystrom-based approximations.
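
To make the quadratic memory scaling concrete, here is a small back-of-the-envelope sketch. The values of `n` and `r` and the 8-byte (float64) entry size are illustrative assumptions, not figures from the paper, and `kernel_memory_bytes` is a hypothetical helper name.

```python
def kernel_memory_bytes(n, r, bytes_per_entry=8):
    """Return (full_kernel_bytes, sketch_bytes) for n points and width r.

    Assumes dense storage with bytes_per_entry bytes per value
    (8 for float64); n and r are illustrative, not from the paper.
    """
    full = n * n * bytes_per_entry    # dense n x n kernel matrix
    sketch = n * r * bytes_per_entry  # n x r low-dimensional representation
    return full, sketch


full, sketch = kernel_memory_bytes(n=100_000, r=50)
print(f"full kernel: {full / 1e9:.1f} GB, n x r sketch: {sketch / 1e6:.1f} MB")
# prints "full kernel: 80.0 GB, n x r sketch: 40.0 MB"
```

Under these assumptions the full kernel matrix is 2,000 times larger than an n x r representation, which is the gap that approximate kernel methods aim to close.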

Abstract

Kernel-based K-means clustering has gained popularity due to its simplicity and the power of its implicit non-linear representation of the data. A dominant concern is the memory requirement since memory scales as the square of the number of data points. We provide a new analysis of a class of approximate kernel methods that have more modest memory requirements, and propose a specific one-pass randomized kernel approximation followed by standard K-means on the transformed data. The analysis and experiments suggest the method is accurate, while requiring drastically less memory than standard kernel K-means and significantly less memory than Nystrom based approximations.
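
The pipeline described in the abstract (a one-pass randomized kernel approximation, then standard K-means on the transformed data) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses a Gaussian (RBF) kernel, a Gaussian random test matrix, and a generic Nystrom-type positive semidefinite reconstruction from a single sketch `Y = K @ omega`. The function names, `gamma`, `rank`, `block`, and the toy data are all hypothetical choices.

```python
import numpy as np


def rbf_kernel_block(X_block, X, gamma):
    """Gaussian kernel values between a block of points and all points."""
    sq = (np.sum(X_block**2, axis=1)[:, None]
          + np.sum(X**2, axis=1)[None, :]
          - 2.0 * X_block @ X.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def one_pass_kernel_features(X, rank, gamma=1.0, block=256, seed=0):
    """Return F with K approximately equal to F @ F.T.

    Each kernel entry is computed once, in row blocks, so the full
    n x n matrix K is never stored.
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((n, rank))  # random test matrix (assumption)
    Y = np.empty((n, rank))
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Rows start:stop of the sketch Y = K @ omega.
        Y[start:stop] = rbf_kernel_block(X[start:stop], X, gamma) @ omega
    # Reconstruction K ~ Y (omega^T Y)^+ Y^T, written as F @ F.T.
    core = omega.T @ Y
    core = 0.5 * (core + core.T)            # symmetrize against round-off
    evals, evecs = np.linalg.eigh(core)
    keep = evals > 1e-10 * evals.max()      # drop near-zero directions
    return Y @ (evecs[:, keep] / np.sqrt(evals[keep]))


def kmeans(F, k, iters=50, seed=0):
    """Plain Lloyd iterations on the rows of F with random initial centers."""
    rng = np.random.default_rng(seed)
    centers = F[rng.choice(F.shape[0], size=k, replace=False)]
    for _ in range(iters):
        dist = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = F[labels == j].mean(axis=0)
    return labels


# Toy usage: two Gaussian blobs in the plane (synthetic data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)),
               rng.normal(3.0, 0.3, (100, 2))])
F = one_pass_kernel_features(X, rank=20, gamma=0.5)
labels = kmeans(F, k=2)
```

In this sketch the working memory is dominated by `omega`, `Y`, and one `block x n` slab of kernel values, rather than the full kernel matrix. A production version would likely use a more careful K-means initialization than the random one shown here.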

Taxonomy

Methods: k-Means Clustering