Faster Projective Clustering Approximation of Big Data

Adiel Statman; Liat Rozenberg; Dan Feldman

arXiv:2011.13476·cs.DS·November 30, 2020

TL;DR

This paper gives a faster approximation algorithm for clustering points to $m$ lines and uses it to reduce the size of existing coresets for projective clustering. The method extends to inputs with outliers and is supported by theoretical guarantees and experiments.

Contribution

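It presents the first $O(\log m)$ approximation algorithm for clustering to $m$ lines, running in $O(ndm)$ time where the existing solution takes $\exp(m)$. The approximation is used to reduce the size of existing coresets for projective clustering, and the algorithm extends to handle outliers.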

Findings

01

Achieves an $O(\log m)$ approximation in $O(ndm)$ time.

02

Provides a coreset construction for projective clustering.

03

Includes experimental results and open-source implementation.

Abstract

In projective clustering we are given a set of $n$ points in $\mathbb{R}^{d}$ and wish to cluster them to a set $S$ of $k$ linear subspaces in $\mathbb{R}^{d}$ according to some given distance function. An $\varepsilon$-coreset for this problem is a weighted (scaled) subset of the input points such that for every such possible $S$ the sum of these distances is approximated up to a factor of $(1 + \varepsilon)$. We reduce the size of existing coresets by suggesting the first $O(\log m)$ approximation for the case of clustering to $m$ lines in $O(ndm)$ time, compared to the existing $\exp(m)$ solution. We then project the points onto these lines and prove that for a sufficiently large $m$ we obtain a coreset for projective clustering. Our algorithm also generalizes to handle outliers. Experimental results and open code are also provided.
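To make the coreset definition concrete, the following is a minimal sketch of the clustering cost for the special case of lines through the origin, plus a check of the $(1 + \varepsilon)$ guarantee for one candidate set of lines. It is not the authors' algorithm or their released code. The function names (`dist_to_line`, `clustering_cost`, `within_eps`), the choice of Euclidean distance, and the two-sided reading of "up to a factor of $(1 + \varepsilon)$" are assumptions made for illustration.

```python
import math


def dist_to_line(p, u):
    """Euclidean distance from point p to the line through the origin spanned by u.

    Assumption: Euclidean distance stands in for the abstract's
    "some given distance function".
    """
    norm_u = math.sqrt(sum(x * x for x in u))
    unit = [x / norm_u for x in u]
    # Length of the projection of p onto the line.
    t = sum(a * b for a, b in zip(p, unit))
    # What remains of p after removing its projection.
    residual = [a - t * b for a, b in zip(p, unit)]
    return math.sqrt(sum(r * r for r in residual))


def clustering_cost(points, lines, weights=None):
    """Weighted sum over points of the distance to the nearest line."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * min(dist_to_line(p, u) for u in lines)
        for p, w in zip(points, weights)
    )


def within_eps(points, subset, subset_weights, lines, eps):
    """Check the coreset inequality for ONE candidate set of lines.

    Assumption: the guarantee is read as |cost(P) - cost(C)| <= eps * cost(P).
    A true coreset must satisfy this for every candidate set, which a
    single call cannot certify.
    """
    full = clustering_cost(points, lines)
    approx = clustering_cost(subset, lines, subset_weights)
    return abs(full - approx) <= eps * full


# Hypothetical toy input: merging duplicate points into one weighted point
# gives an exact weighted subset, so the check passes for any set of lines.
points = [(1.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
subset = [(1.0, 1.0), (2.0, 0.0)]
subset_weights = [2.0, 1.0]

for lines in ([(1.0, 0.0)], [(1.0, 2.0)], [(1.0, 0.0), (0.0, 1.0)]):
    print(lines, within_eps(points, subset, subset_weights, lines, eps=1e-9))
```

The toy subset is exact only because it merges duplicates. The point of a coreset construction is to obtain a much smaller weighted subset that still satisfies the inequality for every candidate $S$, which this sketch does not attempt.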

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Face and Expression Recognition