EPTAS for $k$-means Clustering of Affine Subspaces

Eduard Eiben; Fedor V. Fomin; Petr A. Golovach; William Lochet; Fahad Panolan; Kirill Simonov

arXiv:2010.09580·cs.DS·October 20, 2020

TL;DR

This paper introduces an Efficient Polynomial-Time Approximation Scheme (EPTAS) for clustering data points with missing entries by representing them as affine subspaces, extending traditional k-means to incomplete data.

Contribution

It develops an EPTAS for k-means clustering of incomplete data modeled as affine subspaces, generalizing standard clustering to handle missing entries efficiently.

Findings

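1. Gives an algorithm that finds a $(1 + ϵ)$-approximate solution for $k$-means clustering of incomplete data.

2. Runs in time $f(k, ϵ, Δ) \cdot n^{2} \cdot d$, which is polynomial in the number of input points $n$ and the dimension $d$; the function $f$ depends only on $k$, $ϵ$, and $Δ$.

3. Generalizes classical $k$-means, which is the special case $Δ = 0$ where every coordinate of every point is specified.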

Abstract

We consider a generalization of the fundamental $k$-means clustering for data with incomplete or corrupted entries. When data objects are represented by points in $\mathbb{R}^{d}$, a data point is said to be incomplete when some of its entries are missing or unspecified. An incomplete data point with at most $Δ$ unspecified entries corresponds to an axis-parallel affine subspace of dimension at most $Δ$, called a $Δ$-point. Thus we seek a partition of $n$ input $Δ$-points into $k$ clusters minimizing the $k$-means objective. For $Δ = 0$, when all coordinates of each point are specified, this is the usual $k$-means clustering. We give an algorithm that finds a $(1 + ϵ)$-approximate solution in time $f(k, ϵ, Δ) \cdot n^{2} \cdot d$ for some function $f$ of $k$, $ϵ$, and $Δ$ only.
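
To make the objective concrete, here is a minimal sketch of how the $k$-means cost of a fixed set of centers could be evaluated on $Δ$-points. It assumes that the distance between a $Δ$-point and a center is the Euclidean distance from the center to the nearest point of the corresponding axis-parallel subspace, which amounts to summing squared differences over the specified coordinates only. The names `dist_sq` and `kmeans_cost` and the use of `None` for an unspecified entry are illustrative choices, not taken from the paper, and the sketch only evaluates the objective; it is not the approximation scheme itself.

```python
def dist_sq(point, center):
    """Squared distance from `center` to the axis-parallel affine subspace
    described by `point`. A None entry marks an unspecified coordinate
    (an assumption of this sketch) and contributes nothing to the distance."""
    return sum((p - c) ** 2 for p, c in zip(point, center) if p is not None)


def kmeans_cost(points, centers):
    """k-means objective: every input point pays its squared distance
    to the closest center."""
    return sum(min(dist_sq(p, c) for c in centers) for p in points)


# Four points in R^2; two of them have one unspecified entry (Δ = 1).
points = [(0.0, None), (1.0, 1.0), (None, 5.0), (4.0, 5.0)]
centers = [(0.5, 1.0), (4.0, 5.0)]

cost = kmeans_cost(points, centers)
print(cost)  # 0.5
```

With all entries specified ($Δ = 0$), `dist_sq` reduces to the ordinary squared Euclidean distance, so `kmeans_cost` is the usual $k$-means cost.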

Taxonomy

Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Computational Geometry and Mesh Generation