EPTAS for $k$-means Clustering of Affine Subspaces
Eduard Eiben, Fedor V. Fomin, Petr A. Golovach, William Lochet, Fahad Panolan, Kirill Simonov

TL;DR
This paper introduces an Efficient Polynomial-Time Approximation Scheme (EPTAS) for clustering data points with missing entries by representing them as affine subspaces, extending traditional k-means to incomplete data.
Contribution
It develops an EPTAS for k-means clustering of incomplete data modeled as affine subspaces, generalizing standard clustering to handle missing entries efficiently.
Findings
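Provides a $(1+\epsilon)$-approximation algorithm for $k$-means clustering of incomplete data.
Achieves a running time polynomial in the number of data points and the dimension, with the exponential dependence confined to the parameters: the number of clusters, the approximation parameter, and the maximum number of missing entries per point.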
Extends classical k-means to a broader class of data representations.
Abstract
We consider a generalization of the fundamental $k$-means clustering for data with incomplete or corrupted entries. When data objects are represented by points in $\mathbb{R}^d$, a data point is said to be incomplete when some of its entries are missing or unspecified. An incomplete data point with at most $\Delta$ unspecified entries corresponds to an axis-parallel affine subspace of dimension at most $\Delta$, called a $\Delta$-point. Thus we seek a partition of $n$ input $\Delta$-points into $k$ clusters minimizing the $k$-means objective. For $\Delta = 0$, when all coordinates of each point are specified, this is the usual $k$-means clustering. We give an algorithm that finds a $(1+\epsilon)$-approximate solution in time $f(k, \epsilon, \Delta) \cdot n^2 \cdot d$ for some function $f$ of $k$, $\epsilon$, and $\Delta$ only.
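To make the objective concrete, here is a minimal sketch of how the $k$-means cost could be evaluated on incomplete points. It is not the paper's approximation scheme; it only scores a given set of centers. It relies on one geometric fact: the point of an axis-parallel affine subspace closest to a center agrees with that center on every unspecified coordinate, so the squared distance is the sum of squared differences over the specified coordinates only. The function name `incomplete_kmeans_cost` and the use of NaN to mark missing entries are assumptions made for this illustration.

```python
import numpy as np

def incomplete_kmeans_cost(points, centers):
    """k-means cost of `centers` on points with missing entries.

    points  : (n, d) array-like; NaN marks an unspecified entry (assumed encoding).
    centers : (k, d) array-like of fully specified centers.
    Each point contributes its squared distance to the nearest center,
    measured only over the coordinates that the point specifies.
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    observed = ~np.isnan(points)                      # (n, d) mask of specified entries
    filled = np.where(observed, points, 0.0)          # placeholder 0 for missing entries
    diff = filled[:, None, :] - centers[None, :, :]   # (n, k, d) pairwise differences
    sq = (diff ** 2) * observed[:, None, :]           # missing coordinates contribute 0
    dist2 = sq.sum(axis=2)                            # (n, k) squared distances
    return float(dist2.min(axis=1).sum())             # nearest center per point, summed

# Three points in the plane; the second and third each miss one entry.
example_points = [[0.0, 0.0], [np.nan, 1.0], [4.0, np.nan]]
example_centers = [[0.0, 0.0], [4.0, 4.0]]
example_cost = incomplete_kmeans_cost(example_points, example_centers)  # 1.0
```

When no entry is missing, the mask is all ones and the function reduces to the usual $k$-means cost.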
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Computational Geometry and Mesh Generation
