Linear-Time Approximation Scheme for k-Means Clustering of Affine Subspaces

Kyungjin Cho; Eunjin Oh

arXiv:2106.14176·cs.CG·June 29, 2021

TL;DR

This paper introduces a linear-time approximation scheme for k-means clustering of incomplete data points, each represented as an axis-parallel affine subspace, improving the running time of the previous algorithm from O(n^2 d) to O(nd).

Contribution

The paper presents a (1+ε)-approximation algorithm for k-means clustering of axis-parallel affine subspaces whose running time is linear, rather than quadratic, in the number n of data points.

Findings

01

Achieves (1+ε)-approximate solutions in O(nd) time.

02

The constants hidden in the O(·) notation depend only on Δ, ε, and k.

03

Improves on the previous O(n^2 d)-time algorithm of Eiben et al. [SODA'21] by a factor of n.

Abstract

In this paper, we present a linear-time approximation scheme for $k$-means clustering of incomplete data points in $d$-dimensional Euclidean space. An incomplete data point with $\Delta > 0$ unspecified entries is represented as an axis-parallel affine subspace of dimension $\Delta$. The distance between two incomplete data points is defined as the Euclidean distance between the two closest points in the axis-parallel affine subspaces corresponding to the data points. We present an algorithm for $k$-means clustering of axis-parallel affine subspaces of dimension $\Delta$ that yields a $(1 + \epsilon)$-approximate solution in $O(nd)$ time. The constants hidden behind $O(\cdot)$ depend only on $\Delta$, $\epsilon$, and $k$. This improves the $O(n^{2} d)$-time algorithm by Eiben et al. [SODA'21] by a factor of $n$.
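
To make the distance definition in the abstract concrete, here is a minimal sketch in Python. It is not the paper's algorithm. It only illustrates the distance between two incomplete data points and the resulting k-means cost for a given set of centers. Representing an unspecified entry as `None`, and the function names `incomplete_distance` and `kmeans_cost`, are assumptions made for this illustration. Because the subspaces are axis-parallel, the closest pair of points can agree on every coordinate that is unspecified in either data point, so only coordinates specified in both contribute to the distance.

```python
import math
from typing import Optional, Sequence

# Hypothetical representation: None marks an unspecified entry.
Point = Sequence[Optional[float]]


def incomplete_distance(p: Point, q: Point) -> float:
    """Euclidean distance between the closest points of two
    axis-parallel affine subspaces given as incomplete points."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    squared = 0.0
    for a, b in zip(p, q):
        # A coordinate that is free in either subspace can be matched
        # exactly, so it contributes nothing to the distance.
        if a is None or b is None:
            continue
        squared += (a - b) ** 2
    return math.sqrt(squared)


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over all data points of the squared distance to the nearest center."""
    return sum(
        min(incomplete_distance(p, c) for c in centers) ** 2
        for p in points
    )
```

For instance, `incomplete_distance([0.0, None, 3.0], [4.0, 5.0, None])` compares only the first coordinate, since the second and third are each unspecified in one of the two points. Evaluating `kmeans_cost` this way takes time proportional to n·k·d for n data points, k centers, and dimension d. The difficulty the paper addresses is finding near-optimal centers, not evaluating the cost.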
