Faster Algorithms for the Constrained k-means Problem

Anup Bhattacharya; Ragesh Jaiswal; Amit Kumar

arXiv:1504.02564·cs.DS·April 13, 2015


TL;DR

This paper gives faster algorithms for a generalized k-means clustering problem in which the optimal clusters may be an arbitrary partition of the dataset. It bounds the size of the list of candidate centers needed to guarantee a near-optimal solution and improves the running time over previous results.

Contribution

The paper presents a randomized algorithm with improved running time, together with upper and lower bounds on the list size of centers, for a generalized k-means problem with arbitrary clusters.

Findings

01

Provides an upper bound of $2^{\tilde{O}(k/\varepsilon)}$ on the list size of centers.

02

Establishes a lower bound of $2^{\tilde{\Omega}(k/\sqrt{\varepsilon})}$ on the list size.

03

The algorithm runs in time $O(nd \cdot 2^{\tilde{O}(k/\varepsilon)})$, improving on previous results.

Abstract

The classical center-based clustering problems such as $k$-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise in machine learning where the optimal clusters do not follow such a locality property. Consider a variant of the $k$-means problem that may be regarded as a general version of such problems. Here, the optimal clusters $O_1, \ldots, O_k$ are an arbitrary partition of the dataset and the goal is to output $k$ centers $c_1, \ldots, c_k$ such that the objective function $\sum_{i=1}^{k} \sum_{x \in O_i} \|x - c_i\|^2$ is minimized. It is not difficult to argue that any algorithm (without knowing the optimal clusters) that outputs a single set of $k$ centers will not behave well as far as optimizing the above objective function is concerned. However, this…
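
To make the objective above concrete, here is a minimal sketch of how the cost of a fixed partition could be evaluated. It is not the paper's algorithm: it assumes the partition $O_1, \ldots, O_k$ is already known, which is exactly what the algorithm does not have. The helper names (`centroid`, `sq_dist`, `partition_cost`) and the toy data are hypothetical. The sketch relies on the standard fact that, for a fixed cluster, the centroid minimizes the sum of squared distances.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    d = len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def partition_cost(clusters: List[List[Point]], centers: List[Point]) -> float:
    """Objective: sum over clusters O_i and points x in O_i of ||x - c_i||^2."""
    return sum(
        sq_dist(x, c)
        for cluster, c in zip(clusters, centers)
        for x in cluster
    )


# Hypothetical toy partition with no locality: the two clusters interleave
# on the real line, so nearest-center assignment would not recover them.
clusters = [
    [(0.0,), (2.0,), (4.0,)],
    [(1.0,), (3.0,), (5.0,)],
]

# For a known partition, the best center of each cluster is its centroid.
centers = [centroid(c) for c in clusters]
cost = partition_cost(clusters, centers)  # centers [2.0], [3.0]; cost 16.0
```

In this toy instance the centroids are 2.0 and 3.0, and moving either center away from its centroid increases the cost for that partition. Without knowledge of the partition, a single output of $k$ centers cannot be tuned to it in this way.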
