Optimal Time Bounds for Approximate Clustering

Ramgopal Mettu; Greg Plaxton

arXiv:1301.0587 · cs.DS · January 7, 2013 · 1 citation

TL;DR

This paper introduces a sampling technique called successive sampling for the k-median clustering problem and uses it to obtain a constant-factor approximation algorithm whose Θ(nk) running time is tight for a wide range of k.

Contribution

The paper presents a simple, efficient sampling method and an algorithm that tightly bounds the time complexity for approximate k-median clustering.

Findings

01

Successive sampling identifies small representative sets efficiently.

02

The algorithm runs in O(nk) time for a wide range of k values.

03

A lower bound for randomized algorithms matches the upper bound, so the time bound is tight.

Abstract

Clustering is a fundamental problem in unsupervised learning, and has been studied widely both as a problem of learning mixture models and as an optimization problem. In this paper, we study clustering with respect to the k-median objective function, a natural formulation of clustering in which we attempt to minimize the average distance to cluster centers. One of the main contributions of this paper is a simple but powerful sampling technique that we call successive sampling that could be of independent interest. We show that our sampling procedure can rapidly identify a small set of points (of size just O(k log(n/k))) that summarize the input points for the purpose of clustering. Using successive sampling, we develop an algorithm for the k-median problem that runs in O(nk) time for a wide range of values of k and is guaranteed, with high probability, to return a solution with…
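The abstract does not spell out the sampling procedure, so the following is only a minimal Python sketch of the two ideas it names: the k-median objective (average distance to the nearest center) and a sample-then-discard loop in the spirit of successive sampling. The function names, the parameters alpha and beta, and their default values are assumptions made for illustration and are not taken from the paper; the paper's actual procedure and constants may differ.

```python
import math
import random


def kmedian_cost(points, centers, dist):
    """Average distance from each point to its nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points) / len(points)


def successive_sample(points, k, dist, alpha=2, beta=0.5, rng=None):
    """Illustrative sample-then-discard loop (assumed form, not the paper's).

    Each round draws alpha*k points uniformly at random (with replacement),
    adds them to the summary, and discards the beta fraction of remaining
    points that lie closest to the drawn sample. alpha and beta are
    hypothetical parameters chosen for this sketch.
    """
    rng = rng or random.Random(0)
    remaining = list(points)
    summary = []
    sample_size = max(1, alpha * k)
    while len(remaining) > sample_size:
        sample = [rng.choice(remaining) for _ in range(sample_size)]
        summary.extend(sample)
        # Distance from every remaining point to its nearest sampled point.
        nearest = sorted(
            (min(dist(p, s) for s in sample), i)
            for i, p in enumerate(remaining)
        )
        # Drop the closest beta fraction (at least one point, so the loop ends).
        drop = max(1, math.ceil(beta * len(remaining)))
        remaining = [remaining[i] for _, i in nearest[drop:]]
    # Whatever is left is small enough to keep outright.
    summary.extend(remaining)
    return summary


if __name__ == "__main__":
    pts = [float(i) for i in range(100)]
    d = lambda a, b: abs(a - b)
    s = successive_sample(pts, k=2, dist=d)
    print(len(s), kmedian_cost(pts, s, d))
```

With beta = 0.5 the remaining set halves each round, so this sketch runs about log(n/k) rounds of alpha*k samples each, which is consistent with the O(k log(n/k)) summary size stated in the abstract. The per-round work is proportional to the remaining set size times the sample size, so the total shrinks geometrically. The summary may contain repeated points because sampling is with replacement.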

Taxonomy

Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Bayesian Methods and Mixture Models