The Power of Uniform Sampling for $k$-Median

Lingxiao Huang; Shaofeng H.-C. Jiang; Jianing Lou

arXiv:2302.11339 · cs.DS · February 23, 2023

TL;DR

This paper investigates the effectiveness of uniform sampling for approximating the $k$-Median problem across different metric spaces, establishing theoretical bounds and validating practical performance through experiments.

Contribution

It provides nearly matching lower and upper bounds on the query complexity of $k$-Median in terms of the dataset balancedness $β$, and shows that a simple uniform sample of $\mathrm{poly}(k ϵ^{-1} β^{-1})$ points suffices for $(1 + ϵ)$-approximation.

Findings

01. Uniform sampling is nearly optimal for $k$-Median approximation.

02. Query complexity depends inversely on dataset balancedness $β$.

03. Experiments show uniform sampling performs well on real datasets.
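
To make the balancedness parameter more concrete, here is a minimal Python sketch that measures how balanced a given clustering is. The definition is an assumption, since this page does not spell it out: the sketch takes the smallest cluster size divided by the average cluster size $n/k$, so that a value of $1$ means all clusters have equal size. The function name `balancedness` and the label-list input are hypothetical, and the sketch scores whatever clustering it is given rather than computing an optimal one.

```python
from collections import Counter


def balancedness(labels, k):
    """Smallest cluster size relative to the average size n/k.

    Assumed definition for illustration only; returns a value in [0, 1],
    with 1 meaning every one of the k clusters has exactly n/k points.
    """
    n = len(labels)
    counts = Counter(labels)
    # If fewer than k labels appear, some cluster is empty (size 0).
    smallest = min(counts.values()) if len(counts) == k else 0
    return smallest * k / n


if __name__ == "__main__":
    print(balancedness([0, 0, 1, 1], k=2))  # two clusters of equal size
    print(balancedness([0, 0, 0, 1], k=2))  # one cluster three times larger
```

Under this assumed definition, a small value signals that some cluster holds only a small fraction of the points, which is the regime where a uniform sample is likely to miss it.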

Abstract

We study the power of uniform sampling for $k$-Median in various metric spaces. We relate the query complexity for approximating $k$-Median to a key parameter of the dataset, called the balancedness $β \in (0, 1]$ (with $1$ being perfectly balanced). We show that any algorithm must make $Ω(1/β)$ queries to the point set in order to achieve $O(1)$-approximation for $k$-Median. This particularly implies existing constructions of coresets, a popular data reduction technique, cannot be query-efficient. On the other hand, we show a simple uniform sample of $\mathrm{poly}(k ϵ^{-1} β^{-1})$ points suffices for $(1 + ϵ)$-approximation for $k$-Median for various metric spaces, which nearly matches the lower bound. We conduct experiments to verify that in many real datasets, the balancedness parameter is usually well bounded, and that the uniform sampling…
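
The sample-then-solve idea described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's algorithm: it draws `m` points uniformly at random, picks the best `k` centers among the sampled points by brute force (a stand-in solver chosen for brevity, practical only for tiny samples), and then evaluates those centers on the full dataset. The names `cost` and `uniform_sample_k_median` and the parameters `m` and `seed` are hypothetical, and Euclidean distance is assumed as the metric.

```python
import itertools
import math
import random


def cost(points, centers):
    """k-Median objective: sum of distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def uniform_sample_k_median(points, k, m, seed=0):
    """Pick k centers using only a uniform sample of m points.

    Assumption: centers are restricted to sampled points and found by
    exhaustive search over k-subsets, which is only feasible for small m and k.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(m, len(points)))
    # Exhaustive search over all k-subsets of the sample as candidate centers.
    best = min(itertools.combinations(sample, k), key=lambda cs: cost(sample, cs))
    return list(best)


if __name__ == "__main__":
    # Two well-separated, equally sized clusters in the plane.
    left = [(0.1 * (i % 5), 0.1 * (i // 5)) for i in range(50)]
    right = [(100 + x, 100 + y) for (x, y) in left]
    data = left + right
    centers = uniform_sample_k_median(data, k=2, m=20)
    print("centers:", centers)
    print("cost on full data:", cost(data, centers))
```

In this toy instance both clusters are the same size, so a small uniform sample almost surely contains points from each. When one cluster is much smaller than the others, the sample size needed to see it grows, which is the intuition behind the dependence on $β$.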

Videos

The Power of Uniform Sampling for $k$-Median · SlidesLive

Taxonomy

Topics: Complexity and Algorithms in Graphs · Data Management and Algorithms · Privacy-Preserving Technologies in Data