On Coresets for Clustering in Small Dimensional Euclidean Spaces

Lingxiao Huang; Ruiyuan Huang; Zengfeng Huang; Xuan Wu

arXiv:2302.13737·cs.DS·February 28, 2023

TL;DR

This paper studies small coresets for k-Median clustering in low-dimensional Euclidean spaces, providing improved bounds, new lower bounds, and the first separation results between k=1 and k=2 in 1D.

Contribution

It offers improved coreset size bounds for small dimensions, establishes new lower bounds, and demonstrates a novel separation between 1-Median and 2-Median in 1D.

Findings

01. Improved coreset bounds for small dimensions.

02. New lower bounds for coreset sizes.

03. First known separation between 1-Median and 2-Median in 1D.

Tables

Table 1: Comparison of coreset sizes for $k$-Median in $\mathbb{R}^{d}$. We use the following abbreviations: [1] for (Har-Peled and Kushal, 2005), [2] for (Feldman and Langberg, 2011), [3] for (Baker et al., 2020), [4] for (Cohen-Addad et al., 2021), [5] for (Cohen-Addad et al., 2022) and [6] for (Huang et al., 2022b). The symbol $\dagger$ indicates that the result can be generalized to $(k,z)$-Clustering (Definition 3.1).

| Parameter $d$ | Parameter $k$ | Best Known Upper Bound | Best Known Lower Bound | Our Results |
| --- | --- | --- | --- | --- |
| $d=1$ | $k=1$ | $\tilde{O}(\varepsilon^{-1})$ [1] | $\Omega(\varepsilon^{-1/2})$ [3] | $\tilde{O}(\varepsilon^{-1/2})$ (Thm. 2.1) |
| $d=1$ | $k>1$ | $O(k\varepsilon^{-1})$ [1] | $\Omega(k\varepsilon^{-1/2})$ [3] | $\Omega(k\varepsilon^{-1})$ (Thm. 2.9) |
| $1<d<\Theta(\varepsilon^{-2})$ | $k=1$ | $\tilde{O}(\varepsilon^{-2})$ [4] | $\Omega(\varepsilon^{-1/2})$ [3] | $\tilde{O}(\sqrt{d}\,\varepsilon^{-1})^{\dagger}$ (Thm. 3.2) |
| $1<d<\Theta(\varepsilon^{-2})$ | $k>1$ | $\tilde{O}(\min\{\frac{kd}{\varepsilon^{2}},\frac{k}{\varepsilon^{3}},\frac{k^{4/3}}{\varepsilon^{2}}\})$ [2, 5, 6] | $\Omega(k\varepsilon^{-1/2})$ [3] | $\Omega(kd+k\varepsilon^{-1})^{\dagger}$ (Thm. 3.8) |
| $d=\Omega(\varepsilon^{-2})$ | $k\geq 1$ | $\tilde{O}(\min\{\frac{k}{\varepsilon^{3}},\frac{k^{4/3}}{\varepsilon^{2}}\})$ [5, 6] | $\Omega(k\varepsilon^{-2})$ [5] | / |

Equations

$$\mathrm{cost}(P,C):=\sum_{p\in P}d(p,C)=\sum_{p\in P}\min_{c\in C}d(p,c),$$

$$\forall C\in\mathcal{C},\qquad\sum_{p\in S}w(p)\cdot d(p,C)\in(1\pm\varepsilon)\cdot\mathrm{cost}(P,C).$$

$$f_{P}(p_{L})\leq f_{P}(c^{\star})+n\cdot\frac{f_{P}(c^{\star})}{\varepsilon n}=(1+\varepsilon^{-1})\cdot f_{P}(c^{\star}).$$

$$f_{P}(c)-f_{S}(c)=\sum_{B}\left[\mathrm{cost}(B,c)-N(B)\cdot d(\mu(B),c)\right]$$

$$|f_{P}(c)-f_{S}(c)|\leq\delta(B)\leq\varepsilon\cdot 2^{i}\cdot\mathsf{OPT}.$$

$$\begin{aligned}r(b)-r\left(\tfrac{a+b}{2}\right)&=\int_{\frac{a+b}{2}}^{b}\left[r^{\prime}\left(\tfrac{a+b}{2}\right)+\int_{\frac{a+b}{2}}^{c}r^{\prime\prime}(t)\,dt\right]dc\\&=\frac{L}{2}\,r^{\prime}\left(\tfrac{a+b}{2}\right)+\int_{\frac{a+b}{2}}^{b}\int_{\frac{a+b}{2}}^{c}r^{\prime\prime}(t)\,dt\,dc\\&\geq-\frac{L}{2}\cdot\frac{4}{L}\,\|r\|_{\infty}+\frac{1}{8}(b-a)^{2}\,r^{\prime\prime}(c)\\&\geq-2\varepsilon\alpha+\frac{1}{8}\beta.\end{aligned}$$

$$\mathrm{cost}(p,\{0,c\})=\begin{cases}w(p)\,p&\text{for }p\in[0,\mathrm{mid}],\\w(p)\,(c-p)&\text{for }p\in[\mathrm{mid},c],\\w(p)\,(p-c)&\text{for }p\in[c,+\infty).\end{cases}$$

$$f_{P}^{\prime}(c)=\sum_{p\in P\cap[\mathrm{mid},c]}w(p)-\sum_{p\in P\cap[c,+\infty)}w(p).$$

$$0\leq f_{P}(c)\leq\sum_{i=1}^{\frac{1}{\varepsilon}}\lambda(I_{i})\,r_{i}\leq\sum_{i=1}^{\frac{1}{\varepsilon}}\left(\frac{1}{4}\right)^{i-1}\times 2\times 4^{i-1}=\frac{2}{\varepsilon}.$$

$$\mathrm{cost}(p,\{0,c\})=\begin{cases}p&\text{for }p\in[0,\mathrm{mid}],\\c-p&\text{for }p\in[\mathrm{mid},c],\\p-c&\text{for }p\in[c,+\infty).\end{cases}$$

$$\cdots+\int_{c}^{c+dc}O(dc)\,d\lambda+\int_{c+dc}^{+\infty}(-dc)\,d\lambda$$

$$f_{P}^{\prime\prime}(c)=2\mu(c)-\frac{1}{2}\mu\left(\frac{c}{2}\right).$$

$$\mathrm{cost}_{z}(P,C):=\sum_{p\in P}d^{z}(p,C)=\sum_{p\in P}\min_{c\in C}d^{z}(p,c),$$

$$\forall C\in\mathcal{C},\qquad\sum_{p\in S}w(p)\cdot d^{z}(p,C)\in(1\pm\varepsilon)\cdot\mathrm{cost}_{z}(P,C).$$

$$\sum_{p\in S}w(p)\cdot d^{z}(p,c)\in\mathrm{cost}_{z}(P,c)\pm\varepsilon\max\{1,\|c\|_{2}\}^{z}\cdot|P|.$$

$$\sum_{p\in S}w(p)\cdot d^{z}(p,c)\in(1\pm\varepsilon)\cdot\mathrm{cost}_{z}(P,c)\pm\varepsilon|P|.$$

Full text

Lingxiao Huang (State Key Laboratory of Novel Software Technology, Nanjing University; Email: [email protected])

Ruiyuan Huang (Fudan University; Email: [email protected])

Zengfeng Huang (Fudan University; Email: [email protected])

Xuan Wu (Huawei TCS Lab; Email: [email protected])

Abstract

We consider the problem of constructing small coresets for $k$ -Median in Euclidean spaces. Given a large set of data points $P\subset\mathbb{R}^{d}$ , a coreset is a much smaller set $S\subset\mathbb{R}^{d}$ , so that the $k$ -Median costs of any $k$ centers w.r.t. $P$ and $S$ are close. Existing literature mainly focuses on the high-dimension case and there has been great success in obtaining dimension-independent bounds, whereas the case for small $d$ is largely unexplored. Considering many applications of Euclidean clustering algorithms are in small dimensions and the lack of systematic studies in the current literature, this paper investigates coresets for $k$ -Median in small dimensions. For small $d$ , a natural question is whether existing near-optimal dimension-independent bounds can be significantly improved. We provide affirmative answers to this question for a range of parameters. Moreover, new lower bound results are also proved, which are the highest for small $d$ . In particular, we completely settle the coreset size bound for $1$ -d $k$ -Median (up to log factors). Interestingly, our results imply a strong separation between $1$ -d $1$ -Median and $1$ -d $2$ -Median. As far as we know, this is the first such separation between $k=1$ and $k=2$ in any dimension.

1 Introduction
1.1 Problem Definitions and Previous Results
1.2 Our Results
1.3 Technical Overview
1.4 Other Related Work
2 Tight Coreset Sizes for $1$ -d $k$ -Median
2.1 Near Optimal Coreset for $1$ -d $1$ -Median
2.2 Tight Lower Bound on Coreset Size for $1$ -d $k$ -Median when $k\geq 2$
3 Improve Coreset Sizes when $2\leq d\leq\varepsilon^{-2}$
3.1 Improved Coreset Size in $\mathbb{R}^{d}$ when $k=1$
3.1.1 Useful Notations and Facts
3.1.2 Proof of Theorem 3.2
3.2 Improved Coreset Lower Bound in $\mathbb{R}^{d}$ when $k\geq 2$
3.2.1 Preparation
3.2.2 Proof of Theorem 3.8 when $z=2$
4 Conclusion
A Coreset Lower Bound for General $k$ -Median in $\mathbb{R}$
B Proof of Theorem 3.8 for General $z\geq 1$

1 Introduction

Processing huge datasets is always computationally challenging. In this paper, we consider the coreset paradigm, an effective data-reduction tool that alleviates the computational burden on big data. Roughly speaking, given a large dataset, the goal is to construct a much smaller dataset, called a coreset, so that vital properties of the original dataset are preserved. Coresets for various problems have been extensively studied (Har-Peled and Mazumdar, 2004; Feldman and Langberg, 2011; Feldman et al., 2013; Cohen-Addad et al., 2022; Braverman et al., 2022). In this paper, we investigate coreset construction for $k$-Median in Euclidean spaces.

Coreset construction for Euclidean $k$-Median has been studied for nearly two decades (Har-Peled and Mazumdar, 2004; Feldman and Langberg, 2011; Huang et al., 2018; Cohen-Addad et al., 2021, 2022). For this particular problem, an $\varepsilon$-coreset is a (weighted) point set in the same Euclidean space that satisfies: given any set of $k$ centers, the $k$-Median costs of the centers w.r.t. the original point set and the coreset are within a factor of $1+\varepsilon$. The most important task in theoretical research here is to characterize the minimum size of $\varepsilon$-coresets. Recently, there has been great progress in closing the gap between upper and lower bounds in high-dimensional spaces. However, research on the coreset size in small dimensional spaces is rare. There are still large gaps between upper and lower bounds even for $1$-d $1$-Median.

Clustering in small dimensional Euclidean spaces is of both theoretical and practical importance. In practice, many applications involve clustering points in small dimensional spaces. A typical example is clustering objects in $\mathbb{R}^{2}$ or $\mathbb{R}^{3}$ based on their spatial coordinates (Wheeler, 2007; Fonseca-Rodríguez et al., 2021). Another example is spectral clustering for graph and social network analysis (Von Luxburg, 2007; Kunegis et al., 2010; Zhang et al., 2014; Narantsatsralt and Kang, 2017). In spectral clustering, nodes are first embedded into a small dimensional Euclidean space using spectral methods and then Euclidean clustering algorithms are applied in the embedding space. Even the simplest $1$ -d $k$ -Median has numerous practical applications (Arnaboldi et al., 2012; Jeske et al., 2013; Pennacchioli et al., 2014).

On the theory side, existing techniques for coresets in high dimensions may not be sufficient to obtain optimal coresets in small dimensions. For example, a much smaller size is achievable in $\mathbb{R}^{1}$ by using geometric methods, while the sampling methods for strong coresets in high dimensions (Langberg and Schulman, 2010; Cohen-Addad et al., 2021; Huang et al., 2022b) do not seem viable for obtaining such bounds in low dimensions. This suggests that optimal coreset construction in small dimensions may require new techniques, which provides a partial explanation of why $1$-d $1$-Median is still open after two decades of research. That being said, the coreset problem for clustering in small dimensional spaces is of great theoretical interest and practical value. Yet it is largely unexplored in the literature. This paper aims to fill the gap and study the following question:

Question 1.

What is the tight coreset size for the Euclidean $k$-Median problem in $\mathbb{R}^{d}$ for small $d$?

1.1 Problem Definitions and Previous Results

Euclidean $k$ -Median.

In the Euclidean $k$ -Median problem, we are given a dataset $P\subset\mathbb{R}^{d}$ ( $d\geq 1$ ) of $n$ points and an integer $k\geq 1$ ; and the goal is to find a $k$ -center set $C\subset\mathbb{R}^{d}$ that minimizes the objective function

$$\mathrm{cost}(P,C):=\sum_{p\in P}d(p,C)=\sum_{p\in P}\min_{c\in C}d(p,c),$$

where $d(p,c)$ represents the Euclidean distance between $p$ and $c$ . It has many application domains including approximation algorithms, unsupervised learning, and computational geometry (Lloyd, 1982; Tan et al., 2006; Arthur and Vassilvitskii, 2007; Coates and Ng, 2012).
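As a concrete reading of the objective, the following minimal Python sketch evaluates $\mathrm{cost}(P,C)$ by brute force. The function name `kmedian_cost` and the optional `weights` argument are illustrative choices, not notation from the paper; the weights are included only so that the same helper can later score a weighted coreset.

```python
import math

def kmedian_cost(points, centers, weights=None):
    """Sum over points of (weight times) the Euclidean distance to the nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(math.dist(p, c) for c in centers)
               for p, w in zip(points, weights))

# Three points in the plane and a 2-center set (k = 2).
P = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
C = [(0.0, 0.0), (10.0, 0.0)]
total = kmedian_cost(P, C)
print(total)  # 5.0: only (3, 4) is off-center, at distance 5 from (0, 0)
```

The brute-force evaluation takes $O(nkd)$ time, which is exactly the per-query cost that a small coreset is meant to reduce.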

Coresets.

Let $\mathcal{C}$ denote the collection of all $k$ -center sets, i.e., $\mathcal{C}:=\{C\subset\mathbb{R}^{d}~{}:~{}|C|=k\}$ .

Definition 1.1 ( $\varepsilon$ -Coreset for Euclidean $k$ -Median (Har-Peled and Mazumdar, 2004)).

Given a dataset $P\subset\mathbb{R}^{d}$ of $n$ points, an integer $k\geq 1$ and $\varepsilon\in(0,1)$ , an $\varepsilon$ -coreset for Euclidean $k$ -Median is a subset $S\subseteq P$ with weight $w:S\to\mathbb{R}_{\geq 0}$ , such that

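$$\forall C\in\mathcal{C},\qquad\sum_{p\in S}w(p)\cdot d(p,C)\in(1\pm\varepsilon)\cdot\mathrm{cost}(P,C).$$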
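The definition quantifies over every $k$-center set, which cannot be checked exhaustively. The sketch below, with hypothetical helper names `weighted_cost` and `max_relative_error`, measures the relative gap only on a finite list of candidate center sets, so it can refute the coreset property but never certify it. The toy instance merges two duplicate points into one point of weight 2, which preserves every cost.

```python
import math

def weighted_cost(points, weights, centers):
    """Weighted k-Median cost of `points` with respect to `centers`."""
    return sum(w * min(math.dist(p, c) for c in centers)
               for p, w in zip(points, weights))

def max_relative_error(P, S, w, center_sets):
    """Largest relative gap over the supplied center sets (a finite sample, not all of C)."""
    worst = 0.0
    for C in center_sets:
        full = weighted_cost(P, [1.0] * len(P), C)
        approx = weighted_cost(S, w, C)
        if full > 0:
            worst = max(worst, abs(approx - full) / full)
    return worst

# P contains a duplicated point; S keeps one copy with weight 2.
P = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
S, w = [(0.0, 0.0), (1.0, 0.0)], [2.0, 1.0]
candidates = [[(0.5, 0.0)], [(0.0, 1.0), (2.0, 0.0)], [(3.0, 4.0)]]
err = max_relative_error(P, S, w, candidates)
print(err)  # 0.0
```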

For Euclidean $k$-Median, the best known upper bound on $\varepsilon$-coreset size is $\tilde{O}(\min\left\{\frac{k^{4/3}}{\varepsilon^{2}},\frac{k}{\varepsilon^{3}}\right\})$ (Huang et al., 2022b; Cohen-Addad et al., 2022) and $\Omega(\frac{k}{\varepsilon^{2}})$ is the best existing lower bound (Cohen-Addad et al., 2022). The upper bound is dimension-independent, since using dimensionality reduction techniques such as the Johnson–Lindenstrauss transform, the dimension can be reduced to $\tilde{\Theta}(\frac{1}{\varepsilon^{2}})$. Thus, most previous work essentially only focuses on $d=\tilde{\Theta}(\frac{1}{\varepsilon^{2}})$, whereas the case for $d<\frac{1}{\varepsilon^{2}}$ is largely unexplored. The lower bound requires $d=\Omega(\frac{k}{\varepsilon^{2}})$, as the hard instance for the lower bound is an orthonormal basis of size $\Omega(\frac{k}{\varepsilon^{2}})$. For constant $k$ and large enough $d$, the upper and lower bounds match up to a polylog factor.

On the contrary, for $d\ll\Theta(\frac{1}{\varepsilon^{2}})$, tight coreset sizes for $k$-Median are far from well-understood, even when $k=1$. Specifically, for constant $d$, the current best upper bound is $\tilde{O}(\min\{\frac{k}{\varepsilon^{3}},\frac{kd}{\varepsilon^{2}}\})$ (Feldman and Langberg, 2011), and the best lower bound is $\Omega(\frac{k}{\sqrt{\varepsilon}})$ (Baker et al., 2020). Thus, there is still a large gap between the upper and lower bounds for small $d$. Perhaps surprisingly, this is the case even for $d=1$: Har-Peled and Kushal (2005) present a coreset of size $\tilde{O}(\frac{k}{\varepsilon})$ in $\mathbb{R}$ while the best known lower bound is $\Omega(\frac{k}{\sqrt{\varepsilon}})$.

1.2 Our Results

We provide a complete characterization of the coreset size (up to a logarithmic factor) for $d=1$ and partially answer Question 1 for $1<d<\Theta(\frac{1}{\varepsilon^{2}})$. Our results are summarized in Table 1.

For $d=1$ , we construct coresets with size $\tilde{O}(\frac{1}{\sqrt{\varepsilon}})$ for $1$ -Median (Theorem 2.1) and prove that the coreset size lower bound is $\Omega(\frac{k}{\varepsilon})$ for $k\geq 2$ (Theorem 2.9). Previous work has shown coresets with size $\tilde{O}(\frac{k}{\varepsilon})$ exist for $k$ -Median (Har-Peled and Kushal, 2005) in $1$ -d, and thus our lower bound nearly matches this upper bound. On the other hand, it was proved that the coreset size of $1$ -Median in $1$ -d is $\Omega(\frac{1}{\sqrt{\varepsilon}})$ (Baker et al., 2020), which shows our upper bound result for $1$ -Median is nearly tight.

For $d>1$ , we provide a discrepancy-based method that constructs deterministic coresets of size $\tilde{O}(\frac{\sqrt{d}}{\varepsilon})$ for $1$ -Median (Theorem 3.2). Our result improves over the existing $\tilde{O}(\frac{1}{\varepsilon^{2}})$ upper bound (Cohen-Addad et al., 2021) for $1<d<\Theta(\frac{1}{\varepsilon^{2}})$ and matches the $\Omega(\frac{1}{\varepsilon^{2}})$ lower bound (Cohen-Addad et al., 2022) for $d=\Theta(\frac{1}{\varepsilon^{2}})$ . We further prove a lower bound of $\Omega(kd)$ for $k$ -Median in $\mathbb{R}^{d}$ (Theorem 3.8). Combining with our $1$ -d lower bound $\Omega(\frac{k}{\varepsilon})$ , this improves over the existing $\Omega(\frac{k}{\sqrt{\varepsilon}}+d)$ lower bound (Baker et al., 2020; Cohen-Addad et al., 2022).

1.3 Technical Overview

We first discuss the $1$-d $k$-Median problem and show that the framework of (Har-Peled and Kushal, 2005) is optimal for $k\geq 2$ but can be significantly improved for $k=1$. Then we briefly summarize our approaches for $2\leq d\leq\varepsilon^{-2}$.

The Bucket-Partitioning Framework for $1$ -d $k$ -Median in (Har-Peled and Kushal, 2005).

Our main results in $1$-d are based on the classic bucket-partitioning framework, developed in (Har-Peled and Kushal, 2005), which we briefly review now. They greedily partition a dataset $P\subset\mathbb{R}$ into $O(k\varepsilon^{-1})$ consecutive buckets $B$ and collect the mean point $\mu(B)$ of each bucket together with weight $|B|$ as their coreset $S$. Their construction requires that the cumulative error $\delta(B)=\sum_{p\in B}|p-\mu(B)|\leq\varepsilon\cdot\mathsf{OPT}/k$ holds for every bucket $B$, where $\mathsf{OPT}$ is the optimal $k$-Median cost of $P$. Their important geometric observation is that the induced error $|\mathrm{cost}(B,C)-|B|\cdot d(\mu(B),C)|$ of every bucket $B$ is at most $\delta(B)$, and is even $0$ when all points in $B$ are assigned to the same center. Consequently, only $O(k)$ buckets induce a non-zero error for every center set $C$ and the total induced error is at most $\varepsilon\cdot\mathsf{OPT}$, which concludes that $S$ is a coreset of size $O(k\varepsilon^{-1})$.
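The framework reviewed above can be sketched in a few lines of Python. This is an illustrative reading of the prose, not the original authors' code: the names `cumulative_error` and `uniform_bucket_coreset` are hypothetical, the optimal cost `opt` is passed in by the caller, and the demo is restricted to $k=1$, where $\mathsf{OPT}$ is simply the cost at a median point.

```python
from statistics import fmean

def cumulative_error(bucket):
    """delta(B): sum of |p - mu(B)| over the points of the bucket."""
    mu = fmean(bucket)
    return sum(abs(p - mu) for p in bucket)

def uniform_bucket_coreset(points, eps, opt, k=1):
    """Greedy left-to-right partition into maximal buckets with
    delta(B) <= eps * opt / k; returns a list of (mean, weight) pairs."""
    budget = eps * opt / k
    coreset, bucket = [], []
    for p in sorted(points):
        # Close the current bucket as soon as adding p would exceed the budget.
        if bucket and cumulative_error(bucket + [p]) > budget:
            coreset.append((fmean(bucket), len(bucket)))
            bucket = []
        bucket.append(p)
    if bucket:
        coreset.append((fmean(bucket), len(bucket)))
    return coreset

# Demo for k = 1 on the points 1, 2, ..., 100.
P = [float(i) for i in range(1, 101)]
opt = sum(abs(p - 50.0) for p in P)   # 50.0 is a median of 1..100
S = uniform_bucket_coreset(P, eps=0.1, opt=opt)

def f_P(c):
    return sum(abs(p - c) for p in P)

def f_S(c):
    return sum(w * abs(m - c) for m, w in S)

# Worst relative error over a grid of candidate centers.
worst = max(abs(f_P(c) - f_S(c)) / f_P(c) for c in [x / 4 for x in range(-40, 480)])
print(len(S), worst)
```

Every bucket uses the same error budget here; recomputing `cumulative_error` from scratch keeps the sketch short at the price of quadratic time, which a running-sum implementation would avoid.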

Reducing the Number of Buckets for $1$ -d $1$ -Median via Adaptive Cumulative Errors.

In the case of $k=1$ where there is only one center $c\in\mathbb{R}$, we improve the result in (Har-Peled and Kushal, 2005) (Theorem 2.1) through the following observation: $\mathrm{cost}(P,c)$ can be much larger than $\mathsf{OPT}$ when center $c$ is close to either of the endpoints of $P$, which allows an induced coreset error larger than $\varepsilon\cdot\mathsf{OPT}$. This observation motivates us to adaptively select cumulative errors for different buckets according to their locations. Following this idea, our algorithm (Algorithm 1) first partitions dataset $P$ into blocks $B_{i}$ according to clustering cost, i.e., $\mathrm{cost}(P,c)\approx 2^{i}\cdot\mathsf{OPT}$ for all $c\in B_{i}$, and then further partitions each block $B_{i}$ into buckets $B_{i,j}$ with a carefully selected cumulative error bound $\delta(B_{i,j})\leq\varepsilon\cdot 2^{i}\cdot\mathsf{OPT}$. Intuitively, our selection of cumulative errors is proportional to the minimum clustering cost of buckets, which results in a coreset.

For the coreset size, we first observe that there are only $O(\log\varepsilon^{-1})$ non-empty blocks $B_{i}$ (Lemma 2.7) since we can “safely ignore” the leftmost and the rightmost $\varepsilon n$ points and the remaining points $p\in P$ satisfy $\mathrm{cost}(P,p)\leq\varepsilon^{-1}\mathsf{OPT}$ . The most technical part is that we show the number $m$ of buckets in each $B_{i}$ is at most $O(\varepsilon^{-1/2})$ (Lemma 2.8), which results in our improved coreset size $\tilde{O}(\varepsilon^{-1/2})$ . The basic idea is surprisingly simple: the clustering cost of a bucket is proportional to its distance to center $c$ , and hence, the clustering cost of $m$ consecutive buckets is proportional to $m^{2}$ instead of $m$ . According to this idea, we find that $m^{2}\cdot\delta(B_{i,j})\leq 2^{i}\cdot\mathsf{OPT}$ for every $B_{i}$ , which implies a desired bound $m=O(\varepsilon^{-1/2})$ by our selection of $\delta(B_{i,j})\approx\varepsilon\cdot 2^{i}\cdot\mathsf{OPT}$ .
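The counting step can be written out as a short calculation. The sketch below treats the $\approx$ in the choice of $\delta(B_{i,j})$ as an equality up to constants, which is a simplification made only for illustration; the precise statements are Lemmas 2.7 and 2.8.

```latex
\begin{align*}
m^{2}\cdot\varepsilon\cdot 2^{i}\cdot\mathsf{OPT}
  \;\approx\; m^{2}\cdot\delta(B_{i,j})
  \;\leq\; 2^{i}\cdot\mathsf{OPT}
  \quad&\Longrightarrow\quad m^{2}\lesssim\varepsilon^{-1}
  \quad\Longrightarrow\quad m=O(\varepsilon^{-1/2}),\\
|S|\;\leq\;\underbrace{O(\log\varepsilon^{-1})}_{\text{blocks}}
  \cdot\underbrace{O(\varepsilon^{-1/2})}_{\text{buckets per block}}
  \;&=\;\tilde{O}(\varepsilon^{-1/2}).
\end{align*}
```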

Hardness Result for $1$ -d $2$ -Median: Cumulative Error is Unavoidable.

We take $k=2$ as an example here and show the tightness of the $O(\varepsilon^{-1})$ bound by (Har-Peled and Mazumdar, 2004). The extension to $k>2$ is standard via an idea of (Baker et al., 2020).

We construct the following worst-case instance $P\subset\mathbb{R}$ of size $\varepsilon^{-1}$: We construct $m=\varepsilon^{-1}$ consecutive buckets $B_{1},B_{2},\ldots,B_{m}$ such that the length of buckets exponentially increases while the number of points in buckets exponentially decreases. We fix a center at the leftmost point of $P$ (assumed to be $0$ w.l.o.g.) and move the other center $c$ along the axis. Such a dataset $P$ satisfies the following:

• the clustering cost is stable: for all $c$, $f_{P}(c):=\mathrm{cost}(P,\left\{0,c\right\})\approx\varepsilon^{-1}$ up to a constant factor;

• the cumulative error for every bucket $B_{i}$ is $\delta(B_{i})\approx 1$;

• for every $B_{i}$, $\mathrm{cost}(B_{i},\left\{0,c\right\})$ is a quadratic function that first decreases and then increases as $c$ moves from left to right within $B_{i}$, and the gap between the maximum and the minimum values is $\Omega(\delta(B_{i}))$.

Suppose $S\subseteq P$ is of size $o(\varepsilon^{-1})$. Then there must exist a bucket $B_{i}$ such that $S\cap B_{i}=\emptyset$. We find that function $f_{S}(c):=\mathrm{cost}(S,\left\{0,c\right\})$ is an affine linear function when $c$ is located within $B_{i}$ (Lemma 2.11). Consequently, the maximum induced error $\max_{c\in B_{i}}|f_{P}(c)-f_{S}(c)|$ is at least $\Omega(\delta(B_{i}))$ since the estimation error of an affine linear function $f_{S}$ to a quadratic function $f_{P}$ is up to certain “cumulative curvature” of $f_{P}$ (Lemma 2.10), which is $\Omega(\delta(B_{i}))$ due to our construction. Hence, $S$ is not a coreset since $f_{P}(c)\approx\varepsilon^{-1}$ always holds.

We remind the readers that the above cost function $f_{P}$ is actually a piecewise quadratic function with $O(\varepsilon^{-1})$ pieces instead of a quadratic one, which ensures the stability of $f_{P}$ . This is the main difference from $k=1$ , which leads to a gap of $\varepsilon^{-1/2}$ on the coreset size between $k=1$ and $k=2$ . As far as we know, this is the first such separation in any dimension.

Our Approaches when $2\leq d\leq\varepsilon^{-2}$ .

For $1$ -Median, our upper bound result (Theorem 3.2) combines a recent hierarchical decomposition coreset framework in (Braverman et al., 2022), that reduces the instance to a hierarchical ring structure (Theorem 3.4), and the discrepancy approaches (Theorem 3.6) developed by (Karnin and Liberty, 2019). The main idea is to extend the analytic analysis of (Karnin and Liberty, 2019) to handle multiplicative errors in a scalable way.

For $k$-Median, our lower bound result (Theorem 3.8) extends recently developed approaches in (Cohen-Addad et al., 2022). Their hard instance is an orthonormal basis in $\mathbb{R}^{d}$, the size of which is at most $d$, and hence cannot obtain a lower bound higher than $\Omega(d)$. We improve the results by embedding $\Theta(k)$ copies of their hard instance in $\mathbb{R}^{d}$, each of which lies in a different affine subspace. We argue that the errors from all subspaces add up. However, the error analysis from (Cohen-Addad et al., 2022) cannot be directly used; we need to overcome several technical challenges. For instance, points in the coreset are not necessarily in any affine subspace, so the error in each subspace is not a corollary of their result. Moreover, errors from different subspaces may cancel each other.

1.4 Other Related Work

Coresets for Clustering in Metric Spaces

Recent works (Cohen-Addad et al., 2022, 2022; Huang et al., 2023) show that Euclidean $(k,z)$-Clustering admits $\varepsilon$-coresets of size $\tilde{O}(k\varepsilon^{-2}\cdot\min\{\varepsilon^{-z},k^{\frac{z}{z+2}}\})$ and a nearly tight bound $\tilde{O}(\varepsilon^{-2})$ is known when $k=1$ (Cohen-Addad et al., 2021). Apart from the Euclidean metric, the research community has also extensively studied coresets for clustering in general metric spaces. For example, Feldman and Langberg (2011) construct coresets of size $\tilde{O}(k\varepsilon^{-2}\log n)$ for general discrete metrics. Baker et al. (2020) show that the previous $\log n$ factor is unavoidable. There are also works on other specific metric spaces: doubling metrics (Huang et al., 2018) and graphs with shortest path metrics (Baker et al., 2020; Braverman et al., 2021; Cohen-Addad et al., 2021), to name a few.

Coresets for Variants of Clustering

Coresets for variants of clustering problems are also of great interest. For example, Braverman et al. (2022) construct coresets of size $\tilde{O}(k^{3}\varepsilon^{-6})$ for capacitated $k$ -Median, which is improved to $\tilde{O}(k^{3}\varepsilon^{-5})$ by (Huang et al., 2023). Other important variants of clustering include ordered clustering (Braverman et al., 2019), robust clustering (Huang et al., 2022a), and time-series clustering (Huang et al., 2021).

2 Tight Coreset Sizes for $1$ -d $k$ -Median

2.1 Near Optimal Coreset for $1$ -d $1$ -Median

We have the following theorem.

Theorem 2.1 (Improved Coreset for one-dimensional $1$ -Median).

There is a polynomial time algorithm, such that given an input data set $P\subset\mathbb{R}$ , it outputs an $\varepsilon$ -coreset of $P$ for $1$ -Median with size $\tilde{O}(\varepsilon^{-\frac{1}{2}})$ .

Useful Notations and Facts.

Throughout this section, we use $P=\{p_{1},\cdots,p_{n}\}\subset\mathbb{R}$ with $p_{1}<p_{2}<\cdots<p_{n}$. Let $c^{\star}=p_{\lfloor\frac{n}{2}\rfloor}$. We have the following simple observations for $\mathrm{cost}(P,c)$.

Observation 2.2.

$\mathrm{cost}(P,c)$ is a convex piecewise affine linear function of $c$ and $\mathsf{OPT}=\mathrm{cost}(P,c^{\star})$ is the optimal $1$-Median cost on $P$.

The following notions, proposed by (Har-Peled and Mazumdar, 2004), are useful for our coreset construction.

Definition 2.3 (Bucket).

A bucket $B$ is a continuous subset $\{p_{l},p_{l+1}\dots,p_{r}\}$ of $P$ for some $1\leq l\leq r\leq n$ .

Definition 2.4 (Mean and cumulative error (Har-Peled and Kushal, 2005)).

Given a bucket $B=\{p_{l},\dots,p_{r}\}$ for some $1\leq l\leq r\leq n$ , denote $N(B):=r-l+1$ to be the number of points within $B$ and $L(B):=p_{r}-p_{l}$ to be the length of $B$ . We define the mean of $B$ to be $\mu(B):=\frac{1}{N(B)}\sum_{p\in B}p,$ and define the cumulative error of $B$ to be $\delta(B):=\sum_{p\in B}|p-\mu(B)|.$

Note that $\mu(B)\in[p_{l},p_{r}]$ always holds, which implies the following fact.

Fact 2.5.

$\delta(B)\leq N(B)\cdot L(B)$ .

The following lemma shows that for each bucket $B$ , the coreset error on $B$ is no more than $\delta(B)$ .

Lemma 2.6 (Cumulative error controls coreset error (Har-Peled and Kushal, 2005)).

Let $B=\left\{p_{l},\ldots,p_{r}\right\}\subseteq P$ for $1\leq l\leq r\leq n$ be a bucket and $c\in\mathbb{R}$ be a center. We have

1. if $c\in(p_{l},p_{r})$, $|\mathrm{cost}(B,c)-N(B)d(\mu(B),c)|\leq\delta(B)$;

2. if $c\notin(p_{l},p_{r})$, $|\mathrm{cost}(B,c)-N(B)d(\mu(B),c)|=0$.
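A small numerical instance may help to see both cases of Lemma 2.6 together with Fact 2.5. The helper names `bucket_stats` and `induced_error` are illustrative and not part of the paper; the bucket $\{1,2,6\}$ is an arbitrary example.

```python
from statistics import fmean

def bucket_stats(bucket):
    """Return (N(B), mu(B), delta(B)) for a bucket of 1-d points."""
    mu = fmean(bucket)
    return len(bucket), mu, sum(abs(p - mu) for p in bucket)

def induced_error(bucket, c):
    """|cost(B, c) - N(B) * d(mu(B), c)| for a single center c."""
    n, mu, _ = bucket_stats(bucket)
    return abs(sum(abs(p - c) for p in bucket) - n * abs(mu - c))

B = [1.0, 2.0, 6.0]
n, mu, delta = bucket_stats(B)       # mu = 3.0, delta = 2 + 1 + 3 = 6.0
length = B[-1] - B[0]                # L(B) = 5.0, so N(B) * L(B) = 15.0 >= delta

outside = induced_error(B, 0.0)      # center left of the bucket: error is 0
inside = induced_error(B, 3.0)       # center inside the bucket: error is 6.0 = delta(B)
print(n, mu, delta, outside, inside)
```

With the center placed exactly at $\mu(B)$, the bound of the first case is attained with equality in this example, so $\delta(B)$ cannot be replaced by a smaller quantity in general.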

Algorithm for Theorem 2.1.

Our algorithm is summarized in Algorithm 1. We improve the framework in (Har-Peled and Kushal, 2005), which partitions $P$ into multiple buckets so that the cumulative errors in different buckets are the same and collects their means as a coreset. Our main idea is to carefully select an adaptive cumulative error for different buckets. In Lines 2-3, we take the leftmost $\varepsilon n$ points and the rightmost $\varepsilon n$ points, and add their weighted means to our coreset $S$. In Line 4 (and Line 7), we divide the remaining points into disjoint blocks $B_{i}$ ( $B^{\prime}_{i}$ ) such that for every $p\in B_{i}$, $\mathrm{cost}(P,p)\approx 2^{i}\cdot\mathsf{OPT}$, and then greedily divide each $B_{i}$ into disjoint buckets $B_{i,j}$ with a cumulative error roughly $\varepsilon\cdot 2^{i}\cdot\mathsf{OPT}$ in Line 5. We remind the readers that the cumulative error in (Har-Peled and Kushal, 2005) is always $\varepsilon\cdot\mathsf{OPT}$.
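The listing of Algorithm 1 is not reproduced in this text, so the following Python sketch is reconstructed from the prose description only. Several details are assumptions made for illustration: $L=\lfloor\varepsilon n\rfloor$ extreme points on each side, the block index $i=\lfloor\log_{2}(\mathrm{cost}(P,p)/\mathsf{OPT})\rfloor$, the lower median as the optimal center, and all function names.

```python
import math
import random
from itertools import groupby
from statistics import fmean

def cost(points, c, weights=None):
    """Weighted 1-Median cost of a 1-d point set for center c."""
    weights = weights or [1.0] * len(points)
    return sum(w * abs(p - c) for p, w in zip(points, weights))

def delta(bucket):
    """Cumulative error: sum of distances to the bucket mean."""
    mu = fmean(bucket)
    return sum(abs(p - mu) for p in bucket)

def greedy_buckets(block, budget):
    """Split consecutive points into maximal buckets with delta(B) <= budget."""
    buckets, cur = [], []
    for p in block:
        if cur and delta(cur + [p]) > budget:
            buckets.append(cur)
            cur = []
        cur.append(p)
    if cur:
        buckets.append(cur)
    return buckets

def adaptive_coreset(points, eps):
    """Sketch of the adaptive bucket construction; returns (mean, weight) pairs."""
    if not 0 < eps < 0.5:
        raise ValueError("this sketch assumes 0 < eps < 1/2")
    pts = sorted(points)
    n = len(pts)
    opt = cost(pts, pts[(n - 1) // 2])            # optimum is attained at a median point
    L = math.floor(eps * n)                       # assumed size of each extreme group
    buckets = [pts[:L], pts[n - L:]] if L > 0 else []

    def block_index(p):
        # Block i holds points whose cost lies in [2^i * OPT, 2^(i+1) * OPT).
        return max(0, math.floor(math.log2(cost(pts, p) / opt)))

    mid = n // 2
    for side in (pts[L:mid], pts[mid:n - L]):     # left blocks B_i, right blocks B'_i
        for i, block in groupby(side, key=block_index):
            buckets += greedy_buckets(list(block), eps * 2**i * opt)
    return [(fmean(b), len(b)) for b in buckets]

# Demo on 400 Gaussian samples with a fixed seed.
rng = random.Random(0)
P = [rng.gauss(0.0, 1.0) for _ in range(400)]
eps = 0.1
S = adaptive_coreset(P, eps)
centers, weights = zip(*S)
worst = max(abs(cost(P, c) - cost(centers, c, weights)) / cost(P, c)
            for c in [x / 50 for x in range(-250, 251)])
print(len(S), worst)
```

Grouping consecutive points by block index relies on $f_{P}$ being monotone on each side of the median, so that points of the same block are contiguous. The demo only samples a grid of centers, so it illustrates the $O(\varepsilon)$ guarantee rather than proving it.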

We define function $f_{P}:\mathbb{R}\rightarrow\mathbb{R}_{\geq 0}$ such that $f_{P}(c)=\mathrm{cost}(P,c)$ for every $c\in\mathbb{R}$ and define $f_{S}:\mathbb{R}\rightarrow\mathbb{R}_{\geq 0}$ such that $f_{S}(c)=\mathrm{cost}(S,c)$ for every $c\in\mathbb{R}$. By Observation 2.2, $f_{P}(c)$ is decreasing on $(-\infty,c^{\star}]$ and increasing on $[c^{\star},\infty)$. As a result, each $B_{i}$ ( $B^{\prime}_{i}$ ) consists of consecutive points in $P$. The following lemma shows that the number of blocks $B_{i}$ ( $B^{\prime}_{i}$ ) is $O(\log\frac{1}{\varepsilon})$.

Lemma 2.7 (Number of blocks).

There are at most $O(\log(\frac{1}{\varepsilon}))$ non-empty blocks $B_{i}$ or $B^{\prime}_{i}$ .

Proof:

We prove Algorithm 1 divides $\{p_{L+1},\dots,p_{\lfloor\frac{n}{2}\rfloor}\}$ into at most $O(\log(\frac{1}{\varepsilon}))$ non-empty blocks $B_{i}$ . Argument for $\{p_{\lfloor\frac{n}{2}\rfloor+1},\dots,p_{R}\}$ is entirely symmetric.

If $B_{i}$ is non-empty for some $i\geq 0$ , we must have $f_{P}(p)\geq 2^{i}\cdot\mathsf{OPT}$ for $p\in B_{i}$ . We also have $p>p_{L}$ since $p\in B_{i}\subset\{p_{L+1},\dots,p_{\lfloor\frac{n}{2}\rfloor}\}$ . Since $f_{P}$ is convex, we have $2^{i}\cdot\mathsf{OPT}\leq f_{P}(p)\leq f_{P}(p_{L})$ . If we show that $f_{P}(p_{L})\leq(1+\varepsilon^{-1})\cdot\mathsf{OPT}=(1+\varepsilon^{-1})\cdot f_{P}(c^{\star})$ then we have $2^{i}\leq(1+\varepsilon^{-1})$ thus $i\leq O(\log(\frac{1}{\varepsilon}))$ .

To prove $f_{P}(p_{L})\leq(1+\varepsilon^{-1})\cdot f_{P}(c^{\star})$ , we use triangle inequality to obtain that

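$$f_{P}(p_{L})=\sum_{i=1}^{n}|p_{i}-p_{L}|\leq\sum_{i=1}^{n}\left(|p_{i}-c^{\star}|+|c^{\star}-p_{L}|\right)=f_{P}(c^{\star})+n\cdot|c^{\star}-p_{L}|.$$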

Moreover, we note that by the choice of $p_{L}$ , $|c^{\star}-p_{L}|\leq\frac{1}{L}\cdot\sum_{i=1}^{L}|c^{\star}-p_{i}|\leq\frac{f_{P}(c^{\star})}{\varepsilon n}$ . Thus we have,

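$$f_{P}(p_{L})\leq f_{P}(c^{\star})+n\cdot\frac{f_{P}(c^{\star})}{\varepsilon n}=(1+\varepsilon^{-1})\cdot f_{P}(c^{\star}).$$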

$\square$

We next give a key lemma that we use to obtain an improved coreset size.

Lemma 2.8 (Number of buckets).

Each non-empty block $B_{i}$ or $B^{\prime}_{i}$ is divided into $O(\varepsilon^{-1/2})$ buckets.

Proof:

We prove that each block $B_{i}\subset\{p_{L+1},\dots,p_{\lfloor\frac{n}{2}\rfloor}\}$ is divided into at most $O(\varepsilon^{-1/2})$ buckets $B_{i,j}$ . Argument for $B^{\prime}_{i}\subset\{p_{\lfloor\frac{n}{2}\rfloor+1},\dots,p_{R}\}$ is entirely symmetric.

Suppose $B_{i}=\{p_{l_{i}},\dots,p_{r_{i}}\}$ and we divide $B_{i}$ into $t$ buckets $\{B_{i,j}\}_{j=0}^{t-1}$ . Since each $B_{i,j}$ is the maximal bucket with $\delta(B_{i,j})\leq\varepsilon\cdot 2^{i}\cdot\mathsf{OPT}$ , we have $\delta(B_{i,2j}\cup B_{i,2j+1})>\varepsilon\cdot 2^{i}\cdot\mathsf{OPT}$ for $2j+1<t$ . Denote $B_{i,2j}\cup B_{i,2j+1}$ by $C_{j}$ for $j\in\{0,\dots,\lfloor\frac{t-2}{2}\rfloor\}$ , we have:

[TABLE]

Here (2.1) is from Cauchy-Schwarz inequality. So we have $(\lfloor\frac{t-2}{2}\rfloor)^{2}\cdot\varepsilon\cdot 2^{i}\cdot\mathsf{OPT}<4\cdot 2^{i}\cdot\mathsf{OPT}$ , which implies $t\leq O(\varepsilon^{-\frac{1}{2}})$ . $\square$

Now we are ready to prove Theorem 2.1.

Proof of Theorem 2.1:

We first verify that the set $S$ is an $O(\varepsilon)$-coreset. Our goal is to prove that for every $c\in\mathbb{R}$, $f_{S}(c)\in(1\pm\varepsilon)\cdot f_{P}(c)$. We prove this for any $c\in(-\infty,c^{\star}]$. The argument for $c\in(c^{\star},+\infty)$ is entirely symmetric.

For any $c\in(-\infty,c^{\star}]$ , we have

[TABLE]

where $B$ ranges over all buckets. We then separately analyze the $c\in(-\infty,p_{L}]$ case and the $c\in(p_{L},c^{\star}]$ case.

When $c\in(-\infty,p_{L}]$ , we note that $f_{P}(p_{L})=f_{S}(p_{L})$ (Lemma 2.6). By elementary calculus, both $\frac{df_{P}(c)}{dc}$ and $\frac{df_{S}(c)}{dc}$ lie within $[-n,-(1-2\varepsilon)n]$ and hence differ by at most a multiplicative factor of $1+O(\varepsilon)$ . Thus, $|f_{P}(c)-f_{S}(c)|\leq O(\varepsilon)\cdot f_{P}(c)$ .

When $c\in(p_{L},c^{\star}]$ , there is at most one bucket $B=\{p_{l},\dots,p_{r}\}$ such that $c\in(p_{l},p_{r})$ since these buckets are disjoint. If such a bucket $B$ does not exist, we have $f_{P}(c)=f_{S}(c)$ . Now suppose such a bucket $B$ exists. Since $c>p_{L}$ , we have $B\subset B_{i}$ for some block $B_{i}$ . Thus, by Lemma 2.6 and the construction of buckets:

[TABLE]

We have $f_{P}(p_{l})\geq 2^{i}\cdot\mathsf{OPT}$ and $f_{P}(p_{r})\geq 2^{i}\cdot\mathsf{OPT}$ . Since $f_{P}$ is convex (thus decreasing on $(-\infty,c^{\star}]$ ) and $c\in(p_{l},p_{r})$ , we also have $f_{P}(c)\geq 2^{i}\cdot\mathsf{OPT}$ . This implies $|f_{P}(c)-f_{S}(c)|\leq\varepsilon\cdot f_{P}(c)$ .

It remains to show that the size of $S$ , which is the total number of buckets, is $\tilde{O}(\varepsilon^{-1/2})$ . By Lemma 2.7, there are $O(\log(1/\varepsilon))$ blocks, and by Lemma 2.8, each block contains $O(\varepsilon^{-1/2})$ buckets. Thus, there are at most $\tilde{O}(\varepsilon^{-1/2})$ buckets. $\square$

2.2 Tight Lower Bound on Coreset Size for $1$ -d $k$ -Median when $k\geq 2$

In this subsection, we prove a lower bound of $\Omega(\frac{k}{\varepsilon})$ on the size of an $\varepsilon$ -coreset for the $k$ -Median problem in $\mathbb{R}^{1}$ . This lower bound matches the upper bound in (Har-Peled and Kushal, 2005).

Theorem 2.9 (Coreset lower bound for $1$ -d $k$ -Median when $k\geq 2$ ).

For a given integer $k\geq 2$ and $\varepsilon\in(0,1)$ , there exists a dataset $P\subset\mathbb{R}$ such that any $\varepsilon$ -coreset $S$ must have size $|S|\geq\Omega(k\varepsilon^{-1})$ .

For ease of exposition, we only prove the lower bound for $2$ -Median here. The generalization to $k$ -Median is straightforward and can be found in Appendix A.

We first prove a technical lemma, which shows that a quadratic function cannot be approximated well by an affine linear function on a long enough interval. We note that similar technical lemmas appear in coreset lower bounds for other related clustering problems (Braverman et al., 2019; Baker et al., 2020). The lemma in (Braverman et al., 2019) shows that the function $\sqrt{x}$ cannot be approximated well by an affine linear function, while our lemma is about approximating a quadratic function. The lemma in (Baker et al., 2020) shows that a quadratic function cannot be approximated well by an affine linear function on a bounded interval, a situation slightly different from ours.

Lemma 2.10 (Quadratic function cannot be approximated well by affine linear functions).

Let $[a,b]$ be an interval, $f(c)$ be a quadratic function on interval $[a,b]$ , $\alpha>0$ and $\beta>0$ be two constants, and $0\leq\varepsilon<\frac{1}{32}\frac{\beta}{\alpha}$ be a non-negative real number. If $|f(c)|\leq\alpha$ and $(b-a)^{2}f^{\prime\prime}(c)\geq\beta$ for all $c\in[a,b]$ , then there is no affine linear function $g$ such that $|g(c)-f(c)|\leq\varepsilon f(c)$ for all $c\in[a,b]$ .

Proof:

Assume there is an affine linear function $g(c)$ that satisfies $|g(c)-f(c)|\leq\varepsilon f(c)$ for all $c\in[a,b]$ . We denote the error function by $r(c)=f(c)-g(c)$ , which has two properties. First, its $l_{\infty}$ norm $\|r\|_{\infty}=\sup_{c\in[a,b]}|r(c)|\leq\varepsilon\alpha$ . Second, it is quadratic and satisfies $r^{\prime\prime}(c)=f^{\prime\prime}(c)$ , thus $(b-a)^{2}r^{\prime\prime}(c)\geq\beta$ for all $c\in[a,b]$ .

Define $L=b-a$ . By the mean value theorem, there is a point $c_{1/4}\in[a,\frac{a+b}{2}]$ such that $|r^{\prime}(c_{1/4})|=|\frac{1}{L/2}[r(\frac{a+b}{2})-r(a)]|\leq\frac{4}{L}\|r\|_{\infty}$ . Similarly there is a point $c_{3/4}\in[\frac{a+b}{2},b]$ such that $|r^{\prime}(c_{3/4})|\leq\frac{4}{L}\|r\|_{\infty}$ . Since $r$ is a quadratic function, its derivative is monotonic and $|r^{\prime}(\frac{a+b}{2})|\leq\max(|r^{\prime}(c_{1/4})|,|r^{\prime}(c_{3/4})|)\leq\frac{4}{L}\|r\|_{\infty}$ . Thus we have

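$$r(b)-r(\tfrac{a+b}{2})=\int_{\frac{a+b}{2}}^{b}r^{\prime}(c)\,\mathrm{d}c\geq\int_{\frac{a+b}{2}}^{b}\Big[r^{\prime}(\tfrac{a+b}{2})+\frac{\beta}{L^{2}}\big(c-\tfrac{a+b}{2}\big)\Big]\mathrm{d}c\geq-\frac{4}{L}\|r\|_{\infty}\cdot\frac{L}{2}+\frac{\beta}{8}\geq-2\varepsilon\alpha+\frac{1}{8}\beta.$$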

On the other hand, $r(b)-r(\frac{a+b}{2})\leq 2\|r\|_{\infty}\leq 2\varepsilon\alpha$ . Hence $2\varepsilon\alpha\geq-2\varepsilon\alpha+\frac{1}{8}\beta$ , that is, $\varepsilon\geq\frac{1}{32}\frac{\beta}{\alpha}$ , which contradicts the assumption $\varepsilon<\frac{1}{32}\frac{\beta}{\alpha}$ . $\square$
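As a concrete illustration of Lemma 2.10, take $f(c)=c^{2}$ on $[0,1]$, so that $(b-a)^{2}f^{\prime\prime}=2=\beta$. The argument in the proof shows that every affine $g$ has uniform error $\|f-g\|_{\infty}\geq\beta/32$. The sketch below evaluates the error on a grid; the line $g(c)=c-\frac{1}{8}$ is the classical best uniform affine fit to $c^{2}$ on $[0,1]$ and is used here only as an example, and the helper names are illustrative.

```python
def sup_error(f, slope, intercept, a, b, steps=1000):
    """Largest |f(c) - (slope * c + intercept)| over a uniform grid on [a, b]."""
    grid = [a + (b - a) * i / steps for i in range(steps + 1)]
    return max(abs(f(c) - (slope * c + intercept)) for c in grid)

def f(c):
    return c * c          # quadratic with f'' = 2, so (b - a)^2 * f'' = 2 on [0, 1]

beta = 2.0
lemma_floor = beta / 32   # the proof yields ||f - g||_inf >= beta / 32 for affine g

# Error of the equioscillating line g(c) = c - 1/8.
best = sup_error(f, 1.0, -0.125, 0.0, 1.0)
print(best, lemma_floor)
```

Even the best affine fit stays above the floor $\beta/32$, which is the quantitative gap that the lower bound exploits.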

For any dataset $P$ , with a slight abuse of notation, we denote the cost function for $2$ -Median with one query point fixed at $0$ by $f_{P}(c)=\mathrm{cost}(P,\{0,c\})$ . The following lemma shows that $f_{P}(c)$ is a piecewise affine linear function and that all its transition points lie in $P\cup\{2p\mid p\in P\}$ .

Lemma 2.11 (The function $f_{P}(c)$ is piecewise affine linear).

Let $P\subset\mathbb{R}$ be a weighted dataset. The function $f_{P}(c)$ is a piecewise affine linear function. All the transition points between two affine pieces are $P\cup\{2p\mid p\in P\}$ .

Proof:

We denote the weight of point $p$ by $w(p)$ and denote the midpoint between any point $c$ and $0$ by $\text{mid}=\frac{c}{2}$ . Now assume $c\geq 0$ and that neither $c$ nor $\frac{c}{2}$ is in the dataset $P$ . The clustering cost of a single point $p$ is

[TABLE]

If $c$ changes to $c+\mathrm{dc}$ we have

[TABLE]

Assume $|\mathrm{dc}|$ is small enough, then there are no data points in $[\text{mid},\text{mid}+\frac{1}{2}\mathrm{dc}]$ and $[c,c+\mathrm{dc}]$ . We have

[TABLE]

thus

[TABLE]

As $c$ moves in $\mathbb{R}$ from left to right, the derivative $f_{P}^{\prime}(c)$ changes only when $c$ or $\text{mid}=\frac{c}{2}$ passes a data point in $P$ . The same conclusion also holds for $c<0$ by a symmetric argument, which proves the lemma. $\square$
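Lemma 2.11 can be sanity-checked on a small weighted dataset. The sketch below, with illustrative names and a hypothetical dataset of positive points, evaluates $f_{P}(c)=\mathrm{cost}(P,\{0,c\})$ and tests that $f_{P}$ is affine between consecutive points of $P\cup\{2p\mid p\in P\}$ by comparing the value at the midpoint of each piece with the average of the endpoint values.

```python
def two_median_cost(weighted_points, c):
    """f_P(c) = cost(P, {0, c}) for a weighted dataset of (point, weight) pairs."""
    return sum(w * min(abs(p), abs(p - c)) for p, w in weighted_points)

def breakpoints(weighted_points):
    """Candidate transition points P and {2p : p in P} from Lemma 2.11, sorted."""
    pts = {p for p, _ in weighted_points}
    return sorted(pts | {2 * p for p in pts})

def is_affine_between_breakpoints(weighted_points, tol=1e-9):
    """Midpoint test: an affine piece satisfies f((u + v) / 2) = (f(u) + f(v)) / 2."""
    bps = breakpoints(weighted_points)
    for u, v in zip(bps, bps[1:]):
        mid_value = two_median_cost(weighted_points, (u + v) / 2)
        avg_value = (two_median_cost(weighted_points, u)
                     + two_median_cost(weighted_points, v)) / 2
        if abs(mid_value - avg_value) > tol:
            return False
    return True

P = [(1.0, 2.0), (3.0, 1.0), (4.5, 0.5)]   # hypothetical weighted points, all positive
print(is_affine_between_breakpoints(P))
```

The midpoint test is only a necessary condition for a piece to be affine, so this is a consistency check rather than a proof.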

Proof:

[ $2$ -Median case of Theorem 2.9] We first construct the dataset $P$ . The dataset $P$ is a union of $\frac{1}{\varepsilon}$ disjoint intervals $\{I_{i}\}_{i=1}^{\frac{1}{\varepsilon}}$ . Denote the left endpoint and right endpoint of $I_{i}$ by $l_{i}$ and $r_{i}$ respectively. We recursively define $l_{i}=r_{i-1}$ for $i\geq 2$ , $r_{i}=l_{i}+4^{i-1}$ for $i\geq 1$ , and $l_{1}=0$ . Thus $r_{i}=l_{i+1}=\frac{1}{3}(4^{i}-1)$ . The weight of points is specified by a measure $\lambda$ on $P$ . The measure is absolutely continuous with respect to Lebesgue measure $m$ such that its density on the $i$ th interval is $\frac{\mathrm{d\lambda}}{\mathrm{dm}}=(\frac{1}{16})^{i-1}$ . We denote the density on the $i$ th interval by $\mu_{i}$ and the density at point $p$ by $\mu(p)$ . Note that $P$ can be discretized in the following way. We only need to take a large enough constant $n$ , create a bucket $B_{i}$ of $(\frac{1}{4})^{i-1}n$ equally spaced points in each interval $I_{i}$ , and assign weight $\frac{1}{n}$ to every point.

The cost function $f_{P}(c)$ has the following two features:

1. the function value $f_{P}(c)\in[0,\frac{2}{\varepsilon}]$ for any $c\in\mathbb{R}$ ,

2. the function is quadratic on the interval $[l_{i}+\frac{1}{3}(r_{i}-l_{i}),r_{i}]$ and satisfies $[\frac{2}{3}(r_{i}-l_{i})]^{2}f^{\prime\prime}_{P}(c)=\frac{2}{3}$ for each $i$ .

We show how to prove Theorem 2.9 from these features and defer their verification to the end of the proof. Note that feature 2 does not contradict Lemma 2.11 since the dataset contains infinitely many points.

Assume that $S$ is an $\frac{\varepsilon}{300}$ -coreset of $P$ . We prove $|S|\geq\frac{1}{2\varepsilon}$ by contradiction. If $|S|<\frac{1}{2\varepsilon}$ , then by the pigeonhole principle there is an interval $I_{i}=[l_{i},r_{i}]$ such that $(l_{i},r_{i})\cap S=\varnothing$ . Consider the function $f_{S}(c)$ on the interval $[l_{i}+\frac{1}{3}(r_{i}-l_{i}),r_{i}]$ . When $c\in[l_{i}+\frac{1}{3}(r_{i}-l_{i}),r_{i}]$ , we have $\frac{c}{2}\in[l_{i},r_{i}]$ . Thus neither $c$ nor $\frac{c}{2}$ passes a point in $S$ when $c$ moves from $l_{i}+\frac{1}{3}(r_{i}-l_{i})$ to $r_{i}$ . By Lemma 2.11, the function $f_{S}(c)$ is affine linear on the interval $[l_{i}+\frac{1}{3}(r_{i}-l_{i}),r_{i}]$ . Since $S$ is an $\frac{\varepsilon}{300}$ -coreset of $P$ , we have $|f_{S}(c)-f_{P}(c)|\leq\frac{\varepsilon}{300}f_{P}(c)$ on the interval $[l_{i}+\frac{1}{3}(r_{i}-l_{i}),r_{i}]$ . However, by applying Lemma 2.10 to $f_{P}(c)$ and $f_{S}(c)$ on the interval $[l_{i}+\frac{1}{3}(r_{i}-l_{i}),r_{i}]$ with $\alpha=\frac{2}{\varepsilon}$ and $\beta=\frac{2}{3}$ , we obtain that $\frac{\varepsilon}{300}\geq\frac{1}{32}\times\frac{2}{3}\times\frac{\varepsilon}{2}>\frac{\varepsilon}{300}$ . This is a contradiction.

It remains to verify the two features of $f_{P}(c)$ . We verify feature 1 by direct computations. For any point $c$ , the function satisfies

[TABLE]

To verify feature 2, we compute the first order derivative by computing the change of the function value $f_{P}(c+\mathrm{dc})-f_{P}(c)$ up to the first order term when $c$ increases by an infinitesimal amount $\mathrm{dc}$ . The unweighted clustering cost of a single point $p$ is

[TABLE]

As $c$ increases to $c+\mathrm{dc}$ , the clustering cost of a single point changes by

[TABLE]

The cumulative clustering cost changes by

[TABLE]

Thus the first order derivative $f_{P}^{\prime}(c)=\lambda([\frac{c}{2},c])-\lambda([c,+\infty))$ and the second order derivative

$$f_{P}^{\prime\prime}(c)=2\mu(c)-\frac{1}{2}\mu(\tfrac{c}{2}).$$

For $c\in[l_{i}+\frac{1}{3}(r_{i}-l_{i}),r_{i}]$ , the two points $c$ and $\frac{c}{2}$ both lie in interval $[l_{i},r_{i}]$ . We have $\mu(c)=\mu(\frac{c}{2})=\mu_{i}$ and $f_{P}^{\prime\prime}(c)=\frac{3}{2}\mu_{i}$ . Thus the function $f_{P}(c)$ is quadratic on $[l_{i}+\frac{1}{3}(r_{i}-l_{i}),r_{i}]$ and $[\frac{2}{3}(r_{i}-l_{i})]^{2}f^{\prime\prime}_{P}(c)=\frac{2}{3}$ . $\square$
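The arithmetic behind the hard instance can be verified exactly with rational numbers. The sketch below rebuilds the endpoints $l_{i},r_{i}$ and densities $\mu_{i}$ from the recursive definition, checks the closed form $r_{i}=\frac{1}{3}(4^{i}-1)$ and the identity $[\frac{2}{3}(r_{i}-l_{i})]^{2}\cdot\frac{3}{2}\mu_{i}=\frac{2}{3}$ of feature 2, and compares $\mathrm{cost}(P,\{0\})$ with $\frac{2}{\varepsilon}$. Since adding a second center never increases the cost, $\mathrm{cost}(P,\{0\})$ upper bounds $f_{P}(c)$; this is one way to see feature 1 and is not necessarily the computation used in the paper. The value $1/\varepsilon=8$ is a hypothetical choice.

```python
from fractions import Fraction

def hard_instance(num_intervals):
    """Return (l_i, r_i, mu_i) for i = 1..num_intervals from the recursive definition."""
    intervals = []
    left = Fraction(0)
    for i in range(1, num_intervals + 1):
        right = left + Fraction(4) ** (i - 1)        # r_i = l_i + 4^(i-1)
        density = Fraction(1, 16) ** (i - 1)         # mu_i = (1/16)^(i-1)
        intervals.append((left, right, density))
        left = right                                 # l_{i+1} = r_i
    return intervals

def cost_single_center_at_zero(intervals):
    """cost(P, {0}) = sum_i mu_i * integral of x over [l_i, r_i]."""
    return sum(mu * (r * r - l * l) / 2 for l, r, mu in intervals)

inv_eps = 8                                          # hypothetical choice: 1/eps = 8
intervals = hard_instance(inv_eps)
for i, (l, r, mu) in enumerate(intervals, start=1):
    assert r == Fraction(4 ** i - 1, 3)              # closed form for r_i
    second_derivative = Fraction(3, 2) * mu          # f_P'' on [l_i + (r_i - l_i)/3, r_i]
    assert (Fraction(2, 3) * (r - l)) ** 2 * second_derivative == Fraction(2, 3)
print(cost_single_center_at_zero(intervals) <= 2 * inv_eps)
```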

3 Improved Coreset Sizes when $2\leq d\leq\varepsilon^{-2}$

In this section, we consider the case of constant $d$ , $2\leq d\leq\varepsilon^{-2}$ , and provide several improved coreset bounds for a generalization of Euclidean $k$ -Median, called Euclidean $(k,z)$ -Clustering. The only difference from $k$ -Median is that the goal is to find a $k$ -center set $C\subset\mathbb{R}^{d}$ that minimizes the objective function

[TABLE]

where $d^{z}$ represents the $z$ -th power of the Euclidean distance. The coreset notion is as follows.

Definition 3.1 ( $\varepsilon$ -Coreset for Euclidean $(k,z)$ -Clustering (Har-Peled and Mazumdar, 2004)).

Given a dataset $P\subset\mathbb{R}^{d}$ of $n$ points, an integer $k\geq 1$ , constant $z\geq 1$ and $\varepsilon\in(0,1)$ , an $\varepsilon$ -coreset for Euclidean $(k,z)$ -Clustering is a subset $S\subseteq P$ with weight $w:S\to\mathbb{R}_{\geq 0}$ , such that

[TABLE]
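Definition 3.1 can be made concrete with a short sketch. The code below computes the weighted $(k,z)$ -Clustering cost $\sum_{p}w(p)\cdot\min_{c\in C}\|p-c\|_{2}^{z}$ and the relative error of a candidate weighted subset for one given center set. The function names and the toy data are illustrative; a true coreset must satisfy the guarantee for every center set, which this check does not establish.

```python
import math

def cost_z(points, centers, z, weights=None):
    """Weighted (k, z)-Clustering cost: sum_p w(p) * min_c ||p - c||_2^z."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(math.dist(p, c) for c in centers) ** z
               for p, w in zip(points, weights))

def relative_error(P, S, w, centers, z):
    """|cost_z(S, C) - cost_z(P, C)| / cost_z(P, C) for one candidate center set C."""
    full = cost_z(P, centers, z)
    return abs(cost_z(S, centers, z, w) - full) / full

# Toy example: S keeps one copy of each duplicated point with weight 2.
P = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
S = [(0.0, 0.0), (1.0, 0.0)]
w = [2.0, 2.0]
print(relative_error(P, S, w, centers=[(0.5, 0.0), (3.0, 0.0)], z=2))
```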

We first study the case of $k=1$ and provide a coreset upper bound $\tilde{O}(\sqrt{d}\varepsilon^{-1})$ (Theorem 3.2). Then we study the general case $k\geq 1$ and provide a coreset lower bound $\Omega(kd)$ (Theorem 3.8).

3.1 Improved Coreset Size in $\mathbb{R}^{d}$ when $k=1$

We prove the following main theorem for $k=1$ whose center is a point $c\in\mathbb{R}^{d}$ .

Theorem 3.2 (Coreset for Euclidean $(1,z)$ -Clustering).

Let integer $d\geq 1$ , constant $z\geq 1$ and $\varepsilon\in(0,1)$ . There exists a randomized polynomial time algorithm that given a dataset $P\subset\mathbb{R}^{d}$ , outputs an $\varepsilon$ -coreset for Euclidean $(1,z)$ -Clustering of size at most $z^{O(z)}\sqrt{d}\varepsilon^{-1}\log\varepsilon^{-1}$ .

Proof sketch:

By (Braverman et al., 2022), we first reduce the problem to constructing a mixed coreset $(S,w)$ for Euclidean $(1,z)$ -Clustering for a dataset $P\subset B(0,1)$ satisfying that $\forall c\in\mathbb{R}^{d}$ ,

[TABLE]

The main idea to construct such $S$ is to prove that the class discrepancy of Euclidean $(1,z)$ -Clustering for $P$ is at most $z^{O(z)}\max\left\{1,r\right\}^{z}\cdot\sqrt{d}/m$ for $c\in B(0,r)$ (Lemma 3.7), which implies the existence of a mixed coreset $S$ of size $z^{O(z)}\sqrt{d}\varepsilon^{-1}$ by Fact 6 of (Karnin and Liberty, 2019). For the class discrepancy, we apply an analytic result of (Karnin and Liberty, 2019) (Theorem 3.6). The main difference is that (Karnin and Liberty, 2019) only considers an additive error that can handle $c\in B(0,1)$ instead of an arbitrary center $c\in\mathbb{R}^{d}$ . In our case, we allow a mixed error proportional to the scale of $\|c\|_{2}$ and extend the approach of (Karnin and Liberty, 2019) to handle arbitrary centers $c\in\mathbb{R}^{d}$ by increasing the discrepancy by a multiplicative factor $\|c\|_{2}^{z}$ . $\square$

The above theorem is powerful and leads to the following results for $z=O(1)$ :

1. By dimension reduction as in (Huang and Vishnoi, 2020; Cohen-Addad et al., 2021, 2022), we can assume $d=O(\varepsilon^{-2}\log\varepsilon^{-1})$ . Consequently, our coreset size is upper bounded by $\tilde{O}(\varepsilon^{-2})$ , which matches the nearly tight bound in (Cohen-Addad et al., 2022).

2. For $d=O(1)$ , our coreset size is $\tilde{O}(\varepsilon^{-1})$ , which is the first known result of this form in small dimensional spaces. Specifically, the prior known coreset size in $\mathbb{R}^{2}$ is $\tilde{O}(\varepsilon^{-3/2})$ (Braverman et al., 2022), and our result improves it by a factor of $\varepsilon^{-1/2}$ .

We conjecture that our coreset size is almost tight, i.e., there exists a coreset lower bound $\Omega(\sqrt{d}\varepsilon^{-1})$ for constant $2\leq d\leq\varepsilon^{-2}$ , which we leave as an interesting open problem.

3.1.1 Useful Notations and Facts

For preparation, we first propose a notion of mixed coreset (Definition 3.3), and then introduce some known discrepancy results.

Reduction to mixed coreset.

Let $B(a,r)$ denote the $\ell_{2}$ -ball in $\mathbb{R}^{d}$ centered at $a\in\mathbb{R}^{d}$ with radius $r\geq 0$ . Specifically, $B(0,1)$ is the unit ball centered at the origin.

Definition 3.3 (Mixed coreset for Euclidean $(1,z)$ -Clustering).

Given a dataset $P\subset B(0,1)$ and $\varepsilon\in(0,1)$ , an $\varepsilon$ -mixed-coreset for Euclidean $(1,z)$ -Clustering is a subset $S\subseteq P$ with weight $w:S\to\mathbb{R}_{\geq 0}$ , such that $\forall c\in\mathbb{R}^{d}$ ,

[TABLE]

Actually, prior work (Cohen-Addad et al., 2021, 2022; Braverman et al., 2022) usually considers the following form: $\forall c\in\mathbb{R}^{d}$ ,

[TABLE]

Compared to Definition 1.1, the above inequality allows both a multiplicative error $\varepsilon\cdot\mathrm{cost}_{z}(P,c)$ and an additional additive error $\varepsilon|P|$ . Note that for a center of small norm $r=\|c\|_{2}=O(1)$ , the additive error $\varepsilon|P|$ dominates the total error, while for a large $r\gg 1$ , the multiplicative error $\varepsilon\cdot\mathrm{cost}_{z}(P,c)\approx\varepsilon\|c\|_{2}\cdot|P|$ dominates the total error. Hence, it is not hard to check that Inequality (5) is an equivalent form of Inequality (4) (up to a $2^{O(z)}$ factor). This is also why we call the notion in Definition 3.3 a mixed coreset. We have the following useful reduction.

Theorem 3.4 (Reduction from coreset to mixed coreset (Braverman et al., 2022)).

Let $\varepsilon\in(0,1)$ . Suppose there exists a polynomial time algorithm $A$ that constructs an $\varepsilon$ -mixed coreset for Euclidean $(1,z)$ -Clustering of size $\Gamma$ . Then there exists a polynomial time algorithm $A^{\prime}$ that constructs an $\varepsilon$ -coreset for Euclidean $(1,z)$ -Clustering of size $O(\Gamma\log\varepsilon^{-1})$ .

Thus, it suffices to prove that an $\varepsilon$ -mixed coreset is of size $z^{O(z)}\sqrt{d}\varepsilon^{-1}$ , which implies Theorem 3.2.

Class discrepancy.

For preparation, we recall the notion of class discrepancy introduced by (Karnin and Liberty, 2019). The idea of combining discrepancy and coreset construction has been studied in the literature, specifically for kernel density estimation (Phillips and Tai, 2018a, b; Karnin and Liberty, 2019; Tai, 2022). We use the following definition.

Definition 3.5 (Class discrepancy (Karnin and Liberty, 2019)).

Let $m\geq 1$ be an integer. Let $f:\mathcal{X}\times\mathcal{C}\rightarrow\mathbb{R}$ and $P\subseteq\mathcal{X}$ with $|P|=m$ . The class discrepancy of $P$ w.r.t. $(f,\mathcal{C})$ is

[TABLE]

Moreover, we define $D^{(\mathcal{X},\mathcal{C})}_{m}(f):=\max_{P\subseteq\mathcal{X}:|P|=m}D^{(\mathcal{C})}_{P}(f)$ to be the class discrepancy w.r.t. $(f,\mathcal{X},\mathcal{C})$ .

Here, $\mathcal{X}$ is the instance space and $\mathcal{C}$ is the parameter space. Specifically, for Euclidean $(1,z)$ -Clustering, we let $\mathcal{X},\mathcal{C}\subseteq\mathbb{R}^{d}$ and $f=d^{z}$ be the $z$ -th power of the Euclidean distance. The class discrepancy $D^{(\mathcal{X},\mathcal{C})}_{m}(f)$ measures the capacity of $\mathcal{C}$ . Intuitively, if the capacity of $\mathcal{C}$ is large and leads to a complicated geometric structure of the vector $(f(p,c))_{p\in P}$ for $c\in\mathcal{C}$ , then $D^{(\mathcal{X},\mathcal{C})}_{m}(f)$ tends to be large.

Useful discrepancy results.

For a vector $p\in\mathbb{R}^{d}$ and integer $l\geq 1$ , let $p^{\otimes l}$ denote the $l$ -dimensional tensor obtained from the outer product of $p$ with itself $l$ times. For an $l$ -dimensional tensor $X$ with $d^{l}$ entries, we consider the measure $\|X\|_{T_{l}}:=\max_{q\in\mathbb{R}^{d}:\|q\|=1}|\langle X,q^{\otimes l}\rangle|$ . Next, we provide some known results about the class discrepancy.

Theorem 3.6 (An upper bound for class discrepancy (restatement of Theorem 18 of (Karnin and Liberty, 2019))).

Let $\mathcal{X}=B(0,1)$ in $\mathbb{R}^{d}$ . Let $f:\mathbb{R}\rightarrow\mathbb{R}$ be analytic satisfying that for any integer $l\geq 1$ , $|\frac{d^{l}f}{dx^{l}}(x)|\leq\gamma_{1}C^{l}l!$ for some constants $\gamma_{1},C>0$ . Let $\mathcal{C}=B(0,\frac{1}{2C})$ and $m\geq 1$ be an integer. The class discrepancy w.r.t. $(f=f(\langle p,c\rangle),\mathcal{X},\mathcal{C})$ is at most $D^{(\mathcal{X},\mathcal{C})}_{m}(f)\leq\gamma_{2}\gamma_{1}\sqrt{d}/m$ for some constant $\gamma_{2}>0$ .

Moreover, for any dataset $P\subset\mathcal{X}$ of size $m$ , there exists a randomized polynomial time algorithm that constructs $\sigma\in\left\{-1,1\right\}^{P}$ satisfying that for any integer $l\geq 1$ , we have

[TABLE]

This $\sigma$ satisfies $D^{(\mathcal{C})}_{P}(f,\sigma)\leq\gamma_{2}\gamma_{1}\sqrt{d}/m$ .

Note that the above theorem is a constructive result rather than the existential result of Theorem 18 of (Karnin and Liberty, 2019). This is because Theorem 18 of (Karnin and Liberty, 2019) applies the existential version of Banaszczyk’s theorem (Banaszczyk, 1998), which has recently been made constructive (Bansal et al., 2019). Also, note that the construction of $\sigma$ only depends on $P$ and does not depend on the selection of $\mathcal{C}$ . This observation is important for the construction of mixed coresets via discrepancy.

3.1.2 Proof of Theorem 3.2

We are ready to prove Theorem 3.2. The main lemma is as follows.

Lemma 3.7 (Class discrepancy for Euclidean $(1,z)$ -Clustering).

Let $m\geq 1$ be an integer. Let $f=d^{z}$ and $\mathcal{X}=B(0,1)$ . For a given dataset $P\subset\mathcal{X}$ of size $m$ , there exists a vector $\sigma\in\left\{-1,1\right\}^{P}$ such that for any $r>0$ ,

[TABLE]

The above lemma indicates that the class discrepancy for Euclidean $(1,z)$ -Clustering depends on the radius $r$ of the parameter space $\mathcal{C}$ through the factor $\max\left\{1,r\right\}^{z}$ . Note that the lemma finds a vector $\sigma$ that satisfies all levels of parameter spaces $\mathcal{C}=B(0,r)$ simultaneously. This requirement is slightly different from Definition 3.5, which considers a fixed parameter space. Observe that the term $\max\left\{1,r\right\}$ is similar to $\max\left\{1,\|c\|_{2}\right\}$ in Definition 3.3, which is the key to the reduction from Lemma 3.7 to Theorem 3.2. The proof idea is similar to that of Fact 6 of (Karnin and Liberty, 2019).

Proof:

[of Theorem 3.2] Let $P\subset B(0,1)$ be a dataset of size $n$ and $\Lambda=z^{O(z)}\sqrt{d}\varepsilon^{-1}$ . By the same argument as in Fact 6 of (Karnin and Liberty, 2019), we can iteratively apply Lemma 3.7 to construct a subset $S\subseteq P$ of size $m=\Theta(\Lambda)$ together with weights $w(p)=\frac{n}{|S|}$ for $p\in S$ and a vector $\sigma\in\left\{-1,1\right\}^{S}$ , such that $(S,\sigma)$ satisfies, for any $c\in\mathbb{R}^{d}$ ,

[TABLE]

This implies that $S$ is an $O(\varepsilon)$ -mixed coreset for Euclidean $(1,z)$ -Clustering of size at most $\Lambda=z^{O(z)}\sqrt{d}\varepsilon^{-1}$ , which completes the proof of Theorem 3.2. $\square$
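The link between a sign vector $\sigma$ and a smaller weighted subset can be illustrated by a single halving step. For a balanced $\sigma$ (equally many $+1$ and $-1$), keeping only the points with $\sigma_{p}=+1$ and doubling their weight changes the average cost by exactly $\frac{1}{m}\sum_{p}\sigma_{p}f(p,c)$, which is the quantity that the class discrepancy controls. The sketch below checks this identity for one fixed center. The balanced random signs and random values are placeholders: they stand in for, and are not, the low-discrepancy coloring of Theorem 3.6.

```python
import random

def signed_sum(values, signs):
    """sum_p sigma_p * f(p, c) for one fixed center c."""
    return sum(s * v for s, v in zip(signs, values))

def halving_error(values, signs):
    """Mean over the kept half {sigma = +1} minus the mean over all of P."""
    kept = [v for v, s in zip(values, signs) if s == 1]
    return sum(kept) / len(kept) - sum(values) / len(values)

random.seed(1)
m = 64
values = [random.random() for _ in range(m)]   # stand-ins for f(p, c), p in P
signs = [1] * (m // 2) + [-1] * (m // 2)       # balanced placeholder coloring
random.shuffle(signs)
print(abs(halving_error(values, signs) - signed_sum(values, signs) / m) < 1e-12)
```

Repeating such a step shrinks the set by half each time, and the per-step errors add up; this is only a schematic view of the iterative argument referenced above.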

It remains to prove Lemma 3.7.

Proof:

[of Lemma 3.7] Let $P\subset B(0,1)$ be a dataset of size $m$ . We first construct a vector $\sigma\in\left\{-1,1\right\}^{P}$ in the following way:

1. For each $p\in P$ , construct a point $\phi(p)=(\frac{1}{2}\|p\|_{2}^{2},\frac{\sqrt{2}}{2}p,\frac{1}{2})\in\mathbb{R}^{d+2}$ .

2. By Theorem 3.6, construct $\sigma\in\left\{-1,1\right\}^{P}$ such that for any integer $l\geq 1$ ,

[TABLE]

Let $\phi(P)$ be the collection of all points $\phi(p)$ . Note that $\|\phi(p)\|_{2}\leq 1$ by construction, which implies that $\phi(P)\subset B(0,1)\subset\mathbb{R}^{d+2}$ . In the following, we show that $\sigma$ satisfies Lemma 3.7.

Fix $r\geq 1$ and let $\mathcal{C}=B(0,r)$ . We construct another dataset $P^{\prime}=\left\{p^{\prime}=\frac{p}{4r}:p\in P\right\}$ . For any $c\in\mathcal{C}=B(0,r)$ , we let $c^{\prime}=\frac{c}{4r}\in B(0,\frac{1}{4})$ . By definition, we have for any $p\in\mathcal{X}$ and $c\in\mathcal{C}$ ,

[TABLE]

which implies that

[TABLE]

Thus, it suffices to prove that

[TABLE]

which implies the lemma. The proof idea of Inequality (6) is similar to that of Theorem 22 of (Karnin and Liberty, 2019). (Footnote 5: Note that the proof of Theorem 22 of (Karnin and Liberty, 2019) is actually incorrect. Applying Theorem 18 of (Karnin and Liberty, 2019) may lead to an upper bound $\|\tilde{q}\|_{2}<1$ , which makes $R$ in Theorem 22 of (Karnin and Liberty, 2019) not exist.) For each $p^{\prime}\in P^{\prime}$ and $c^{\prime}\in B(0,\frac{1}{4})$ , let $\psi(c^{\prime})=(\frac{1}{8r^{2}},-\frac{\sqrt{2}}{2r}c^{\prime},2\|c^{\prime}\|_{2}^{2})\in\mathbb{R}^{d+2}$ and we can rewrite $f(p^{\prime},c^{\prime})$ as follows:

[TABLE]

We note that $\phi(p)\in B(0,1)$ and $\psi(c^{\prime})\in B(0,\frac{1}{3})$ since $c^{\prime}\in B(0,\frac{1}{4})$ . Construct another function $g:P\times B(0,\frac{1}{3})\rightarrow\mathbb{R}$ as follows: for each $p\in P$ and $c\in B(0,\frac{1}{3})$ ,

1. If $\langle p^{\prime},c\rangle\geq 0$ for every $p^{\prime}\in P$ , let $g(p,c)=g(\langle p,c\rangle)=(\langle p,c\rangle)^{z/2}$ ;

2. Otherwise, let $g(p,c)=0$ .

We have $|\frac{d^{l}g}{dx^{l}}(x)|\leq z^{O(z)}l!$ for any integer $l\geq 1$ . By the construction of $\sigma$ and Theorem 3.6, we have that

[TABLE]

which implies Inequality (6) since $D^{(B(0,\frac{1}{4}))}_{P^{\prime}}(f,\sigma)\leq D^{(B(0,\frac{1}{3}))}_{\phi(P)}(g,\sigma)$ due to the fact that $\psi(c^{\prime})\in B(0,\frac{1}{3})$ .

Overall, we complete the proof. $\square$
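The lifting maps used in the proof can be checked numerically: with $p^{\prime}=\frac{p}{4r}$ and $c^{\prime}=\frac{c}{4r}$, the inner product $\langle\phi(p),\psi(c^{\prime})\rangle$ equals $\|p^{\prime}-c^{\prime}\|_{2}^{2}$, so $f(p^{\prime},c^{\prime})=\langle\phi(p),\psi(c^{\prime})\rangle^{z/2}$. The sketch below verifies this identity and the norm bounds $\|\phi(p)\|_{2}\leq 1$ and $\|\psi(c^{\prime})\|_{2}\leq\frac{1}{3}$ for one hypothetical choice of $p$, $c$ and $r$.

```python
import math

def phi(p):
    """Lift p in B(0, 1) to (||p||^2 / 2, p / sqrt(2), 1/2) in R^(d+2)."""
    sq = sum(x * x for x in p)
    return [sq / 2] + [x * math.sqrt(2) / 2 for x in p] + [0.5]

def psi(c_prime, r):
    """Lift c' in B(0, 1/4) to (1 / (8 r^2), -sqrt(2) / (2 r) * c', 2 ||c'||^2)."""
    sq = sum(x * x for x in c_prime)
    return [1 / (8 * r * r)] + [-math.sqrt(2) / (2 * r) * x for x in c_prime] + [2 * sq]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

r = 3.0
p = [0.6, -0.3, 0.2]             # hypothetical point in B(0, 1)
c = [1.5, 2.0, -1.0]             # hypothetical center in B(0, r)
p_prime = [x / (4 * r) for x in p]
c_prime = [x / (4 * r) for x in c]
squared_distance = sum((a - b) ** 2 for a, b in zip(p_prime, c_prime))
print(abs(dot(phi(p), psi(c_prime, r)) - squared_distance) < 1e-12)
```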

3.2 Improved Coreset Lower Bound in $\mathbb{R}^{d}$ when $k\geq 2$

We present a lower bound for the coreset size in small dimensional spaces.

Theorem 3.8 (Coreset lower bound in small dimensional spaces).

Given an integer $k\geq 1$ , constant $z\geq 1$ and a real number $\varepsilon\in(0,1)$ , for any integer $d\leq\frac{1}{100\varepsilon^{2}}$ , there is a dataset $P\subset\mathbb{R}^{d+1}$ such that its $\varepsilon$ -coreset for $(k,z)$ -Clustering must contain at least $\frac{dk}{10z^{4}}$ points.

When $d=\Theta(\frac{1}{\varepsilon^{2}})$ , Theorem 3.8 gives the well known lower bound $\frac{k}{\varepsilon^{2}}$ . When $d\ll\Theta(\frac{1}{\varepsilon^{2}})$ , the theorem is non-trivial. In the following, we prove Theorem 3.8 for $z=2$ and show how to extend to general $z\geq 1$ in Appendix B.

3.2.1 Preparation

Notations

Let $e_{0},\cdots,e_{d}$ be the standard basis vectors of $\mathbb{R}^{d+1}$ , and $H_{1},\cdots,H_{k/2}$ be $k/2$ $d$ -dimensional affine subspaces, where $H_{j}:=jLe_{0}+\text{span}\left\{e_{1},\dots,e_{d}\right\}$ for a sufficiently large constant $L$ . For any $p\in\mathbb{R}^{d+1}$ , we use $\tilde{p}$ to denote the $d$ -dimensional vector $p_{1:d}$ (i.e., discard the $0$ -th coordinate of $p$ ).

Hard instance

We construct the hard instance as follows. Take $P_{j}=\{jLe_{0}+e_{1},\cdots,jLe_{0}+e_{d/2}\}$ for $j\in\{1,\dots,k/2\}$ and take $P$ to be the union of all $P_{j}$ . The hard instance is $P$ . Note that $P_{j}\subset H_{j}$ for each $j$ and $|P|=kd/4$ .

In our proof, we always put two centers in each $H_{j}$ . Thus for large enough $L$ , all $p\in P_{j}$ must be assigned to centers in $H_{j}$ .

We will use the following two technical lemmas from (Cohen-Addad et al., 2022).

Lemma 3.9.

For any $k\geq 1$ , let $\{c_{1},\cdots,c_{k}\}$ be arbitrary $k$ unit vectors in $\mathbb{R}^{d}$ , we have

[TABLE]

Lemma 3.10.

Let $S$ be a set of points in $\mathbb{R}^{d}$ of size $t$ and $w:S\rightarrow\mathbb{R}^{+}$ be their weights. There exist $2$ unit vectors $v_{1},v_{2}$ , such that

[TABLE]

3.2.2 Proof of Theorem 3.8 when $z=2$

Now we are ready to prove Theorem 3.8 when $z=2$ .

Proof:

Let $S$ be an $\varepsilon$ -coreset of $P$ with weights $w$ . Note that points in $S$ might not be in any $H_{j}$ . We first map each point $p\in S$ to an index $j_{p}\in[k/2]$ such that $H_{j_{p}}$ is the nearest subspace of $p$ . The mapping is quite simple:

[TABLE]

where $p_{0}$ is the $0$ -th coordinate of $p$ . Let $\Delta_{p}=p_{0}-j_{p}L$ , which is the distance of $p$ to the closest affine subspace. Let $S_{j}:=\{p\in S:j_{p}=j\}$ be the set of points in $S$ whose closest affine subspace is $H_{j}$ . Define $I:=\{j\in[k/2]:|S_{j}|\leq d/4\}$ . Consider any $k$ -center set $C$ such that $H_{j}\bigcap C\neq\emptyset$ for every $j\in[k/2]$ . Then $\mathrm{cost}(P,C)\ll 0.1L$ for sufficiently large $L$ . On the other hand, $\mathrm{cost}(S,C)\geq\sum_{p\in S}\Delta_{p}^{2}$ . Since $S$ is a coreset, $\Delta_{p}^{2}\ll L$ for all $p\in S$ . (Footnote 6: Here we do not allow offsets to simplify the proof, but our technique can be extended to handle offsets.) Therefore each $p\in S$ must be very close to its closest affine subspace; in particular, we can assume that $p$ must be assigned to some center in $H_{j_{p}}$ (if there exists one).

In the proof that follows, we consider three different sets of $k$ centers $C_{1},C_{2}$ and $C_{3}$ and compare the costs $\mathrm{cost}(P,C_{i})$ and $\mathrm{cost}(S,C_{i})$ for $i=1,2,3$ . In each $C_{i}$ , there are two centers in each $H_{j}$ . As we have discussed above, for large enough $L$ , the total cost for both $P$ and $S$ can be decomposed into the sum of costs over all affine subspaces.

For each $j\in\bar{I}:=[k/2]\setminus I$ , the corresponding centers in $H_{j}$ are the same across $C_{1},C_{2},C_{3}$ . Let $c_{j}$ be any point in $H_{j}$ such that $c_{j}-jLe_{0}$ has unit norm and is orthogonal to $e_{1},\cdots,e_{d/2}$ ; in other words, $\|\tilde{c}_{j}\|=1$ and the first $d/2$ coordinates of $\tilde{c}_{j}$ are all zero. Specifically, we set $c_{j}=jLe_{0}+e_{d/2+1}$ and the two centers in $H_{j}$ are two copies of $c_{j}$ for $j\in\bar{I}$ .

We first consider the following $k$ centers denoted by $C_{1}$ . As we have specified the centers for $j\in\bar{I}$ , we only describe the centers for each $j\in I$ . Since by definition, $|S_{j}|\leq d/4$ , we can find a vector $c_{j}\in\mathbb{R}^{d+1}$ in $H_{j}$ such that $c_{j}-jLe_{0}$ has unit norm and is orthogonal to $e_{1},\cdots,e_{d/2}$ and all vectors in $S_{j}$ . Let $C_{1}$ be the set of $k$ points with each point in $\{c_{1},\cdots,c_{k/2}\}$ copied twice. We evaluate the cost of $C_{1}$ with respect to $P$ and $S$ .

Lemma 3.11.

For $C_{1}$ constructed above, we have $\mathrm{cost}(P,C_{1})=\frac{kd}{2}$ and

[TABLE]

Proof:

Since $e_{i}$ is orthogonal to $c_{j}-jLe_{0}$ and $c_{j}-jLe_{0}$ has unit norm for all $i,j$ , it follows that

[TABLE]

On the other hand, the cost of $C_{1}$ w.r.t. $S_{j}$ is

[TABLE]

Recall $\tilde{p}\in\mathbb{R}^{d}$ is $p_{1:d}$ . For $j\in I$ , the inner product is $0$ , and thus the total cost w.r.t. $S$ is

[TABLE]

which finishes the proof. $\square$
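The value $\mathrm{cost}(P,C_{1})=\frac{kd}{2}$ for $z=2$ can be reproduced on a small instance. The sketch below builds $P_{j}=\{jLe_{0}+e_{1},\dots,jLe_{0}+e_{d/2}\}$ and uses the centers $c_{j}=jLe_{0}+e_{d/2+1}$ for every $j$. For $j\in I$ the proof additionally requires $c_{j}-jLe_{0}$ to be orthogonal to the vectors in $S_{j}$, which does not affect the cost with respect to $P$. The parameters $k$, $d$, $L$ are hypothetical small values with $k$ and $d$ even.

```python
def unit(index, dim):
    """Standard basis vector e_index in R^dim (coordinates indexed from 0)."""
    return [1.0 if i == index else 0.0 for i in range(dim)]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def build_instance(k, d, L):
    """P_j = {j*L*e_0 + e_i : 1 <= i <= d/2}, c_j = j*L*e_0 + e_{d/2+1}, j = 1..k/2."""
    points, centers = [], []
    for j in range(1, k // 2 + 1):
        shift = [j * L * x for x in unit(0, d + 1)]
        for i in range(1, d // 2 + 1):
            points.append([s + e for s, e in zip(shift, unit(i, d + 1))])
        centers.append([s + e for s, e in zip(shift, unit(d // 2 + 1, d + 1))])
    return points, centers

def cost(points, centers):
    """(k, 2)-Clustering cost; duplicating a center does not change it."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

k, d, L = 6, 8, 1000.0            # hypothetical small parameters, k and d even
P, C1 = build_instance(k, d, L)
print(len(P) == k * d // 4, cost(P, C1) == k * d / 2)
```

Each point $jLe_{0}+e_{i}$ is at squared distance $\|e_{i}-e_{d/2+1}\|^{2}=2$ from $c_{j}$, so each of the $k/2$ groups of $d/2$ points contributes $d$.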

For notational convenience, we define $\kappa:=2\sum_{j\in\bar{I}}\sum_{p\in S_{j}}w(p)\langle p-jLe_{0},jLe_{0}-c_{j}\rangle$ . Since $S$ is an $\varepsilon$ -coreset of $P$ , we have

[TABLE]

Next we consider a different set of $k$ centers denoted by $C_{2}$ . By Lemma 3.10, there exist unit vectors $v^{j}_{1},v^{j}_{2}\in\mathbb{R}^{d}$ such that

[TABLE]

Applying this to every $j\in I$ , we obtain the corresponding $v^{j}_{1},v^{j}_{2}$ . Let $C_{2}=\{u_{1}^{1},u_{2}^{1},\cdots,u_{1}^{k/2},u_{2}^{k/2}\}$ be a set of $k$ centers in $\mathbb{R}^{d+1}$ defined as follows: if $j\in I$ , $u_{\ell}^{j}$ is $v_{\ell}^{j}$ with an additional $0$ th coordinate with value $jL$ , making it lie in $H_{j}$ ; for $j\in\bar{I}$ , we use the same centers as in $C_{1}$ , i.e., $u_{1}^{j}=u_{2}^{j}=c_{j}$ .

Lemma 3.12.

For $C_{2}$ constructed above, we have

[TABLE]

Proof:

By (10),

[TABLE]

By Lemma 3.9 (with $k=2$ ), we have

[TABLE]

It follows that

[TABLE]

where in the inequality, we also used the orthogonality between $e_{i}$ and $c_{j}-jLe_{0}$ . $\square$

Since $S$ is an $\varepsilon$ -coreset of $P$ , we have

[TABLE]

which implies

[TABLE]

By definition, $|S_{j}|\leq d/4$ , so

[TABLE]

and it follows that

[TABLE]

Finally we consider a third set of $k$ centers $C_{3}$ . Similarly, there are two centers per group. Let $m$ be a power of $2$ in $[d/2,d]$ . Let $h_{1},\cdots,h_{m}$ be the $m$ -dimensional Hadamard basis vectors. So all $h_{\ell}$ ’s are $\{-\frac{1}{\sqrt{m}},\frac{1}{\sqrt{m}}\}$ -valued vectors and $h_{1}=(\frac{1}{\sqrt{m}},\cdots,\frac{1}{\sqrt{m}})$ . We slightly abuse notation and treat each $h_{\ell}$ as a $d$ -dimensional vector by appending zeros at the end. For each $h_{\ell}$ , we construct a set of $k$ centers as follows. For each $j\in\bar{I}$ , we still use two copies of $c_{j}$ . For $j\in I$ , the $0$ th coordinate of the two centers is $jL$ , and we concatenate $h_{\ell}$ and $-h_{\ell}$ respectively to the first and the second centers.
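The Hadamard basis used for $C_{3}$ can be generated with the standard Sylvester doubling construction. The sketch below is one possible realization under that assumption (the paper does not specify a construction): it picks the largest power of $2$ not exceeding $d$ as $m$, builds the normalized vectors $h_{1},\dots,h_{m}$, pads them with zeros to dimension $d$, and checks that they are orthonormal.

```python
import math

def hadamard_basis(m):
    """Rows of the normalized m x m Sylvester Hadamard matrix (m a power of 2)."""
    rows = [[1]]
    while len(rows) < m:
        rows = [row + row for row in rows] + [row + [-x for x in row] for row in rows]
    scale = 1 / math.sqrt(m)
    return [[x * scale for x in row] for row in rows]

def largest_power_of_two_at_most(d):
    """A power of 2 in (d/2, d], one valid choice of m."""
    return 1 << (d.bit_length() - 1)

d = 12
m = largest_power_of_two_at_most(d)        # m = 8 here
H = hadamard_basis(m)
padded = [h + [0.0] * (d - m) for h in H]  # treat each h_l as a d-dimensional vector
gram = [[sum(a * b for a, b in zip(u, v)) for v in padded] for u in padded]
print(all(abs(gram[i][j] - (1.0 if i == j else 0.0)) < 1e-12
          for i in range(m) for j in range(m)))
```

The first row is the all-positive vector $h_{1}$, and every entry has absolute value $\frac{1}{\sqrt{m}}$, matching the description above.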

Lemma 3.13.

Suppose $C_{3}$ is constructed based on $h_{\ell}$ . Then for all $\ell\in[m]$ , we have

[TABLE]

Proof:

For $j\in I$ , the cost of the two centers w.r.t. $P_{j}$ is

[TABLE]

For $j\in\bar{I}$ , the cost w.r.t. $P_{j}$ is $d$ by (7). Thus, the total cost over all subspaces is

[TABLE]

On the other hand, for $j\in I$ , the cost w.r.t. $S_{j}$ is

[TABLE]

Here $h^{p}_{\ell}=s^{p}\cdot h_{\ell}$ , where $s^{p}=\arg\max_{s\in\{-1,+1\}}\langle\tilde{p},s\cdot h_{\ell}\rangle$ . For $j\in\bar{I}$ , the cost w.r.t. $S_{j}$ is $\sum_{p\in S_{j}}w(p)(\Delta_{p}^{2}+\|\tilde{p}\|^{2}+1+2\langle p-jLe_{0},jLe_{0}-c_{j}\rangle)$ by (8). Thus, the total cost w.r.t. $S$ is

[TABLE]

This finishes the proof. $\square$

Corollary 3.14.

Let $S$ be an $\varepsilon$ -coreset of $P$ , and $I=\{j:|S_{j}|\leq d/4\}$ . Then

[TABLE]

Proof:

Since $S$ is an $\varepsilon$ -coreset, we have by Lemma 3.13

[TABLE]

Note that the above inequality holds for all $\ell\in[m]$ , then

[TABLE]

By the Cauchy-Schwarz inequality,

[TABLE]

Therefore, we have

[TABLE]

$\square$

Combining the above corollary with (11), we have

[TABLE]

By the assumption $d\leq\frac{1}{100\varepsilon^{2}}$ , it holds that $|I|\leq\frac{3k}{10}$ , or equivalently $|\bar{I}|\geq\frac{k}{2}-\frac{3k}{10}=\frac{k}{5}$ . Moreover, since $|S_{j}|>\frac{d}{4}$ for each $j\in\bar{I}$ , we have $|S|>\frac{d}{4}\cdot\frac{k}{5}=\frac{kd}{20}$ . $\square$

4 Conclusion

This work studies coresets for $k$ -Median problem in small dimensional Euclidean spaces. We give tight size bounds for $k$ -Median in $\mathbb{R}$ and show that the framework in (Har-Peled and Kushal, 2005), with significant improvement, is optimal. For $d\geq 2$ , we improve existing coreset upper bounds for $1$ -Median and prove new lower bounds.

Our work leaves several interesting problems for future research. One is to close the gap between the upper and lower bounds for $d\geq 2$. Another is to generalize our results to $(k,z)$-Clustering for general $z$. This generalization is non-trivial even for $d=1$, since the cost function is piecewise linear for $k$-Median but piecewise polynomial of order $z$ for general $(k,z)$-Clustering.
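The contrast between piecewise linear and piecewise polynomial costs can be seen on a tiny one-dimensional example. The sketch below uses an arbitrary three-point data set (not taken from the paper) and compares second differences of the single-center cost between two consecutive data points; `cost_1d` and `second_difference` are hypothetical helper names.

```python
import numpy as np

def cost_1d(points, center, z):
    """(1, z)-Clustering cost of a single center on the real line."""
    points = np.asarray(points, dtype=float)
    return float(np.sum(np.abs(points - center) ** z))

points = [0.0, 1.0, 4.0]
# Three equally spaced centers strictly between the data points 1 and 4.
c0, c1, c2 = 2.0, 2.5, 3.0

def second_difference(z):
    """Discrete second derivative of the cost; zero iff the cost is affine here."""
    return (cost_1d(points, c0, z)
            - 2.0 * cost_1d(points, c1, z)
            + cost_1d(points, c2, z))

# z = 1 (Median): the cost is affine between consecutive data points.
assert abs(second_difference(1)) < 1e-12
# z = 2: the cost is a genuine quadratic, so the second difference is nonzero.
assert abs(second_difference(2) - 1.5) < 1e-12
```

For $z=1$ the curvature is concentrated at the data points, which is what a bucket-based argument can exploit; for $z=2$ it is spread over every interval.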

Appendix A Coreset Lower Bound for General $k$ -Median in $\mathbb{R}$

We prove the general case of Theorem 2.9 here.

Proof:

[the general case of Theorem 2.9]

We first construct the hard instance $P$. Let $P_{1}$ denote the hard instance constructed in the proof of the $2$-Median case of Theorem 2.9. We take a large enough constant $L>0$, set $P_{i}=(i-1)L+P_{1}$, and let $P=\cup_{i=1}^{\frac{k}{2}}P_{i}$. Here $(i-1)L+P_{1}$ means $\{(i-1)L+p \mid p\in P_{1}\}$.

The dataset $P$ is a union of $\frac{k}{2}$ copies of $P_{1}$. These copies are far from each other, so the $k$-Median problem on $P$ decomposes into a $2$-Median problem on each copy. We prove the $k$-Median lower bound by applying the argument for the $2$-Median lower bound to every single copy and combining the results.
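The decomposition claimed above is easy to check numerically. The following sketch uses a made-up stand-in for $P_{1}$ (the actual hard instance is defined in the proof of Theorem 2.9 and is not reproduced here) and places two centers near each copy; the names `base`, `base_centers`, and `shifted_copies` are hypothetical.

```python
import math

def cost(points, centers):
    """1-d k-Median cost: each point pays its distance to the nearest center."""
    return sum(min(abs(p - c) for c in centers) for p in points)

def shifted_copies(base, num_copies, L):
    """Return the copies base, L + base, 2L + base, ..."""
    return [[i * L + p for p in base] for i in range(num_copies)]

base = [0.0, 0.25, 0.5, 1.0]     # hypothetical stand-in for P_1
base_centers = [0.25, 0.75]      # two centers per copy
L = 100.0                        # much larger than the diameter of a copy
num_copies = 3                   # plays the role of k/2

copies = shifted_copies(base, num_copies, L)
queries = shifted_copies(base_centers, num_copies, L)

P = [p for copy in copies for p in copy]
Q = [c for query in queries for c in query]

total = cost(P, Q)
per_copy = sum(cost(P_i, Q_i) for P_i, Q_i in zip(copies, queries))
# With far-apart copies, every point is served by a center of its own copy.
assert math.isclose(total, per_copy)
```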

We write $P_{1}=\cup_{j=1}^{\frac{1}{\varepsilon}}I_{1,j}$, where $I_{1,j}$ is the $j$-th interval constructed in the proof of the $2$-Median case of Theorem 2.9. We set $I_{i,j}=(i-1)L+I_{1,j}$ and denote the left and right endpoints of $I_{i,j}$ by $l_{i,j}$ and $r_{i,j}$, respectively, so that $P_{i}=\cup_{j=1}^{\frac{1}{\varepsilon}}I_{i,j}$.

Now assume, for contradiction, that $S$ is an $\frac{\varepsilon}{300}$-coreset of $P$ with $|S|<\frac{k}{4\varepsilon}$. Since $|S|<\frac{k}{4\varepsilon}$, for at least half of the indices $i$ there is some $j_{i}$ such that $(l_{i,j_{i}},r_{i,j_{i}})\cap S=\varnothing$. Without loss of generality, we assume that these indices are $1,2,\dots,\frac{k}{4}$. We define a parametrized query family $Q(t)=\cup_{i=1}^{\frac{k}{2}}Q_{i}(t)$, where $t\in[\frac{1}{3},1]$ and

[TABLE]

Consider $\mathrm{cost}(P,Q(t))$ , a function of $t$ . Since $L$ is large enough, we have $\mathrm{cost}(P,Q(t))=\sum_{i=1}^{\frac{k}{2}}\mathrm{cost}(P_{i},Q_{i}(t))$ . The computation we have done in the proof of the $2$ -Median case of Theorem 2.9 implies that $\mathrm{cost}(P_{i},Q_{i}(t))\leq\frac{2}{\varepsilon}$ for each $i$ and

[TABLE]

Thus we have $\mathrm{cost}(P,Q(t))\leq\frac{k}{\varepsilon}$ and $(1-\frac{1}{3})^{2}\frac{\mathrm{d}^{2}}{\mathrm{d}t^{2}}\mathrm{cost}(P,Q(t))=\frac{k}{9}$.

It is easy to see that $\mathrm{cost}(S,Q(t))$ is affine linear in $t$, since $(l_{i,j_{i}},r_{i,j_{i}})\cap S=\varnothing$ for $i\leq\frac{k}{4}$. Since $S$ is an $\frac{\varepsilon}{300}$-coreset, we have $|\mathrm{cost}(S,Q(t))-\mathrm{cost}(P,Q(t))|\leq\frac{\varepsilon}{300}\mathrm{cost}(P,Q(t))$. By Lemma 2.10, we must have $\frac{\varepsilon}{300}\geq\frac{1}{32}\frac{\varepsilon}{k}\frac{k}{9}>\frac{\varepsilon}{300}$, which is a contradiction. $\square$
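Two bookkeeping steps of this proof can be sanity-checked mechanically: the pigeonhole step (few coreset points leave an empty interval in at least half of the copies) and the final numeric comparison. The sketch below uses a toy interval layout, not the paper's instance, and the helper name `copies_with_empty_interval` is hypothetical.

```python
from fractions import Fraction

def copies_with_empty_interval(intervals_per_copy, S):
    """Indices i such that some open interval of copy i contains no point of S."""
    hit_free = []
    for i, intervals in enumerate(intervals_per_copy):
        if any(not any(l < s < r for s in S) for (l, r) in intervals):
            hit_free.append(i)
    return hit_free

# Toy layout: k/2 = 2 copies with 1/eps = 4 intervals each, so k/(4 eps) = 4.
L = 100.0
intervals_per_copy = [[(i * L + j, i * L + j + 0.5) for j in range(4)]
                      for i in range(2)]
S = [0.25, 1.25, 2.25]           # |S| = 3 < 4
empty = copies_with_empty_interval(intervals_per_copy, S)
assert len(empty) >= 1           # at least half of the 2 copies

# Final comparison, in units of eps: (1/32) * (1/k) * (k/9) = 1/288 > 1/300.
k = 8
lower = Fraction(1, 32) * Fraction(1, k) * Fraction(k, 9)
assert lower == Fraction(1, 288)
assert lower > Fraction(1, 300)
```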

Appendix B Proof of Theorem 3.8 for General $z\geq 1$

Using ideas similar to those in (Cohen-Addad et al., 2022), our proof of the lower bound for $z=2$ can be extended to arbitrary $z$. First, we provide two lemmas analogous to Lemma 3.9 and Lemma 3.10 for general $z\geq 1$. Their proofs can be found in Appendix A of (Cohen-Addad et al., 2022).

Lemma B.1.

For any even number $k\geq 1$, let $\{c_{1},\cdots,c_{k}\}$ be arbitrary $k$ unit vectors in $\mathbb{R}^{d}$ such that for each $i$ there exists some $j$ satisfying $c_{i}=-c_{j}$. We have

[TABLE]

Lemma B.2.

Let $S$ be a set of points in $\mathbb{R}^{d}$ of size $t$ and $w:S\rightarrow\mathbb{R}^{+}$ be their weights. For arbitrary $\Delta_{p}$ for each $p$ , there exist $2$ unit vectors $v_{1},v_{2}$ satisfying $v_{1}=-v_{2}$ , such that

[TABLE]

In this proof, the original point set $P$ and the three sets of $k$ centers, namely $C_{1},C_{2},C_{3}$, are the same as in the case $z=2$. The difference is that now $I=\{j:|S_{j}|\leq\frac{d}{2^{z}}\}$ and, when constructing $C_{2}$, we use Lemma B.2 in place of Lemma 3.10. Again, we compare the cost of $P$ and $S$ w.r.t. $C_{1},C_{2},C_{3}$ and get the following lemmas.

Lemma B.3.

For $C_{1}$ constructed above, we have $\mathrm{cost}(P,C_{1})=\frac{kd}{4}\cdot 2^{z/2}$ and

[TABLE]

Proof:

Since $e_{i}$ is orthogonal to $c_{j}-jLe_{0}$ and $c_{j}-jLe_{0}$ has unit norm for all $i,j$ , it follows that

[TABLE]

On the other hand, the cost of $C_{1}$ w.r.t. $S_{j}$ is

[TABLE]

For $j\in I$, the inner product is $0$, and thus the total cost w.r.t. $S$ is

[TABLE]

which finishes the proof. $\square$

For notational convenience, we define $\kappa:=\sum_{j\in\bar{I}}\sum_{p\in S_{j}}w(p)\|p-c_{j}\|^{z}$ . Since $S$ is an $\varepsilon$ -coreset of $P$ , we have

[TABLE]

Next we consider a different set of $k$ centers denoted by $C_{2}$. By Lemma B.2, there exist unit vectors $v^{j}_{1},v^{j}_{2}\in\mathbb{R}^{d}$ satisfying $v^{j}_{1}=-v^{j}_{2}$ such that

[TABLE]

Applying this to all $j\in I$, we obtain corresponding $v^{j}_{1},v^{j}_{2}$ for all $j\in I$. Let $C_{2}=\{u_{1}^{1},u_{2}^{1},\cdots,u_{1}^{k/2},u_{2}^{k/2}\}$ be a set of $k$ centers in $\mathbb{R}^{d+1}$ defined as follows: if $j\in I$, $u_{\ell}^{j}$ is $v_{\ell}^{j}$ with an additional $0$th coordinate with value $jL$, making it lie in $H_{j}$; for $j\in\bar{I}$, we use the same centers as in $C_{1}$, i.e., $u_{1}^{j}=u_{2}^{j}=c_{j}$.

Lemma B.4.

For $C_{2}$ constructed above, we have

[TABLE]

Proof:

By (15),

[TABLE]

By Lemma B.1 (with $k=2$ ), we have

[TABLE]

It follows that

[TABLE]

where in the inequality, we also used the orthogonality between $e_{i}$ and $c_{j}-jLe_{0}$ . $\square$

Since $S$ is an $\varepsilon$ -coreset of $P$ , we have

[TABLE]

which implies

[TABLE]

By definition, $|S_{j}|\leq d/t^{2}$ , so

[TABLE]

and it follows that

[TABLE]

Finally, we consider a third set of $k$ centers $C_{3}$. As before, there are two centers per group. Let $m$ be a power of $2$ in $[d/2,d]$, and let $h_{1},\cdots,h_{m}$ be the $m$-dimensional Hadamard basis vectors. Thus every $h_{\ell}$ is a $\{-\frac{1}{\sqrt{m}},\frac{1}{\sqrt{m}}\}$-valued vector and $h_{1}=(\frac{1}{\sqrt{m}},\cdots,\frac{1}{\sqrt{m}})$. We slightly abuse notation and treat each $h_{\ell}$ as a $d$-dimensional vector by appending zeros. For each $h_{\ell}$, we construct a set of $k$ centers as follows. For each $j\in\bar{I}$, we still use two copies of $c_{j}$. For $j\in I$, the $0$th coordinate of the two centers is $jL$, and we concatenate $h_{\ell}$ and $-h_{\ell}$ to form the first and the second center, respectively.

Lemma B.5.

Suppose $C_{3}$ is constructed based on $h_{\ell}$ . Then for all $\ell\in[m]$ , we have

[TABLE]

Proof:

For $j\in I$ , the cost of the two centers w.r.t. $P_{j}$ is

[TABLE]

For $j\in\bar{I}$ , the cost w.r.t. $P_{j}$ is $\frac{d}{2}\cdot 2^{z/2}$ by (12). Thus, the total cost over all subspaces is

[TABLE]

On the other hand, for $j\in I$ , the cost w.r.t. $S_{j}$ is

[TABLE]

Here $h^{p}_{\ell}=s^{p}\cdot h_{\ell}$, where $s^{p}=\arg\max_{s\in\{-1,+1\}}\langle\tilde{p},s\cdot h_{\ell}\rangle$. For $j\in\bar{I}$, the total cost w.r.t. $S_{j}$ is $\kappa$. Thus, the total cost w.r.t. $S$ is

[TABLE]

This finishes the proof. $\square$

Corollary B.6.

Let $S$ be an $\varepsilon$-coreset of $P$, and $I=\{j:|S_{j}|\leq d/4\}$. Then

[TABLE]

Proof:

Since $S$ is an $\varepsilon$ -coreset, we have by Lemma B.5

[TABLE]

Since the above inequality holds for all $\ell\in[m]$, we have

[TABLE]

By the Cauchy-Schwarz inequality,

[TABLE]

Therefore, we have

[TABLE]

$\square$

Combining the above corollary with (16), we have

[TABLE]

which implies that

[TABLE]

So if we set $t=\frac{4\max\{1,(z/2)^{2}\}}{\min\{1,(z/2)^{2}\}}$ , then

[TABLE]

By the assumption $d\leq\frac{\min\{1,(z/2)^{2}\}}{100\varepsilon^{2}}$, it holds that $|I|\leq\frac{2k}{5}$, and hence $|\bar{I}|\geq\frac{k}{2}-\frac{2k}{5}=\frac{k}{10}$. Moreover, since $|S_{j}|>\frac{d}{t^{2}}$ for each $j\in\bar{I}$, we have $|S|>\frac{d}{t^{2}}\cdot\frac{k}{10}=\frac{kd\min\{1,(z/2)^{4}\}}{160\max\{1,(z/2)^{4}\}}$.

Bibliography

Arnaboldi et al. [2012] Valerio Arnaboldi, Marco Conti, Andrea Passarella, and Fabio Pezzoni. Analysis of ego network structure in online social networks. 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, pages 31–40, 2012.
Arthur and Vassilvitskii [2007] David Arthur and Sergei Vassilvitskii. $k$-means++: the advantages of careful seeding. In SODA, pages 1027–1035, 2007.
Baker et al. [2020] Daniel N. Baker, Vladimir Braverman, Lingxiao Huang, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Xuan Wu. Coresets for clustering in graphs of bounded treewidth. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 569–579. PMLR, 2020.
Banaszczyk [1998] Wojciech Banaszczyk. Balancing vectors and Gaussian measures of n-dimensional convex bodies. Random Struct. Algorithms, 12(4):351–360, 1998.
Bansal et al. [2019] Nikhil Bansal, Daniel Dadush, Shashwat Garg, and Shachar Lovett. The Gram-Schmidt walk: A cure for the Banaszczyk blues. Theory Comput., 15:1–27, 2019.
Braverman et al. [2019] Vladimir Braverman, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Xuan Wu. Coresets for ordered weighted clustering. In International Conference on Machine Learning, 2019.
Braverman et al. [2021] Vladimir Braverman, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Xuan Wu. Coresets for clustering in excluded-minor graphs and beyond. In SODA, pages 2679–2696. SIAM, 2021.
Braverman et al. [2022] Vladimir Braverman, Vincent Cohen-Addad, Shaofeng Jiang, Robert Krauthgamer, Chris Schwiegelshohn, Mads Bech Toftrup, and Xuan Wu. The power of uniform sampling for coresets. In 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022. IEEE Computer Society, 2022.
