Tight FPT Approximations for $k$-Median and $k$-Means

Vincent Cohen-Addad; Anupam Gupta; Amit Kumar; Euiwoong Lee; Jason Li

arXiv:1904.12334·cs.DS·April 30, 2019


TL;DR

This paper presents fixed-parameter tractable algorithms that achieve $(1+2/e+\varepsilon)$ and $(1+8/e+\varepsilon)$ approximation ratios for $k$-median and $k$-means clustering in general metric spaces, and shows that these ratios are essentially the best possible in FPT time, assuming the Gap-ETH.

Contribution

The authors develop FPT algorithms with improved approximation factors for $k$-median and $k$-means, and prove matching hardness bounds under the Gap Exponential Time Hypothesis (Gap-ETH).

Findings

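1. Achieved approximation ratios of $(1+2/e+\varepsilon)$ for $k$-median and $(1+8/e+\varepsilon)$ for $k$-means in FPT time.

2. Established hardness results showing that no FPT-time algorithm achieves better ratios, assuming the Gap-ETH.

3. Extended the techniques to related problems: a $(2+\varepsilon)$-approximation for Matroid Median and an $(\alpha_{\mathsf{FL}}+\varepsilon)$-approximation for Facility Location, both in FPT time.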

Abstract

We investigate the fine-grained complexity of approximating the classical $k$ -median / $k$ -means clustering problems in general metric spaces. We show how to improve the approximation factors to $(1 + 2/ e + ε)$ and $(1 + 8/ e + ε)$ respectively, using algorithms that run in fixed-parameter time. Moreover, we show that we cannot do better in FPT time, modulo recent complexity-theoretic conjectures.

Equations

$$\small{\mathsf{cost}}(C,F):=\sum_{j\in C}d(j,F).$$

$$\sum_{j\in C^{\prime}}w_{j}\,d(j,F)\in(1-\varepsilon,1+\varepsilon)\cdot\sum_{j\in C}d(j,F),$$

$$|C^{\prime}|=O\Big(\frac{k\log n+\log(1/\delta)}{\varepsilon^{2}}\Big)$$

$$(O(\varepsilon^{-2}\log n))^{k}\cdot(\log_{1+\varepsilon}\Delta)^{k}\leq\bigg(O\Big(\frac{(\log\Delta)(\log n)}{\varepsilon^{2}}\Big)\bigg)^{k}.$$

$$F_{i}:=\{f\in{\mathbb{F}}\mid\lceil\!\!\lceil d(f,\ell_{i}^{\star})\rceil\!\!\rceil=R_{i}^{\star}\},$$

$$d(j,f_{i})\stackrel{(\triangle\text{ ineq.})}{\leq}d(j,f)+d(f,\ell_{i}^{\star})+d(\ell_{i}^{\star},f_{i})\leq d(j,f)+R_{i}^{\star}+R_{i}^{\star}=d(j,f)+d(f,f_{i}^{\prime})=d(j,f_{i}^{\prime}).$$

$$\small{\mathsf{cost}}(C,F^{\prime}\cup S)=\sum_{j\in C^{\prime}}w_{j}\,d(j,F^{\prime}\cup S)=\sum_{j\in C^{\prime}}w_{j}\,d(j,S)=\small{\mathsf{cost}}(C,S),$$

$$d(j,f_{i}^{\star})\geq d(\ell_{i},f_{i}^{\star})\geq\frac{R_{i}^{\star}}{1+\varepsilon}.$$

$$d(j,F^{\prime})\leq d(j,f_{i}^{\star})+2(1+\varepsilon)\,d(j,f_{i}^{\star})\leq(3+2\varepsilon)\,d(j,F^{\star}),$$

$$\small{\mathsf{improv}}(S^{\star})\geq(1-1/e)\,\small{\mathsf{improv}}(F^{\star})$$

$$\begin{aligned}\small{\mathsf{cost}}(C,S^{\star})&\leq\small{\mathsf{cost}}(C,F^{\prime})-(1-1/e)\,\small{\mathsf{improv}}(F^{\star})\\&=\small{\mathsf{cost}}(C,F^{\prime})-(1-1/e)\big(\small{\mathsf{cost}}(C,F^{\prime})-\small{\mathsf{cost}}(C,F^{\star})\big)\\&=(1/e)\,\small{\mathsf{cost}}(C,F^{\prime})+(1-1/e)\,\small{\mathsf{cost}}(C,F^{\star})\\&\leq(3+2\varepsilon)(1/e)\,\small{\mathsf{cost}}(C,F^{\star})+(1-1/e)\,\small{\mathsf{cost}}(C,F^{\star})\\&=(1+2/e+O(\varepsilon))\,\small{\mathsf{cost}}(C,F^{\star}).\end{aligned}$$

$$|\Sigma_{U}|^{|U|^{O(1/\log(1/\eta))}}=|\Sigma_{U}|^{|U|^{1/(2r)}}=|\Sigma_{U}|^{\ell^{1/2}}=2^{O(mr/\sqrt{\ell})},$$

$$2^{O((nr/\ell)\cdot k^{1/(2r)})}=2^{O((nr/\ell)\cdot\sqrt{\ell})}=2^{O(nr/\sqrt{\ell})},$$

$$d(f_{i}^{\prime},v)=\min_{f\in F_{i}}\big(2R_{i}+d(f,v)\big)\leq\min_{f\in F_{i}}\big(2R_{i}+d(f,u)+d(u,v)\big)=d(f_{i}^{\prime},u)+d(u,v),$$

$$d(u,f_{i}^{\prime})+d(f_{i}^{\prime},v)=\min_{f\in F_{i}}\big(2R_{i}+d(u,f)\big)+\min_{f\in F_{i}}\big(2R_{i}+d(f,v)\big).$$

$$d(u,f_{i}^{\prime})+d(f_{i}^{\prime},v)=4R_{i}+d(u,g)+d(h,v)\geq d(g,h)+d(u,g)+d(h,v)\geq d(u,v).$$

$$\small{\mathsf{cost}}(F^{\prime}\cup T)\leq\small{\mathsf{cost}}(F^{\prime}\cup S)\;\Longrightarrow\;\small{\mathsf{improv}}(S)\leq\small{\mathsf{improv}}(T),$$

$$\begin{aligned}d(j,F^{\prime}\cup S)-d(j,F^{\prime}\cup(S\cup\{f\}))&=\max\big(0,\,d(j,F^{\prime}\cup S)-d(j,\{f\})\big)\\&\geq\max\big(0,\,d(j,F^{\prime}\cup T)-d(j,\{f\})\big)\\&=d(j,F^{\prime}\cup T)-\min\big(d(j,F^{\prime}\cup T),\,d(j,\{f\})\big)\\&=d(j,F^{\prime}\cup T)-d(j,F^{\prime}\cup(T\cup\{f\})),\end{aligned}$$

$$\begin{aligned}\small{\mathsf{improv}}(S\cup\{f\})-\small{\mathsf{improv}}(S)&=\sum_{j\in C^{\prime}}w_{j}\big(d(j,F^{\prime}\cup S)-d(j,F^{\prime}\cup(S\cup\{f\}))\big)\\&\geq\sum_{j\in C^{\prime}}w_{j}\big(d(j,F^{\prime}\cup T)-d(j,F^{\prime}\cup(T\cup\{f\}))\big)\\&=\small{\mathsf{cost}}(F^{\prime}\cup T)-\small{\mathsf{cost}}(F^{\prime}\cup(T\cup\{f\}))\\&=\small{\mathsf{improv}}(T\cup\{f\})-\small{\mathsf{improv}}(T),\end{aligned}$$

$$\small{\mathsf{cost}}_{{\cal I}}(C,F)\leq\small{\mathsf{cost}}_{{\cal I}^{\prime}}(C,F)\leq\alpha\,\mathrm{OPT}({\cal I}^{\prime})\leq\alpha(1+o(1))\,\mathrm{OPT}({\cal I}).$$

$$\sum_{j\in C^{\prime}}w_{j}\,d(j,F^{\prime})\leq 2\sum_{j\in C^{\prime}}w_{j}\,d(j,F^{\star})\leq 2(1+\varepsilon)\,\mathrm{OPT}({\cal I}).$$


Full text

Affiliations (in author order): Université Pierre et Marie Curie, Paris; Carnegie Mellon University (supported in part by NSF awards CCF-1536002, CCF-1540541, and CCF-1617790); IIT Delhi; New York University (supported in part by the Simons Collaboration on Algorithms and Geometry); Carnegie Mellon University (supported in part by NSF awards CCF-1536002, CCF-1540541, and CCF-1617790).

Copyright: Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li

Subject classification: Theory of computation → Facility location and clustering; Theory of computation → Fixed parameter tractability; Theory of computation → Submodular optimization and polymatroids

Acknowledgements.

We thank Deeparnab Chakrabarty, Ola Svensson, and Pasin Manurangsi for useful discussions. This research was partially conducted when A. Kumar was visiting A. Gupta and Carnegie Mellon University as part of the Joint Indo-US Virtual Center for Algorithms under Uncertainty.




keywords:

approximation algorithms, fixed-parameter tractability, k-median, k-means, clustering, core-sets

1 Introduction

How well can we approximate the $k$ -Median and $k$ -Means clustering problems? This question has been intensively studied over the past two decades, and many interesting algorithmic techniques have been developed and refined in an attempt to understand these problems. Let us elaborate for the $k$ -Median problem; the story for $k$ -Means is much the same. Recall that in the $k$ -Median problem, given a metric space $(V,d)$ with $n$ points and clients at some of the points, the goal is to open $k$ facilities such that the sum of distances from the clients to their closest facilities is minimized.

The first constant-factor approximation algorithm for $k$ -Median was given by Charikar et al. [6]. After many interesting developments (e.g., primal-dual schemes, sophisticated LP rounding schemes, and pseudo-approximations), today the best approximation guarantee is 2.611 [3]. The best lower bound, however, is still the $(1+2/e)$ -hardness from 1998, due to Guha and Khuller [16]. In this paper, we ask: can we do better if we give ourselves more resources? The problem can be solved exactly by brute-force enumeration in time $n^{k+O(1)}$ , but what can we do, say, in FPT time $f(k)n^{O(1)}$ ?

We cannot hope to solve the problem exactly in FPT time: the reduction of Guha and Khuller also shows a $W[2]$ -hardness for finding the optimal solution for $k$ -Median/ $k$ -Means exactly. Naturally, we then ask what we can achieve by combining the two approaches together, and whether good approximation algorithms can be given in FPT time.

Our Results. Our main algorithmic result is a positive result in this direction:

Theorem 1.1 (Algorithm for $k$ -Median/ $k$ -Means).

For every $\varepsilon>0$ , there is a $(1+2/e+\varepsilon)$ -approximation algorithm for the $k$ -Median problem that runs in FPT time, i.e., in $f(k,\varepsilon)n^{O(1)}$ time. For the $k$ -Means problem, we can achieve a $(1+8/e+\varepsilon)$ -approximation in the same runtime.

The approximation guarantees in Theorem 1.1 match the NP-hardness results for the two problems implied by [16]. However, since we are allowing ourselves FPT time and not just $\operatorname{poly}(n,k)$ time, can we do even better and go past this NP-hardness barrier? Our second main result shows that this is not possible, at least under recent complexity-theoretic conjectures. We prove that the results in Theorem 1.1 are essentially tight, assuming the Gap-Exponential Time Hypothesis [12, 25, 5]:

Theorem 1.2 (Hardness).

There exists a function $g:{\mathbb{R}}^{+}\to{\mathbb{R}}^{+}$ such that assuming the Gap-ETH, for any $\varepsilon>0$ , any $(1+2/e-\varepsilon)$ -approximation algorithm for $k$ -Median, and any $(1+8/e-\varepsilon)$ -approximation for $k$ -Means, must run in time at least $n^{k^{g(\varepsilon)}}$ .

The basic component of the above hardness result is an FPT-hardness of a factor of $(1-1/e)$ for the Max $k$ -Coverage problem, again using the Gap-ETH (Theorem 3.1). Composing that hardness result with the reduction of Guha and Khuller [16] gives us Theorem 1.2 above.

Matroid Median. Finally, using our algorithmic techniques, we are able to also give an improved approximation for the matroid-median problem, which is a generalization of the $k$ -Median problem.

Theorem 1.3 (Algorithm for Matroid Median).

There is a $(2+\varepsilon)$ -approximation algorithm for the Matroid Median problem that runs in FPT time, i.e., in $f(k,\varepsilon)n^{O(1)}$ time.

Since the Matroid Median problem is a generalization of the $k$ -Median problem, the $(1+2/e-\varepsilon)$ -hardness from Theorem 1.2 translates immediately to Matroid Median. It remains an open problem to close the gap between this lower bound and the $(2+\varepsilon)$ -approximation in Theorem 1.3. We can also use our ideas to get a $(3-\frac{2}{p+1}+\varepsilon)$ -approximation for the $p$ -Matroid Median problem.

Facility Location. Facility Location is a problem closely related to $k$ -Median, where each facility has an opening cost and the goal is to open facilities to minimize the sum of distances from clients to their closest facilities plus the total opening cost. For this problem, the best known hardness ratio is ${\alpha_{\mathsf{FL}}}\approx 1.463$ [16], which is defined to be $\max_{x\geq 0}\big{(}1+\frac{x}{1+x}\ln\frac{2}{x}\big{)}$ . On the other hand, the best algorithm achieves a $1.488$ -approximation [22]. When the parameter $k$ denotes the number of facilities open in the optimal solution, we prove that our techniques also give an FPT algorithm for Facility Location whose approximation ratio matches the hardness ratio of [16].
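As a quick numerical sanity check of the constant ${\alpha_{\mathsf{FL}}}$ quoted above, the following sketch evaluates the stated expression on a grid. The function names, the grid resolution, and the search interval $(0,2]$ are assumptions made for this illustration (for $x>2$ the logarithm is negative, so the expression is below $1$ there); this is not part of the paper's algorithm.

```python
import math


def alpha_fl_objective(x: float) -> float:
    """Value of 1 + x/(1+x) * ln(2/x) for x > 0."""
    return 1.0 + (x / (1.0 + x)) * math.log(2.0 / x)


def approx_alpha_fl(steps: int = 200_000, x_max: float = 2.0):
    """Grid search over (0, x_max]; returns (best x, best value).

    Illustrative only: a plain grid search, not an exact maximization.
    """
    best_x, best_val = None, -math.inf
    for i in range(1, steps + 1):
        x = x_max * i / steps
        val = alpha_fl_objective(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


x_star, alpha = approx_alpha_fl()
print(f"maximizer x ~ {x_star:.4f}, alpha_FL ~ {alpha:.4f}")
```

The maximum found this way should agree with the value of roughly $1.463$ stated in the text.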

Theorem 1.4 (Algorithm for Facility Location).

There is a $({\alpha_{\mathsf{FL}}}+\varepsilon)$ -approximation algorithm for the Facility Location problem that runs in FPT time, i.e., in $f(k,\varepsilon)n^{O(1)}$ time.

Roadmap: In Section 2, we describe the approximation algorithms for these problems. We assume throughout that the aspect ratio is polynomially bounded. (We show in Section B.1 that this assumption is without loss of generality, in the case we consider where the clients have unit weights.) In Section 3, we then give the hardness results for FPT Max $k$ -Coverage, $k$ -Median, and $k$ -Means.

1.1 Our Techniques

The algorithm is inspired by the hardness result from [16], which relies on the result of Feige [13] that Max $k$ -Coverage is hard to approximate better than $(1-1/e)$ . Hence, if we build a “factor graph” with sets on one side and elements on the other, with edges indicating inclusion, picking $k$ sets covers a $(1-1/e)$ fraction of the elements at distance $1$ , and the remaining elements are at distance at least $3$ , which gives the factor $1+2/e$ . Now what if we have a general instance, with different distances? We show how to do limited enumeration (in FPT time) to restrict our choices to picking one facility each from $k$ disjoint sets. Moreover, via a surprisingly clean idea we can model the objective as submodular maximization (subject to a partition matroid constraint). And this problem can be approximated well: the factor again is $(1-1/e)$ , hence giving the same factor up to additive $\varepsilon$ terms!
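To make the arithmetic behind the two constants explicit, here is a short worked calculation. It assumes, as a simplification for illustration, that the optimal solution serves every element at distance $1$ (so its cost is $1$ per element), while the solution found serves a $(1-1/e)$ fraction at distance $1$ and the rest at distance exactly $3$. For $k$ -Means the distances are squared, which turns the $3$ into a $9$:

$$\frac{(1-1/e)\cdot 1+(1/e)\cdot 3}{1}=1+\frac{2}{e}\approx 1.736,\qquad\frac{(1-1/e)\cdot 1^{2}+(1/e)\cdot 3^{2}}{1}=1+\frac{8}{e}\approx 3.943.$$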

The matching hardness result is via showing an FPT hardness for Max $k$ -Coverage assuming the Gap-ETH. Firstly, we show that assuming the Gap-ETH, there is no FPT approximation algorithm for the Label Cover problem parameterized by the number of vertices $k$ on one side of the bipartition. (Trying all labelings on one side takes time $O(n^{k+O(1)})$ , and doing much better is hard.) To do this, we construct a variable-clause game from a $3$ -SAT instance, merge clause vertices into $\ell$ super-vertices, and then use $r$ rounds of parallel repetition. (The number of clause vertices becomes $k:=\ell^{r}$ .) Then we compose this with the classical reduction from Label Cover to Max $k$ -Coverage [13]. Due to some technical details (e.g., our Label Cover instance is not guaranteed to be regular) and for the sake of completeness, we provide a formal proof in Lemma 3.5. While our techniques are similar to recent FPT hardnesses for the related $k$ -Dominating Set problem [5, 10], some technical details (e.g., the projection property of Label Cover instances) prevent us from directly using prior results to get $(1-1/e+\varepsilon)$ -hardness for Max $k$ -Coverage.

1.2 Related Work

We briefly survey the state-of-the-art for $k$ -Median and $k$ -Means; please see references below for more historical context. For general metric spaces, the best approximation ratio for $k$ -Median is 2.611 [3] by Byrka et al., building on work of Li and Svensson [23]. Kanungo et al. [17] gave a $(9+\varepsilon)$ -approximation algorithm for $k$ -Means in general metric spaces, which was later improved to 6.357 by Ahmadian et al. [1]. The first constant factor approximation algorithm for Matroid Median was given by Krishnaswamy et al. [18], which was improved by Swamy [27] to 8.

For Euclidean spaces, the problems are better approximable, at least when either $k$ or the dimension $d$ is fixed; we restrict this discussion to parameterizing by $k$ . Specifically, PTASs for both $k$ -Median and $k$ -Means with running time $f(k,\varepsilon)\operatorname{poly}(n,d)$ were given by Kumar et al. [19]. The running times were improved by Chen [7] to $O(nkd+d^{2}n^{\sigma}2^{(k/\varepsilon)^{O(1)}})$ for any $\sigma>0$ for $k$ -Median, and by Feldman et al. [15] to $O(nkd+d\operatorname{poly}(k/\varepsilon)+2^{{\tilde{O}}(k/\varepsilon)})$ for $k$ -Means. Both these latter results were based on the notion of coresets. The $k$ -Means problem is APX hard even in Euclidean space, if both $k$ and $d$ are allowed to be arbitrary [2, 20].

A result of direct interest to this work is that of Czumaj and Sohler [11], for the min-sum clustering problem. They give a $(4+\varepsilon)$ -approximation on general metrics in FPT time. They construct a small (strong) core-set for the related Balanced $k$ -Median problem, and enumerate over all choices of centers inside this core-set. We show in §B.2 that their approach extends to give a $2$ -approximation for the non-bipartite case of $k$ -Median; in this special case of $k$ -Median a facility may be opened at any client location, and hence $C\subseteq{\mathbb{F}}$ . Theorem 1.1 above shows how to get a better guarantee for a more general case. (As an aside, the hardness for this special non-bipartite case is only $(1+1/e)$ ; closing this gap is another interesting open question.)

Hardness-of-approximation results for parameterized problems have been actively studied recently. Lin [24] proved $W[1]$ -hardness of approximation for $k$ -Biclique. Chen and Lin [9] proved $W[1]$ -hardness of approximation for $k$ -Dominating Set in any constant factor, which was later improved to any function $f(k)$ in [5, 10]. Chalermsook et al. [5] also proved that there is no FPT $o(k)$ -approximation algorithm for $k$ -Clique assuming the Gap-ETH.

1.3 Preliminaries

An instance ${\cal I}$ of the $k$ -Median problem is defined by a tuple $((V,d),C,{\mathbb{F}},k)$ , where $(V,d)$ is a metric space over a set of points $V$ with $d(i,j)$ denoting the distance between two points $i,j$ in $V$ . Further, $C$ and ${\mathbb{F}}$ are subsets of $V$ and are referred to as “clients” and “facility locations”, and $k$ is a positive parameter. The goal is to find a subset $F$ of $k$ facilities in ${\mathbb{F}}$ to minimize

$$\small{\mathsf{cost}}(C,F):=\sum_{j\in C}d(j,F).$$

In the weighted version of $k$ -Median, every client $j\in C$ has an associated weight $w_{j}$ , and the goal is to find a subset $F$ of ${\mathbb{F}}$ of size $k$ such that $\small{\mathsf{cost}}(C,F):=\sum_{j\in C}w_{j}d(j,F)$ is minimized.

The $k$ -Means problem is defined similarly except that the objective function gets modified to $\small{\mathsf{cost}}(C,F):=\sum_{j\in C}d(j,F)^{2}$ (and analogously for the weighted version). The names of the two problems come from the fact that if the metric space is the real line and $k=1$ , the optimal solution is the median and the mean respectively. In the Matroid Median problem, we are given a matroid on the set ${\mathbb{F}}$ , and the set of open facilities must be an independent set in the matroid. Again, the goal is to minimize the assignment cost of clients to the nearest open facility.
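The objectives defined above can be sketched in a few lines. The function names, the `power` parameter (1 for $k$ -Median, 2 for $k$ -Means), and the toy instance on the real line are assumptions made for illustration only; $d(j,F)$ is taken to be the distance from $j$ to its closest facility in $F$.

```python
def dist_to_set(d, j, F):
    """d(j, F): distance from client j to its closest facility in F."""
    return min(d(j, f) for f in F)


def clustering_cost(d, clients, F, weights=None, power=1):
    """Sum over clients of w_j * d(j, F)^power.

    power=1 gives the k-Median objective, power=2 the k-Means objective.
    With weights=None every client has unit weight.
    """
    total = 0.0
    for j in clients:
        w = 1.0 if weights is None else weights[j]
        total += w * dist_to_set(d, j, F) ** power
    return total


# Toy instance (hypothetical): points on the real line.
d = lambda x, y: abs(x - y)
clients = [0, 1, 3, 10, 12]
F = [1, 10]

median_cost = clustering_cost(d, clients, F)          # 1 + 0 + 2 + 0 + 2
means_cost = clustering_cost(d, clients, F, power=2)  # 1 + 0 + 4 + 0 + 4
weighted = clustering_cost(d, clients, F, weights={0: 2, 1: 1, 3: 1, 10: 1, 12: 1})
print(median_cost, means_cost, weighted)
```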

In the Facility Location problem, an instance is not given $k$ , but additionally has $\small{\mathsf{open}}:{\mathbb{F}}\to{\mathbb{R}}^{+}$ that indicates the opening cost of each facility. The goal is to find a subset $F\subseteq{\mathbb{F}}$ (without any restriction on $|F|$ ) that minimizes $\small{\mathsf{open}}(F)+\small{\mathsf{cost}}(C,F)$ where $\small{\mathsf{open}}(F):=\sum_{f\in F}\small{\mathsf{open}}(f)$ .

Finally, the aspect ratio of a metric space $(V,d)$ is $\Delta:=\frac{\max_{x,y\in V}d(x,y)}{\min_{x,y\in V,\,x\neq y}d(x,y)}$ .

2 The Approximation Algorithm

We now give the $(1+2/e+\varepsilon)$ -approximation algorithm for $k$ -Median, where $\varepsilon>0$ is a fixed parameter throughout this section. The running time of the algorithm is $f(k,\varepsilon)\cdot\operatorname{poly}(n)$ , where $f(k,\varepsilon)=O(\varepsilon^{-2}k\log k)^{k}$ . We then indicate the alterations to get algorithms for $k$ -Means and Matroid Median.

2.1 The Intuition

We focus on $k$ -Median for now; the ideas for the other problems are analogous. The first idea is to reduce the size of the client set $C$ to $O(\varepsilon^{-2}k\log n)$ —this can be done by results on core-sets for $k$ -Median, which consolidate the clients into a small number of distinct locations [8, 14]. The consolidated clients now have weights, but this extension to weighted $k$ -Median does not pose a problem.

The next idea is to carefully enumerate over the structure of an optimal solution. Consider an optimal solution $F^{\star}=\{f^{\star}_{1},\ldots,f^{\star}_{k}\}$ . For a facility $f^{\star}_{i}\in F^{\star}$ , let “cluster” $C^{\star}_{i}$ be the clients assigned to $f^{\star}_{i}$ , i.e., the subset of clients $C$ for which $f^{\star}_{i}$ is the closest open facility. Let $\ell_{i}$ be the client in $C^{\star}_{i}$ closest to $f^{\star}_{i}$ ; we call it the leader of cluster $C^{\star}_{i}$ . Let $R_{i}$ be the distance $d(f^{\star}_{i},\ell_{i})$ , suitably discretized. Our algorithm guesses the leaders $\ell_{i}$ and the distances $R_{i}$ for each $i\in[k]$ . Since the size of $C$ is $O(\varepsilon^{-2}k\log n)$ , there are $(O(\varepsilon^{-2}k\log n))^{k}$ choices for leaders, and a similar number of choices for the distances; moreover, this quantity can be shown to be $f(k,\varepsilon)\cdot n^{O(1)}$ . (Our analysis will tighten the bound on the number of leader choices to $O(\varepsilon^{-2}\log n)^{k}$ , but this improvement can be ignored for this intuition section.)
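The guessing step can be pictured as a plain enumeration over $k$ -tuples of leaders and $k$ -tuples of discretized radii. The sketch below is only an illustration of that search space: the helper names are hypothetical, the radii are taken to be powers of $(1+\varepsilon)$ up to the aspect ratio, and it enumerates ordered tuples as in this intuition section rather than the multi-sets used in the tighter analysis.

```python
from itertools import product


def candidate_radii(delta, eps):
    """Powers of (1 + eps), starting at 1, up to the first one >= delta."""
    radii = [1.0]
    while radii[-1] < delta:
        radii.append(radii[-1] * (1.0 + eps))
    return radii


def enumerate_guesses(clients, radii, k):
    """Yield every (leaders, radii) guess: k leaders and k radii, in order."""
    for leaders in product(clients, repeat=k):
        for rs in product(radii, repeat=k):
            yield leaders, rs


# Toy usage (hypothetical data): 3 core-set clients, aspect ratio 4, eps = 1.
radii = candidate_radii(delta=4, eps=1.0)
guesses = list(enumerate_guesses(["a", "b", "c"], radii, k=2))
print(len(radii), len(guesses))
```

The number of guesses is $(|C|\cdot\#\text{radii})^{k}$, which is why shrinking $|C|$ with a core-set first is what keeps the enumeration within FPT time.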

Assume now that we have correctly guessed the leaders $\ell_{i}$ and distances $R_{i}$ . For each leader $\ell_{i}$ , let $F_{i}$ be the facilities at distance about $R_{i}$ from $\ell_{i}$ —this set $F_{i}$ contains $f^{\star}_{i}$ . By making copies, assume the sets $F_{i}$ are disjoint. Now our task is to select one facility from each set $F_{i}$ such that the total (weighted) assignment cost of the clients in $C$ is minimized. As such, this seems like a decreasing supermodular minimization problem with a (partition) matroid constraint. (Observe that choosing an arbitrary center in each $F_{i}$ gives us a $3$ -approximation in FPT time, but we want to do much better.)

The last idea is to convert this into a monotone submodular maximization problem, again with a partition matroid constraint. For each set $F_{i}$ , we add a fictitious facility $f_{i}^{\prime}$ such that (i) the assignment cost of clients to the fictitious facilities is at most $3OPT$ , and (ii) for a subset $S$ of facilities, the “improvement” $\small{\mathsf{cost}}(C,F^{\prime})-\small{\mathsf{cost}}(C,F^{\prime}\cup S)$ , where $F^{\prime}$ is the set of fictitious facilities, is a monotone submodular function. We finally show that a $(1-1/e)$ -approximation for this submodular maximization problem gives the desired approximation guarantee. The next two sections describe the algorithm for $k$ -Median in detail. The extension to $k$ -Means, Matroid Median and Facility Location then appears in §C.

2.2 Client Reduction via Coresets

Consider an instance ${\cal I}=((V,d),C,{\mathbb{F}},k)$ of the $k$ -Median problem. Let $\varepsilon>0$ be a fixed constant. We now define the notion of core-sets and use known results to reduce $C$ to a (weighted) set of size $O(\varepsilon^{-2}k\log n)$ .

Definition 2.1 (Core-set).

A (strong) core-set for ${\cal I}$ is a set of clients $C^{\prime}\subseteq V$ along with weights $w_{j}$ for all $j\in C^{\prime}$ , such that

$$\sum_{j\in C^{\prime}}w_{j}\,d(j,F)\in(1-\varepsilon,1+\varepsilon)\cdot\sum_{j\in C}d(j,F),$$

for every $F\subseteq{\mathbb{F}}$ with $|F|=k$ .

A similar definition holds for a strong core-set for the $k$ -Means problem. Since we deal only with strong core-sets in this paper, we drop the modifier and refer to them only as core-sets. The first core-sets for metric $k$ -Median were given by Chen [8]; the following result is the best current construction:

Theorem 2.2 ([14], Theorem 15.4).

For $0\leq\varepsilon,\delta\leq 1/2$ , there exists a Monte Carlo algorithm that for each instance $I$ of $k$ -Median on a general metric, outputs a core-set $C^{\prime}\subseteq C$ with size

$$|C^{\prime}|=O\Big(\frac{k\log n+\log(1/\delta)}{\varepsilon^{2}}\Big)$$

with probability $1-\delta$ , where $n=|V|$ . Moreover, the algorithm runs in time $O(k(n+k)+\log^{2}(1/\delta)\log^{2}n)$ . For $k$ -Means, the core-set is of size $|C^{\prime}|=O\big{(}\frac{k\log n+\log\nicefrac{{1}}{{\delta}}}{\varepsilon^{4}}\big{)}$ , and the runtime remains the same.

The power of core-sets lies in the following fact.

Fact 1.

Consider a $k$ -Median/ $k$ -Means instance ${\cal I}=((V,d),C,{\mathbb{F}},k)$ , and let $C^{\prime}$ be a (strong) core-set with weights $w$ . Consider the weighted instance ${\cal I}^{\prime}=((V,d),C^{\prime},{\mathbb{F}},k,w)$ , which is the instance ${\cal I}$ with its clients replaced by the weighted clients in the core-set. Then, for any $\beta\geq 1$ , a $\beta$ -approximate solution $F\subseteq{\mathbb{F}}$ to ${\cal I}^{\prime}$ is a $\beta(1+O(\varepsilon))$ -approximate solution to ${\cal I}$ .
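One way to see why Fact 1 holds is the following short chain, given here as a sketch. It assumes $\varepsilon\leq 1/2$ , writes $F^{\star}$ for an optimal solution to ${\cal I}$ , and uses $\small{\mathsf{cost}}_{{\cal I}}$ and $\small{\mathsf{cost}}_{{\cal I}^{\prime}}$ for the (weighted) costs in the two instances; the core-set property of Definition 2.1 is applied once to $F$ and once to $F^{\star}$ :

$$\begin{aligned}\small{\mathsf{cost}}_{{\cal I}}(C,F)&\leq\frac{1}{1-\varepsilon}\,\small{\mathsf{cost}}_{{\cal I}^{\prime}}(C^{\prime},F)&&\text{(core-set property for }F\text{)}\\&\leq\frac{\beta}{1-\varepsilon}\,\small{\mathsf{cost}}_{{\cal I}^{\prime}}(C^{\prime},F^{\star})&&\text{(}F\text{ is }\beta\text{-approximate for }{\cal I}^{\prime}\text{)}\\&\leq\frac{\beta(1+\varepsilon)}{1-\varepsilon}\,\small{\mathsf{cost}}_{{\cal I}}(C,F^{\star})&&\text{(core-set property for }F^{\star}\text{)}\\&\leq\beta(1+4\varepsilon)\,\small{\mathsf{cost}}_{{\cal I}}(C,F^{\star}).\end{aligned}$$

The last step uses $\frac{1+\varepsilon}{1-\varepsilon}=1+\frac{2\varepsilon}{1-\varepsilon}\leq 1+4\varepsilon$ for $\varepsilon\leq 1/2$ .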

Therefore, in order to find a $(1+2/e+O(\varepsilon))$ -approximation to a $k$ -Median instance ${\cal I}$ , it suffices to find a $(1+2/e+O(\varepsilon))$ -approximation to ${\cal I}^{\prime}$ , and analogously for $k$ -Means. Henceforth, we restrict our attention to the core-set instance ${\cal I}^{\prime}$ . In other words, we assume that our instances have only a small number of clients, but now the clients have associated weights. In the following sections, we show how to approximate such weighted $k$ -Median/ $k$ -Means instances in FPT time.

2.3 Reduction to Submodular Maximization

Given Fact 1, we only consider instances ${\cal I}=((V,d),C,{\mathbb{F}},k,w)$ of weighted $k$ -Median, where clients in $C$ have weights in the range $[1,n]$ and $|C|$ is bounded by $O(\varepsilon^{-2}k\log n)$ . In this section we prove the following approximation guarantee for $k$ -Median; this, combined with Fact 1, proves the $k$ -Median statement in Theorem 1.1.

Theorem 2.3.

Let $\varepsilon$ be a fixed parameter. Given a $k$ -Median instance ${\cal I}=((V,d),C^{\prime},{\mathbb{F}},k,w)$ with $|C^{\prime}|=O(\varepsilon^{-2}k\log n)$ , there is a $(1+2/e+O(\varepsilon))$ -approximation algorithm that runs in $f(k,\varepsilon)n^{O(1)}$ time.

By scaling, assume the minimum distance between points in $V$ is 1, so the aspect ratio $\Delta$ is the maximum distance between two points in $V$ . For a positive integer $a$ , define $\lceil\!\!\lceil a\rceil\!\!\rceil:=(1+\varepsilon)^{\lceil\log_{(1+\varepsilon)}a\rceil}$ as the smallest power of $(1+\varepsilon)$ larger than or equal to $a$ . Here, $\varepsilon$ is the same fixed parameter as the one used in the core-set.
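The rounding operator $\lceil\!\!\lceil\cdot\rceil\!\!\rceil$ can be sketched as below. The function name is hypothetical, and the two correction loops are an implementation detail added here to guard against floating-point error in the logarithm; they are not part of the paper's definition.

```python
import math


def round_up_to_power(a: float, eps: float) -> float:
    """Smallest power of (1 + eps) that is >= a, for a >= 1 and eps > 0."""
    if a < 1 or eps <= 0:
        raise ValueError("expects a >= 1 and eps > 0")
    base = 1.0 + eps
    t = max(0, math.ceil(math.log(a, base)))
    # The logarithm may be off by one ulp; nudge the exponent if needed.
    while t > 0 and base ** (t - 1) >= a:
        t -= 1
    while base ** t < a:
        t += 1
    return base ** t


# Toy usage: with eps = 1 the allowed values are 1, 2, 4, 8, ...
print(round_up_to_power(5, 1.0), round_up_to_power(8, 1.0), round_up_to_power(1, 0.5))
```

By construction the rounded value $r$ satisfies $a\leq r<(1+\varepsilon)a$ , which is the source of the extra $(1+\varepsilon)$ factors in the analysis.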

The formal algorithm follows the intuition in §2.1 and is described in Algorithm 2.1; let us step through it now. We iterate over all possible values $\ell_{1},\ldots,\ell_{k}$ for the leaders, and $R_{1},\ldots,R_{k}$ for the corresponding distances. The same vertex could appear several times in the subset $\{\ell_{1},\ldots,\ell_{k}\}$ , and so the latter should be thought of as a multi-set. In Step 7, we add $k$ new fictitious facilities: for each $i$ , the new facility $f_{i}^{\prime}$ is at distance $2R_{i}$ from all the facilities in $F_{i}$ . The distance to all other points is determined by triangle inequality in Step 8. Claim 2 shows that this forms a valid metric. In Step 9, we define the “improvement” function $\small{\mathsf{improv}}(S)$ as the reduction in cost due to adding in the facilities in $S$ . Claim 3 shows this function is monotone submodular. This means we can use the $(1-1/e)$ -approximation algorithm [4] for monotone submodular function maximization subject to a matroid constraint to find a set $S$ which contains exactly one facility from each of the sets $F_{i}$ , since this is a partition matroid constraint. Observe that the function $\small{\mathsf{improv}}(\cdot)$ can be computed efficiently. This completes the description of the algorithm.

To prove correctness of the algorithm, we need to show two things: the distance function defined on $F^{\prime}\cup V$ in Step 8 is a metric, and the function $\small{\mathsf{improv}}$ defined in Step 9 is monotone and submodular. We defer the simple proofs to §A.

Claim 2 (Metricity).

Consider the set $F^{\prime}$ defined during an iteration of the algorithm. The distance function defined on $F^{\prime}\cup V$ is a metric.

Claim 3 (Submodularity).

The function $\small{\mathsf{improv}}(S)$ defined in Step 9 is monotone and submodular with $\small{\mathsf{improv}}(\varnothing)=0$ .
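The construction behind Claims 2 and 3 can be explored on a tiny instance. The sketch below assumes that the distance from a fictitious facility $f_{i}^{\prime}$ to a point $v\in V$ is $\min_{f\in F_{i}}(2R_{i}+d(f,v))$ , builds $\small{\mathsf{improv}}$ from it, and checks monotonicity and submodularity by brute force. All names and the toy data are hypothetical, and the exhaustive search over valid solutions is only a stand-in for the $(1-1/e)$ -approximation algorithm of [4], feasible because the instance is tiny.

```python
from itertools import combinations, product


def fict_dist(d, F_i, R_i, v):
    """Distance from fictitious facility f'_i to v: f'_i is 2*R_i from all of F_i."""
    return min(2 * R_i + d(f, v) for f in F_i)


def make_improv(d, clients, weights, groups, radii):
    """Return improv(S) = cost(C, F') - cost(C, F' u S) for weighted clients."""
    # base[j] = d(j, F'): distance to the nearest fictitious facility.
    base = {
        j: min(fict_dist(d, F_i, R_i, j) for F_i, R_i in zip(groups, radii))
        for j in clients
    }

    def improv(S):
        total = 0.0
        for j in clients:
            with_S = min([base[j]] + [d(j, f) for f in S])
            total += weights[j] * (base[j] - with_S)
        return total

    return improv


def is_monotone_submodular(improv, ground, tol=1e-9):
    """Brute-force check over all S subset of T and all f outside T (tiny inputs only)."""
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for S in subsets:
        for T in subsets:
            if not S <= T:
                continue
            if improv(S) > improv(T) + tol:
                return False
            for f in ground:
                if f in T:
                    continue
                gain_S = improv(S | {f}) - improv(S)
                gain_T = improv(T | {f}) - improv(T)
                if gain_S < gain_T - tol:
                    return False
    return True


# Toy instance on the real line (hypothetical): leaders 2 and 11, radii 1.
d = lambda x, y: abs(x - y)
clients = [0, 2, 9, 11]
weights = {j: 1.0 for j in clients}
groups = [[1, 3], [10, 12]]   # F_1, F_2
radii = [1.0, 1.0]            # R_1, R_2

improv = make_improv(d, clients, weights, groups, radii)
ground = [f for F_i in groups for f in F_i]
best = max(product(*groups), key=improv)  # exhaustive stand-in for [4]
print(best, improv(best), is_monotone_submodular(improv, ground))
```

On this instance every client is at distance $3$ from the fictitious facilities, and the best valid solution recovers most of that cost, in line with the role that $\small{\mathsf{improv}}$ plays in the analysis.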

Now to bound the runtime. Since $|C|=O(\varepsilon^{-2}k\log n)$ , there are at most $\binom{O(\varepsilon^{-2}k\log n)+k-1}{k}=(O(\varepsilon^{-2}\log n))^{k}$ different multi-sets of size $k$ with elements in $C$ . In addition, there are $\log_{1+\varepsilon}\Delta$ many choices for $R_{i}$ for each $i\in[k]$ . Therefore, the number of iterations in Step 1 of the algorithm can be bounded by

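$$(O(\varepsilon^{-2}\log n))^{k}\cdot(\log_{1+\varepsilon}\Delta)^{k}\leq\bigg(O\Big(\frac{(\log\Delta)(\log n)}{\varepsilon^{2}}\Big)\bigg)^{k}.$$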

As argued in §B.1, since we started with the unweighted $k$ -Median problem, the aspect ratio $\Delta$ can be assumed to be polynomially bounded in $n$ , and so the number of iterations can be bounded by $(O(\log n/\varepsilon^{2}))^{k}$ , which is at most $n\cdot(O(\varepsilon^{-2}k\log k))^{k}$ . Indeed, in case $k<\frac{\log n}{\log\log n}$ , $(O(\log n/\varepsilon^{2}))^{k}\leq(O(1/\varepsilon^{2}))^{k}\cdot\smash{(\log n)^{\frac{\log n}{\log\log n}}}=(O(1/\varepsilon^{2}))^{k}\cdot n$ . Else $\log n\leq O(k\log k)$ , and hence $(O(\log n/\varepsilon^{2}))^{k}=(O(k\log k/\varepsilon^{2}))^{k}$ .
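The two cases above rest on two small calculations, spelled out here as a sketch with logarithms taken to base $2$ . For the first case,

$$(\log n)^{\frac{\log n}{\log\log n}}=2^{(\log\log n)\cdot\frac{\log n}{\log\log n}}=2^{\log n}=n.$$

For the second case, assuming $n$ is sufficiently large that $\log\log\log n\leq\tfrac{1}{2}\log\log n$ ,

$$k\geq\frac{\log n}{\log\log n}\;\Longrightarrow\;\log k\geq\log\log n-\log\log\log n\geq\tfrac{1}{2}\log\log n\;\Longrightarrow\;k\log k\geq\tfrac{1}{2}\log n.$$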

The algorithm for submodular maximization subject to a matroid constraint takes polynomial time, given a value oracle for the function [4, Theorem 1.1]: in fact it can be sped up for the case of partition matroid constraints [4, §3.3]. The value oracle for $\small{\mathsf{improv}}(S)$ can itself be implemented in polynomial time. Hence each iteration of the algorithm can be run in time polynomial in $n$ .

The submodular maximization algorithm is a randomized Monte-Carlo algorithm that succeeds only with probability $1-1/n^{2}$ , but we can easily boost the success probability by repetition: by running it $\tau:=\operatorname{poly}(\varepsilon^{-1}k\log n)$ times for each input $S$ and returning the maximum value obtained, we can ensure that with high probability it succeeds in all the calls we make.

2.3.1 Approximation Ratio

We now argue about the approximation ratio of the algorithm. We fix an optimal solution to the instance. Let $F^{\star}=\{f^{\star}_{1},\ldots,f^{\star}_{k}\}$ be the centers opened by this solution. Define $C^{\star}_{i}$ as the clients for which the closest open center is $f^{\star}_{i}$ , i.e., $C^{\star}_{i}:=\{j\in C:d(j,f_{i}^{\star})=d(j,F^{\star})\}$ . We define the notion of leaders with respect to this solution.

Definition 2.4 (Leader).

For each $i\in[k]$ , call a client $j\in C^{\star}_{i}$ that minimizes $d(j,f^{\star}_{i})$ over all $j\in C^{\star}_{i}$ the leader $\ell^{\star}_{i}$ of center $f^{\star}_{i}$ . If there are multiple clients $j\in C^{\star}_{i}$ achieving the minimum, declare an arbitrary one to be the leader. Note that a client can be the leader of multiple centers $f^{\star}_{i}$ . The leaders w.r.t. the solution $F^{\star}$ form the multi-set $\{\ell^{\star}_{1},\ldots,\ell^{\star}_{k}\}$ . For each leader $\ell^{\star}_{i}$ , the radius $R^{\star}_{i}$ is defined as $\lceil\!\!\lceil d(\ell^{\star}_{i},f^{\star}_{i})\rceil\!\!\rceil.$
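Definition 2.4 can be illustrated on a toy solution. The sketch below is hypothetical helper code, not part of the algorithm (which guesses leaders and radii rather than computing them from $F^{\star}$ ). It breaks ties by taking the first minimizer, returns `None` for an empty cluster, and, as an assumption of this sketch, leaves a distance of $0$ unrounded.

```python
import math


def round_up_to_power(a, eps):
    """Smallest power of (1 + eps) that is >= a (a >= 1), with float guards."""
    base = 1.0 + eps
    t = max(0, math.ceil(math.log(a, base)))
    while t > 0 and base ** (t - 1) >= a:
        t -= 1
    while base ** t < a:
        t += 1
    return base ** t


def leaders_and_radii(d, clients, centers, eps):
    """For each center f_i: its cluster C_i, leader l_i, and rounded radius R_i."""
    leaders, radii = [], []
    for f_i in centers:
        # C_i = clients whose closest center is f_i (ties join several clusters).
        cluster = [j for j in clients
                   if d(j, f_i) == min(d(j, f) for f in centers)]
        if not cluster:
            leaders.append(None)
            radii.append(None)
            continue
        leader = min(cluster, key=lambda j: d(j, f_i))  # first minimizer wins
        dist = d(leader, f_i)
        leaders.append(leader)
        # Assumption of this sketch: a co-located leader keeps radius 0.
        radii.append(round_up_to_power(dist, eps) if dist >= 1 else dist)
    return leaders, radii


# Toy usage on the real line (hypothetical data).
d = lambda x, y: abs(x - y)
leaders, radii = leaders_and_radii(d, [0, 2, 7, 12], [1, 10], eps=0.5)
print(leaders, radii)
```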

Consider the iteration of Algorithm 2.1 where $\ell_{1},\ldots,\ell_{k}$ are equal to $\ell^{\star}_{1},\ldots,\ell_{k}^{\star}$ respectively, and $R_{1},\ldots,R_{k}$ are equal to $R^{\star}_{1},\ldots,R^{\star}_{k}$ respectively. Let $S^{\star}$ be the set output in Step 10 of the algorithm. It suffices to show that $\small{\mathsf{cost}}(C,S^{\star})\leq(1+2/e+\varepsilon)\small{\mathsf{cost}}(C,F^{\star})$ . We proceed to show this in the rest of the section.

As in the algorithm, define

[TABLE]

so that $f^{\star}_{i}\in F_{i}$ for each $i\in[k]$ . (Recall that the sets $F_{i}$ are disjoint by duplicating facilities.) Let $F^{\prime}=\{f_{1}^{\prime},\ldots,f_{k}^{\prime}\}$ be the set of fictitious facilities defined in the algorithm.

We are interested in the solutions $S$ that consist of one center from each $F_{i}$ , since one such solution is the desired $F^{\star}$ . More formally, define a solution $S$ to be valid if the set $S$ can be listed as $(f_{1},\ldots,f_{k})$ so that $f_{i}\in F_{i}$ for each $i\in[k]$ .

Claim 4.

For every valid $S$ , $\small{\mathsf{cost}}(C,F^{\prime}\cup S)=\small{\mathsf{cost}}(C,S)$ .

Proof 2.5.

List the set $S$ as $(f_{1},\ldots,f_{k})$ , where $f_{i}\in F_{i}$ for each $i\in[k]$ . Informally, this claim amounts to showing that the fictitious facilities $F^{\prime}$ do not improve the solution $S$ . To formalize this idea, fix a client $j\in C$ and a fictitious facility $f^{\prime}_{i}$ , and let $f\in F_{i}$ be a closest center to $j$ in $F_{i}$ . Below, we show that in fact, client $j$ is closer to $f_{i}\in S$ than to $f_{i}^{\prime}\in F^{\prime}$ :

[TABLE]

Therefore, we have $d(j,F^{\prime})\geq d(j,S)$ for all clients $j$ , so

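$$\small{\mathsf{cost}}(C,F^{\prime}\cup S)=\sum_{j\in C}\min\big(d(j,F^{\prime}),\,d(j,S)\big)=\sum_{j\in C}d(j,S)=\small{\mathsf{cost}}(C,S),$$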

as desired.

We now bound the cost of the solution which opens facilities at $F^{\prime}$ .

Claim 5.

$\small{\mathsf{cost}}(C,F^{\prime})\leq(3+2\varepsilon)\,\small{\mathsf{cost}}(C,F^{\star})$ .

Proof 2.6.

It suffices to show that $d(j,F^{\prime})\leq(3+2\varepsilon)\,d(j,F^{\star})$ for each client $j\in C$ . Fix a client $j\in C$ , and let $f^{\star}_{i}\in F^{\star}$ be a center achieving $d(j,f^{\star}_{i})=d(j,F^{\star})$ . Since $\ell^{\star}_{i}$ is the leader of center $f^{\star}_{i}$ , we have

$$d(\ell^{\star}_{i},f^{\star}_{i})\leq d(j,f^{\star}_{i})=d(j,F^{\star}).$$

Recall that $f^{\star}_{i}\in F_{i}$ . Therefore,

[TABLE]

as desired.

Let $S^{\star}$ be the set output in Step 11. Since the algorithm of [4] is a $(1-1/e)$ -approximation,

[TABLE]

Lemma 2.7.

The solution $S^{\star}$ in (2.3) satisfies $\small{\mathsf{cost}}(C,S^{\star})\leq(1+2/e+O(\varepsilon))\;\small{\mathsf{cost}}(C,F^{\star})$ .

Proof 2.8.

We bound the cost associated with this solution as follows.

[TABLE]

Hence the proof.
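The following LaTeX sketch shows one way the constant $1+2/e$ can arise from Claims 4 and 5. It assumes that $\small{\mathsf{improv}}(S)=\small{\mathsf{cost}}(C,F^{\prime})-\small{\mathsf{cost}}(C,F^{\prime}\cup S)$ and that the guarantee of [4] is used in the form $\small{\mathsf{improv}}(S^{\star})\geq(1-1/e)\,\small{\mathsf{improv}}(F^{\star})$ ; both are assumptions about material defined outside this section, so this is an illustration rather than the exact displayed derivation.

```latex
\begin{align*}
\mathsf{cost}(C,S^{\star})
  &= \mathsf{cost}(C,F'\cup S^{\star})
     && \text{(Claim 4, $S^{\star}$ valid)} \\
  &= \mathsf{cost}(C,F') - \mathsf{improv}(S^{\star}) \\
  &\le \mathsf{cost}(C,F') - (1-1/e)\,\mathsf{improv}(F^{\star}) \\
  &= \mathsf{cost}(C,F') - (1-1/e)\big(\mathsf{cost}(C,F') - \mathsf{cost}(C,F^{\star})\big)
     && \text{(Claim 4, $F^{\star}$ valid)} \\
  &= \tfrac{1}{e}\,\mathsf{cost}(C,F') + (1-\tfrac{1}{e})\,\mathsf{cost}(C,F^{\star}) \\
  &\le \Big(\tfrac{3+2\varepsilon}{e} + 1 - \tfrac{1}{e}\Big)\mathsf{cost}(C,F^{\star})
     && \text{(Claim 5)} \\
  &= \big(1 + 2/e + 2\varepsilon/e\big)\,\mathsf{cost}(C,F^{\star}).
\end{align*}
```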

2.3.2 Putting it all together

Our algorithm is a Monte Carlo randomized algorithm: both our subroutines use randomness. The first is the core-set construction in §2.2, and the second is the submodular maximization procedure in Step 10 of the algorithm. For each, we can make the error probability $1/\operatorname{poly}(n)$ . Since each iteration of the algorithm can be implemented in $\operatorname{poly}(n)$ time, the runtime is dominated by the number of iterations, so the total runtime is $(O(\varepsilon^{-2}k\log k))^{k}\operatorname{poly}(n)$ . Moreover, combining the two steps of finding the core-set and the submodular maximization, the approximation ratio is $(1+\varepsilon)(1+2/e+O(\varepsilon))=1+2/e+O(\varepsilon).$ This proves Theorem 1.1 for the $k$ -Median problem.

3 Gap-ETH Hardness of Max $k$ -Coverage

In this section, we show that assuming the Gap Exponential Time Hypothesis (Gap-ETH) [12, 25], for any $\varepsilon>0$ , there is no FPT-approximation algorithm that approximates Max $k$ -Coverage better than a factor $(1-1/e+\varepsilon)$ .

Theorem 3.1 (Hardness for Max-Coverage).

There exists a function $g:{\mathbb{R}}^{+}\to{\mathbb{R}}^{+}$ such that assuming the Gap-ETH, for any $\varepsilon>0$ , any $(1-1/e+\varepsilon)$ -approximation algorithm for Max $k$ -Coverage with $n$ elements and $m$ sets must run in time at least $(n+m)^{k^{g(\varepsilon)}}$ .

Using the reduction of Guha and Khuller [16], this immediately implies Theorem 1.2. The rest of the section is devoted to the proof of Theorem 3.1. The proof has two main components. The first part shows that, under the Gap-ETH, it takes at least $n^{h(k)}$ time to approximate the Label Cover problem even when one side of the bipartition has only $k$ vertices; here $h(\cdot)$ is some increasing function depending on the quality of approximation. This reduction is inspired by the recent progress on the hardness of parameterized problems [5, 10] and was communicated to us by Pasin Manurangsi. The second part is the classical reduction from Label Cover to Max $k$ -Coverage given by Feige [13].

3.1 Hardness of Label Cover from Gap-ETH

We begin with the standard definition of Label Cover.

Definition 3.2 (Label Cover).

An instance of Label Cover $\mathcal{L}$ consists of a bipartite graph $G=(U\cup V,E)$ with possibly parallel edges, two label sets $\Sigma_{U},\Sigma_{V}$ , and a projection $\pi_{e}:\Sigma_{U}\to\Sigma_{V}$ for each $e\in E$ . Given a labeling $\sigma:(U\cup V)\to(\Sigma_{U}\cup\Sigma_{V})$ , an edge $e=(u,v)\in E$ is satisfied when $\pi_{e}(\sigma(u))=\sigma(v)$ . The goal of Label Cover is to find a labeling $\sigma$ that maximizes the number of satisfied edges. Let $\mathsf{OPT}(\mathcal{L})$ be the maximum fraction of edges simultaneously satisfied by any labeling.

Note that we include the projection property in the definition; all Label Cover instances in the paper will have this property. For a vertex $u\in U\cup V$ , let $d_{u}$ be the degree of $u$ , and let $d_{U}$ (resp. $d_{V}$ ) be the maximum degree of $U$ (resp. $V$ ). We also call an instance $U$ -regular (resp. $V$ -regular) if all vertices in $U$ (resp. $V$ ) have the same degree. All subsequent Label Cover instances will be $U$ -regular, though the lack of $V$ -regularity will require us to do a little more work in §3.2.

Given a 3-SAT formula $\phi$ , let $\mathsf{OPT}(\phi)$ be the maximum fraction of clauses that can be satisfied by any assignment. The Gap-ETH [12, 25] states that there exist some constants $\delta>0,s<1$ for which no algorithm, given a 3-SAT formula $\phi$ on $n$ variables and $m=O(n)$ clauses, can distinguish whether $\mathsf{OPT}(\phi)=1$ or $\mathsf{OPT}(\phi)<s$ in time $O(2^{\delta n})$ . The main result of this subsection is the following lemma.

Lemma 3.3.

For every $\ell,r\in\mathbb{N}$ , there is a reduction that, given 3-SAT formula $\phi$ with $n$ variables and $m$ clauses, outputs a $U$ -regular Label Cover instance $\mathcal{L}$ such that

•

(Completeness) $\mathsf{OPT}(\phi)=1\implies\mathsf{OPT}(\mathcal{L})=1$ , and

•

(Soundness) $\mathsf{OPT}(\phi)<s\implies\mathsf{OPT}(\mathcal{L})<s^{\Omega(r)}$ ,

where $|U|=\ell^{r},|V|=n^{r},|\Sigma_{U}|=2^{O(mr/\ell)},|\Sigma_{V}|=2^{O(r)},d_{V}\leq m^{r}$ . The running time of this reduction is $m^{O(r)}\cdot|\Sigma_{U}|$ .

In particular, assuming Gap-ETH, for any $\eta>0$ , if we let $r=\Theta(\log(1/\eta))$ so that

[TABLE]

no algorithm can take a Label Cover instance $\mathcal{L}$ and can decide whether $\mathsf{OPT}(\mathcal{L})=1$ or $\mathsf{OPT}(\mathcal{L})<\eta$ in time $|\Sigma_{U}|^{|U|^{O(1/\log(1/\eta))}}$ .

Note that a brute-force algorithm that tries every assignment to $U$ and chooses the best assignment for $V$ for it runs in $O(|\Sigma_{U}|^{|U|})$ times a polynomial. Lemma 3.3 shows that assuming the Gap-ETH, even approximately solving Label Cover requires significant time.
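The brute-force algorithm mentioned above can be sketched in Python as follows. The function name `label_cover_brute_force` and the input encoding (each edge as a triple `(u, v, pi)` with `pi` a dictionary from $\Sigma_{U}$ to $\Sigma_{V}$ ) are assumptions chosen for this illustration.

```python
from itertools import product

def label_cover_brute_force(U, V, edges, sigma_U):
    """Return the maximum fraction of satisfied edges.

    edges: list of (u, v, pi), where pi maps each label in sigma_U to a label of v.
    Tries all |sigma_U|^|U| labelings of U; for each, every v independently
    takes the label that satisfies the most of its incident edges.
    """
    best = 0
    for labels in product(sigma_U, repeat=len(U)):
        sigma = dict(zip(U, labels))
        satisfied = 0
        for v in V:
            votes = {}  # right label -> number of incident edges projecting to it
            for (u, w, pi) in edges:
                if w == v:
                    b = pi[sigma[u]]
                    votes[b] = votes.get(b, 0) + 1
            if votes:
                satisfied += max(votes.values())
        best = max(best, satisfied)
    return best / len(edges)

# Two left vertices sharing one right vertex.
satisfiable = [("u1", "v", {0: "a", 1: "b"}), ("u2", "v", {0: "b", 1: "b"})]
conflicting = [("u1", "v", {0: "a", 1: "a"}), ("u2", "v", {0: "b", 1: "b"})]
```

The outer loop accounts for the $|\Sigma_{U}|^{|U|}$ factor, and the inner work is polynomial in the instance size.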

Lemma 3.3 is proved by a series of well-known transformations between Label Cover instances. We start with the following basic hardness result for Label Cover assuming the Gap-ETH, which follows from essentially restating Gap-ETH as a clause-variable game:

Theorem 3.4 (Theorem 4.1 of [5]).

There is a reduction that, given 3-SAT formula $\phi$ with $n$ variables and $m$ clauses, outputs a $U$ -regular Label Cover instance $\mathcal{L}$ such that

•

(Completeness) $\mathsf{OPT}(\phi)=1\implies\mathsf{OPT}(\mathcal{L})=1$ , and

•

(Soundness) $\mathsf{OPT}(\phi)<s^{\prime}\implies\mathsf{OPT}(\mathcal{L})<s=1-(1-s^{\prime})/3$ ,

where $|U|=m,|V|=n,|\Sigma_{U}|=7,|\Sigma_{V}|=2,d_{V}\leq m$ , and $G$ is $U$ -regular with $d_{U}=3$ . In particular, assuming the Gap-ETH, there exist constants $\delta>0$ , $s<1$ such that no algorithm can take a Label Cover instance $\mathcal{L}$ and can decide whether $\mathsf{OPT}(\mathcal{L})=1$ or $\mathsf{OPT}(\mathcal{L})<s$ in $O(2^{\delta|U|})$ time.

Let $\ell$ be a parameter that will be related to $k$ in Max $k$ -Coverage later. We can ensure $\ell$ divides $|U|$ by taking an arbitrary vertex in $U$ and making $\ell\lceil|U|/\ell\rceil-|U|$ copies of it. This does not change any of the properties in Theorem 3.4 except to increase the soundness $s$ by $o_{n}(1)$ ; however, the soundness still remains bounded away from $1$ .

Since we want few vertices on the left, we construct a new Label Cover instance $\mathcal{L}_{1}$ by partitioning $U$ into $\ell$ groups and creating super-vertices for each one. Formally, index the vertices of $U$ as $U=\{u_{i,j}\}_{i\in[\ell],j\in[m/\ell]}$ , and let the $i^{th}$ part be $S_{i}:=\{u_{i,j}\}_{j\in[m/\ell]}$ . The new instance $\mathcal{L}_{1}=((U_{1}\cup V_{1},E_{1}),\Sigma_{U_{1}},\Sigma_{V_{1}},\{\pi^{1}_{e}\}_{e\in E_{1}})$ is constructed as follows.

•

$V_{1}=V$ and $\Sigma_{V_{1}}=\Sigma_{V}$ (the RHS remains unchanged),

•

$U_{1}=\{S_{1},\dots,S_{\ell}\}$ and $\Sigma_{U_{1}}=(\Sigma_{U})^{m/\ell}$ (the LHS has one super-vertex for each group), and

•

for each $e=(u,v)\in E$ such that $u=u_{i,j}$ , add an edge $e^{\prime}=(S_{i},v)$ to $E_{1}$ with the projection $\pi^{1}_{e^{\prime}}(\sigma_{1},\dots,\sigma_{m/\ell}):=\pi_{e}(\sigma_{j})$ where the latter $\pi_{e}$ denotes the projection in $\mathcal{L}$ . (Recall we allow parallel edges with different projections.)

Since the set of possible labelings and the set of edges remain the same except for syntactic changes, the completeness $c$ and the soundness $s$ do not change. The parameters become $|U_{1}|=\ell,|V_{1}|=|V|,|\Sigma_{U_{1}}|=2^{O(m/\ell)},|\Sigma_{V_{1}}|=O(1)$ . It still maintains $U$ -regularity and $d_{V_{1}}=d_{V}\leq m$ .

The final transformation is the powerful parallel repetition step, which shows that the soundness decreases exponentially as we take the natural graph power. Fix $r\in\mathbb{N}$ . The instance $\mathcal{L}_{2}=((U_{2}\cup V_{2},E_{2}),\Sigma_{U_{2}},\Sigma_{V_{2}},\{\pi^{2}_{e}\}_{e\in E_{2}})$ is constructed as follows.

•

$U_{2}=(U_{1})^{r}$ and $\Sigma_{U_{2}}=(\Sigma_{U_{1}})^{r}$ .

•

$V_{2}=(V_{1})^{r}$ and $\Sigma_{V_{2}}=(\Sigma_{V_{1}})^{r}$ .

•

$E_{2}=(E_{1})^{r}$ . For each $e=(e_{i})_{i\in[r]}\in E_{2}$ with $e_{i}=(u_{i},v_{i})\in E_{1}$ and $(\sigma_{1},\dots,\sigma_{r})\in\Sigma_{U_{1}}^{r}$ , $\pi^{2}_{e}(\sigma_{1},\dots,\sigma_{r})=(\pi^{1}_{e_{1}}(\sigma_{1}),\dots,\pi^{1}_{e_{r}}(\sigma_{r}))$ .

The parameters become $|U_{2}|=\ell^{r},|V_{2}|=|V|^{r}=n^{r},|\Sigma_{U_{2}}|=2^{O(mr/\ell)},|\Sigma_{V_{2}}|=2^{O(r)},d_{V_{2}}\leq d_{V_{1}}^{r}\leq m^{r}$ , and $\mathcal{L}_{2}$ maintains $U$ -regularity. The completeness $c$ still remains $1$ , and by the parallel repetition theorem [26], the soundness drops to $2^{-\Theta(r)}$ , where the constant hidden in the $\Theta$ depends on the original soundness. This proves Lemma 3.3.

3.2 Hardness of Max $k$ -Coverage from Label Cover

Given the “nice” Label Cover instance from Lemma 3.3 we now show how to reduce this to Max $k$ -Coverage. This reduction is standard and closely follows the classical one given by Feige [13], modulo some minor issues arising from it not being $V$ -regular.

Recall that an instance $\mathcal{I}$ of Max $k$ -Coverage consists of an underlying universe $\mathcal{U}$ , a family $\mathcal{S}$ of subsets, and an integer $k$ . The goal is to find a subfamily $\mathcal{S}^{\prime}\subseteq\mathcal{S}$ with $|\mathcal{S}^{\prime}|=k$ that covers the largest number of elements. For notational simplicity, we prove the hardness of the weighted version of Max $k$ -Coverage where each element $e\in\mathcal{U}$ has weight $w(e)$ and we want to maximize the total weight of the covered elements. Note that weighted instances can be easily converted to unweighted instances by duplicating elements according to their weights. In our reduction, the ratio between the maximum and the minimum weight will be bounded by the number of elements. The proof appears in Section D.
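To make the weighted objective and the duplication trick concrete, here is a small Python sketch. It assumes integer weights, and the names `coverage_weight`, `max_k_coverage_brute_force`, and `unweight` are hypothetical helpers introduced only for this example; the exhaustive search is for illustration on tiny instances.

```python
from itertools import combinations

def coverage_weight(chosen, weight):
    """Total weight of the elements covered by the chosen sets."""
    covered = set().union(*chosen)
    return sum(weight[e] for e in covered)

def max_k_coverage_brute_force(sets, weight, k):
    """Optimal weighted coverage over all subfamilies of exactly k sets."""
    return max(coverage_weight(c, weight) for c in combinations(sets, k))

def unweight(sets, weight):
    """Replace each element e of integer weight w(e) by w(e) unit-weight copies."""
    new_sets = [frozenset((e, c) for e in s for c in range(weight[e])) for s in sets]
    new_weight = {(e, c): 1 for e in weight for c in range(weight[e])}
    return new_sets, new_weight

sets = [frozenset("ab"), frozenset("bc"), frozenset("d")]
weight = {"a": 1, "b": 2, "c": 3, "d": 5}
weighted_opt = max_k_coverage_brute_force(sets, weight, 2)
unweighted_opt = max_k_coverage_brute_force(*unweight(sets, weight), 2)
```

On this instance both optima agree, since every copy of an element is covered exactly when the original element is.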

Lemma 3.5 (Reduction #2).

There exist functions $a:{\mathbb{R}}^{+}\to\mathbb{N}$ and $f:{\mathbb{R}}^{+}\to{\mathbb{R}}^{+}$ such that for any $\varepsilon>0$ , there exists a polynomial-time reduction that takes a Label Cover instance $\mathcal{L}=((U\cup V,E),\Sigma_{U},\Sigma_{V},\{\pi_{e}\}_{e\in E})$ that is $U$ -regular and has the maximum $V$ -degree $d_{V}$ , and produces a Max $k$ -Coverage instance $\mathcal{I}=(\mathcal{U},\mathcal{S},k)$ such that

•

(Completeness) $\mathsf{OPT}(\mathcal{L})=1\implies\mathsf{OPT}(\mathcal{I})=w(\mathcal{U})$ .

•

(Soundness) $\mathsf{OPT}(\mathcal{L})<f(\varepsilon)\implies\mathsf{OPT}(\mathcal{I})\leq(1-1/e+\varepsilon)\cdot w(\mathcal{U})$ .

The reduction satisfies $|\mathcal{U}|\leq|V|\cdot d_{V}^{a(\varepsilon)}\cdot a(\varepsilon)^{|\Sigma_{V}|}$ , $|\mathcal{S}|=a(\varepsilon)\cdot|U|\cdot|\Sigma_{U}|$ , and $k=a(\varepsilon)\cdot|U|$ .

We can now finish the proof of Theorem 3.1 based on Lemma 3.3 and Lemma 3.5.

Proof 3.6 (Proof of Theorem 3.1).

Fix $\varepsilon>0$ , which determines $a(\varepsilon)$ and $f(\varepsilon)$ in Lemma 3.5. Choose $r\in\mathbb{N}$ in Lemma 3.3 so that the soundness satisfies $2^{-\Omega(r)}\leq f(\varepsilon)$ .

With $\ell$ still being a free parameter, Lemma 3.3 shows a reduction from an initial 3-SAT instance $\phi$ with $n$ variables and $m=O(n)$ clauses to a Label Cover instance with $|U|=\ell^{r},|V|=n^{r},|\Sigma_{U}|=2^{O(mr/\ell)},|\Sigma_{V}|=2^{O(r)}$ , and $d_{V}\leq m^{r}$ . Lemma 3.5 with this Label Cover instance produces a Max $k$ -Coverage instance with

[TABLE]

A $(1-1/e+\varepsilon)$ -approximation algorithm for Max $k$ -Coverage that runs in time $|\mathcal{S}|^{k^{1/2r}}$ will distinguish whether $\mathsf{OPT}(\phi)=1$ or $\mathsf{OPT}(\phi)<s^{\prime}$ for some $s^{\prime}$ in time

[TABLE]

which will contradict the Gap-ETH for large enough $\ell$ . Observe that $|\mathcal{S}|\gg|\mathcal{U}|$ ; if we set $g(\varepsilon):=1/2r$ , we get the same implication from an algorithm that runs in time $|\mathcal{U}|^{k^{g(\varepsilon)}}$ , which proves the theorem.

Appendix A Omitted Proofs

See 2

Proof A.1.

Since the distances between points in the original metric space do not change, we only need to check triangle inequalities involving fictitious centers. We prove by induction on $i=0,1,\dots,k$ that the distances on $V\cup\{f^{\prime}_{1},\dots,f^{\prime}_{i}\}$ form a metric. The base case $i=0$ holds since $V$ is metric.

For general $i$ , let $f^{\prime}_{i}\in F^{\prime}$ be a fictitious center and $u,v\in V\cup\{f^{\prime}_{1},\dots,f^{\prime}_{i-1}\}$ be arbitrary points. We consider the following two cases.

•

For $d(f^{\prime}_{i},v)\leq d(f^{\prime}_{i},u)+d(u,v)$ ,

[TABLE]

where the inequality follows from the triangle inequality between $f,u,v$ .

•

For $d(u,v)\leq d(u,f^{\prime}_{i})+d(f^{\prime}_{i},v)$ , first note that

[TABLE]

Let $g$ and $h$ be the facilities achieving the minimum in the first and the second minimization respectively. Since $g$ and $h$ are both in $F_{i}$ , $d(g,h)\leq 2R_{i}$ . Therefore,

[TABLE]

Therefore, all triangle inequalities are satisfied and the new distance on $F^{\prime}\cup V$ is a metric.

See 3

Proof A.2.

We have $\small{\mathsf{improv}}(\varnothing)=0$ by definition. To show that $\small{\mathsf{improv}}(S)$ is monotone, consider two subsets $S\subseteq T\subseteq F$ :

[TABLE]

as desired. Finally, to prove that $\small{\mathsf{improv}}(S)$ is submodular, consider subsets $S\subseteq T\subseteq F$ and center $f\in F$ . For each client $j\in C^{\prime}$ , using the identity $x-\min(x,y)=\max(0,x-y)$ for all real numbers $x$ and $y$ , we get

[TABLE]

Therefore,

[TABLE]

proving the desired submodularity.
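The claim can be sanity-checked by brute force on a toy instance. The Python sketch below assumes that $\small{\mathsf{improv}}(S)=\small{\mathsf{cost}}(C^{\prime},F^{\prime})-\small{\mathsf{cost}}(C^{\prime},F^{\prime}\cup S)$ with weighted cost $\sum_{j}w_{j}d(j,\cdot)$ (the definition is given outside this appendix); the distance table and all function names are made up for the example.

```python
from itertools import combinations

def cost(clients, w, d, centers):
    """Weighted connection cost: each client pays its distance to the nearest center."""
    return sum(w[j] * min(d[j][f] for f in centers) for j in clients)

def improv(S, clients, w, d, fictitious):
    """Assumed definition: cost saved by adding S to the fictitious centers."""
    return cost(clients, w, d, fictitious) - cost(clients, w, d, set(fictitious) | set(S))

def is_monotone_submodular(F, f, tol=1e-9):
    """Exhaustively check monotonicity and diminishing returns over all S <= T <= F."""
    subsets = [frozenset(c) for r in range(len(F) + 1) for c in combinations(F, r)]
    for S in subsets:
        for T in subsets:
            if not S <= T:
                continue
            if f(S) > f(T) + tol:  # monotone: f(S) <= f(T)
                return False
            for x in F:
                if x in T:
                    continue
                # submodular: the marginal gain of x shrinks as the set grows
                if f(S | {x}) - f(S) < f(T | {x}) - f(T) - tol:
                    return False
    return True

clients, w = [0, 1, 2], {0: 1, 1: 2, 2: 1}
d = {0: {"p": 5, "a": 1, "b": 4, "c": 6},
     1: {"p": 3, "a": 7, "b": 2, "c": 3},
     2: {"p": 8, "a": 8, "b": 9, "c": 1}}
f = lambda S: improv(S, clients, w, d, ["p"])
ok = is_monotone_submodular(["a", "b", "c"], f)
```

The check is exponential in $|F|$ and is meant only as a test harness, not as part of the algorithm.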

Appendix B Miscellaneous Proofs

B.1 Polynomial Aspect Ratio

Recall that the aspect ratio of a metric space $(V,d)$ is $\Delta:=\frac{\max_{x,y\in V}d(x,y)}{\min_{x\neq y\in V}d(x,y)}$ . For the unweighted version of the problems we consider, we can assume that $\Delta$ is polynomially-bounded, due to the following standard result.

Proposition B.1 (folklore).

Given an $\alpha$ -approximation algorithm $A$ for (unweighted) $k$ -Median on instances with polynomially-bounded aspect ratio that runs in time $T$ , we can obtain an $(\alpha+o(1))$ -approximation algorithm $B$ for (unweighted) $k$ -Median on all instances running in time $T+\operatorname{poly}(n)$ .

Proof B.2.

Given an instance $I$ with large aspect ratio, we first compute an estimate $M$ for the optimal $k$ -Median cost on $I$ (say $M/(2n)\leq OPT(I)\leq M$ ) by using an approximation algorithm for the $k$ -Center problem that runs in $\operatorname{poly}(n)$ time for general instances. View the metric space $(V,d)$ as a complete edge-weighted graph. For long edges of length more than $2\alpha M$ , reduce their length to $2\alpha M$ , and for short edges of length less than $M/n^{3}$ , increase their lengths to $M/n^{3}$ . Computing all-pairs shortest paths gives a new metric space $(V,d^{\prime})$ , and let $I^{\prime}$ be the corresponding $k$ -Median instance. Use algorithm $A$ on this instance $I^{\prime}$ to get an $\alpha$ -approximate solution $F\subseteq V$ .

We claim $F$ is also an $(\alpha+o(1))$ -approximate solution for the original instance $I$ . Firstly, if $F^{\star}$ is an optimal solution to $I$ , then its cost in $I^{\prime}$ is greater by at most $n(M/n^{3})$ . Indeed, since $OPT(I)\leq M$ , no client would use the long distances which we shortened; the increase in the short distances gives the $n(M/n^{3})$ term. Hence $OPT(I^{\prime})\leq OPT(I)+n(M/n^{3})=OPT(I)(1+o(1))$ . Again since $F$ is an $\alpha$ -approximation for $I^{\prime}$ , and the long edges had reduced length $2\alpha M$ , none of the clients in $I^{\prime}$ will connect to $F$ using the shortened long distances. Hence

[TABLE]

This completes the proof.
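The metric modification in this proof (clamp edge lengths, then take shortest paths) can be sketched in Python as follows. The function names and the dense-matrix representation are assumptions for this illustration; in the proof one would call it with `lo = M / n**3` and `hi = 2 * alpha * M`, and the aspect ratio is taken as the largest distance divided by the smallest distance between distinct points.

```python
def clamp_and_close(d, lo, hi):
    """Clamp off-diagonal distances into [lo, hi], then run Floyd-Warshall.

    d: symmetric n x n matrix (list of lists) with zero diagonal.
    Returns the shortest-path metric d' of the clamped complete graph.
    """
    n = len(d)
    g = [[0.0 if i == j else min(max(d[i][j], lo), hi) for j in range(n)]
         for i in range(n)]
    for m in range(n):  # allow paths through intermediate vertex m
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    return g

def aspect_ratio(d):
    """Largest pairwise distance divided by the smallest one over distinct points."""
    n = len(d)
    vals = [d[i][j] for i in range(n) for j in range(i + 1, n)]
    return max(vals) / min(vals)

pts = [0, 0.001, 1, 1000]
d = [[abs(x - y) for y in pts] for x in pts]
g = clamp_and_close(d, lo=0.1, hi=10)
```

After the closure every distance lies in `[lo, hi]`, so the aspect ratio of the new metric is at most `hi / lo`.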

B.2 Bipartite vs. Non-Bipartite Instances

The $k$ -Median/ $k$ -Means problems we defined have two different sets: clients $C$ and potential facilities ${\mathbb{F}}$ . If $C$ and ${\mathbb{F}}$ are allowed to be different subsets of $V$ , we call it the bipartite version of the problem. If $C\subseteq{\mathbb{F}}$ , i.e., we can open facilities at any of the client locations (and potentially at other locations too), it is the non-bipartite case. We observe that only a $(1+1/e)$ -factor hardness is known for the non-bipartite case, whereas our algorithm still gives a factor- $(1+2/e)$ approximation for this case.

In fact, for the non-bipartite case, a simple $2(1+\varepsilon)^{2}$ -approximation can be obtained directly using core-sets, using a variant of the arguments of Czumaj and Sohler [11] as follows. Given a non-bipartite instance $I$ , the algorithm does the following.

1. Find a core-set $(C^{\prime},w^{\prime})$ with $C^{\prime}\subseteq C$ and $|C^{\prime}|=O(\operatorname{poly}(\varepsilon^{-1}k\log n))$ .

2. Enumerate over all $k$ -subsets $F\subseteq C^{\prime}$ , and output the set $F_{alg}$ with smallest cost $\sum_{j\in C^{\prime}}w_{j}d(j,F)$ .

The runtime of this algorithm is easily seen to be in FPT, so we now show the approximation guarantee. Let $F^{\star}\subseteq{\mathbb{F}}$ be the optimal solution for instance $I$ with cost $\small{\mathsf{cost}}(C,F^{\star})=OPT(I)$ . By the strong core-set property, $\sum_{j\in C^{\prime}}w_{j}d(j,F^{\star})\leq(1+\varepsilon)OPT(I)$ . Now, for each facility $f^{\star}\in F^{\star}$ , let $\eta(f^{\star})$ be a client closest to $f^{\star}$ among the clients $\{j\in C^{\prime}\mid d(j,f^{\star})=\min_{f\in F^{\star}}d(j,f)\}$ served by $f^{\star}$ . Observe that $F^{\prime}:=\{\eta(f^{\star})\mid f^{\star}\in F^{\star}\}$ satisfies $F^{\prime}\subseteq C^{\prime}\subseteq C\subseteq{\mathbb{F}}$ , has size $|F^{\prime}|\leq k$ , and ensures that

[TABLE]

The factor of $2$ comes from the fact that $d(j,\eta(f^{\star}))\leq d(j,f^{\star})+d(f^{\star},\eta(f^{\star}))\leq 2d(j,f^{\star})$ . Now, since we enumerate over all subsets of $C^{\prime}$ , the cost of the set $F_{alg}$ is no greater than the LHS of (B.5). Again using the core-set property,

[TABLE]

This completes the proof of the $2(1+\varepsilon)^{2}$ approximation.
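The enumeration step of this simple algorithm can be sketched in Python as below. The core-set is taken as given (its construction is not shown), and the names `weighted_cost` and `best_k_subset` are hypothetical.

```python
from itertools import combinations

def weighted_cost(coreset, w, d, F):
    """Core-set cost of opening the centers in F: sum_j w_j * d(j, F)."""
    return sum(w[j] * min(d(j, f) for f in F) for j in coreset)

def best_k_subset(coreset, w, d, k):
    """Try every k-subset of the core-set as a center set; return a cheapest one."""
    return min(combinations(coreset, k),
               key=lambda F: weighted_cost(coreset, w, d, F))

# Toy core-set on the real line with unit weights.
coreset = [0, 1, 10, 11]
w = {j: 1 for j in coreset}
dist = lambda x, y: abs(x - y)
F_alg = best_k_subset(coreset, w, dist, k=2)
```

The search inspects $\binom{|C^{\prime}|}{k}$ candidate sets, which is where the dependence on the core-set size enters the running time.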

Observe that this algorithm crucially uses that $C\subseteq{\mathbb{F}}$ , so we can open a facility at the closest client location $\eta(f^{\star})$ . Hence this idea does not extend to the bipartite case where $\eta(f^{\star})$ may not belong to ${\mathbb{F}}$ .

Appendix C Extensions to Related Problems

C.1 The Algorithm for $k$ -Means

The extension to $k$ -Means is immediate. The first change is in the definition of cost: the $k$ -Means cost is $\small{\mathsf{cost}}(C,F):=\sum_{j}w_{j}\,d(j,F)^{2}$ . However, the induced function is still monotone submodular. Now, by the calculations identical to Claim 5, $d(j,F^{\prime})\leq(3+2\varepsilon)d(j,F^{\star})$ for client $j$ ; hence

[TABLE]

Plugging this into (2.4), we immediately get

[TABLE]

The runtime is $O(\varepsilon^{-4}k\log k)^{k}\operatorname{poly}(n)$ , which is the same barring a worse dependence on $\varepsilon$ because of the larger core-set. This proves the result for $k$ -Means.
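As a quick numerical check of how the two constants relate, the Python sketch below evaluates the bound $\frac{1}{e}(3+2\varepsilon)^{p}+(1-\frac{1}{e})$ for $p=1$ ( $k$ -Median) and $p=2$ ( $k$ -Means). That the final guarantee takes exactly this form is an assumption based on combining the $(1-1/e)$ guarantee with the per-client bound $d(j,F^{\prime})\leq(3+2\varepsilon)d(j,F^{\star})$ ; the function name `fpt_ratio` is hypothetical.

```python
import math

def fpt_ratio(eps, p):
    """Evaluate (1/e) * (3 + 2*eps)**p + (1 - 1/e).

    p = 1 corresponds to k-Median distances, p = 2 to k-Means squared distances.
    """
    return (3 + 2 * eps) ** p / math.e + (1 - 1 / math.e)

median_limit = fpt_ratio(0.0, 1)  # tends to 1 + 2/e as eps -> 0
means_limit = fpt_ratio(0.0, 2)   # tends to 1 + 8/e as eps -> 0
```

The jump from $2/e$ to $8/e$ comes from squaring the factor $3$ : $(9-1)/e=8/e$ .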

C.2 The Algorithm for Matroid Median

We follow the algorithm for $k$ -Median, but now we place two matroid constraints: in addition to the partition matroid constraint we add in the matroid constraint coming from the Matroid Median problem itself. Maximizing a monotone submodular function subject to two matroid constraints has a $\frac{1}{2+\varepsilon}$ -approximation [21]. Hence, instead of (2.4), we get

[TABLE]

If the rank of the matroid is $k$ , then any valid base of the matroid is also a $k$ -subset; hence a core-set for $k$ -Median is also a core-set for Matroid Median. This means the rest of the argument remains unchanged.

C.3 Facility Location

In this subsection, we prove Theorem 1.4 for Facility Location. Given an instance $((V,d),C,{\mathbb{F}},\small{\mathsf{open}})$ for Facility Location, let $k$ be the number of facilities opened in the optimal solution. Our parameter will be this value $k$ . Let $\small{\mathsf{cost}}^{\star}$ and $\small{\mathsf{open}}^{\star}$ be the total connection and opening cost of the optimal solution respectively. For sake of simplicity, we assume that $\small{\mathsf{open}}(f)$ is the same for every $f$ , but our idea can be easily generalized when facilities have nonuniform opening costs (by guessing the opening costs of the optimal facilities to within a $(1+\varepsilon)$ -factor). This implies that $\small{\mathsf{open}}^{\star}=k\,\small{\mathsf{open}}(f)$ .

The general structure of the algorithm resembles the algorithm for $k$ -Median. We first construct a core-set that preserves the connection cost of every $F\subseteq{\mathbb{F}}$ with $|F|\leq 2k$ , so that we can assume $|C|=O(\varepsilon^{-2}k\log n)$ . The algorithm guesses (a) the leaders $\{\ell_{1},\dots,\ell_{k}\}$ , (b) the distances $R_{1},\dots,R_{k}$ from the leaders to their facilities as in Algorithm 2.1, and then (c) for each $\ell_{i}$ , computes the set of its possible facilities $F_{i}$ . These sets $F_{i}$ give us a partition matroid on the potential facility locations.

Lemma C.1.

Consider a monotone submodular function $f$ , subject to a partition matroid constraint (with rank $k$ ). There exists a polynomial-time algorithm that, given $\gamma\geq 1$ , returns a set $S$ with full rank and size $|S|\leq\gamma k$ , such that for any $F^{\star}$ with $|F^{\star}|\leq k$ , we have

[TABLE]

Proof C.2.

We first use the algorithm from [4] to find a set $S_{1}$ of size $k$ such that $S_{1}$ is a base of the matroid, and

[TABLE]

Let $f_{S_{1}}$ be the residual function defined as $f_{S_{1}}(S):=f(S\cup S_{1})-f(S_{1})$ . Since $f$ is monotone, we get

[TABLE]

Now we choose a set $S_{2}$ by picking $(\gamma-1)k$ more elements that greedily maximize the residual function. The analysis of the greedy algorithm implies that

[TABLE]

so that the total cost is at least

[TABLE]

which completes the proof.

We use the algorithm from Lemma C.1 to pick a set $S^{\star}$ of size $\gamma k$ , instead of size $k$ as in Algorithm 2.1. The opening cost of this solution is $\gamma\small{\mathsf{open}}^{\star}$ , since each facility costs $\small{\mathsf{open}}^{\star}/k$ . Moreover, arguing as in Lemma 2.7 (but using Lemma C.1 instead of the $(1-1/e)$ -approximation guaranteed by algorithm from [4]), the connection cost is $(1+2/e^{\gamma})\;\small{\mathsf{cost}}^{\star}$ . Several cases arise:

•

If $\small{\mathsf{open}}^{\star}\geq 2e^{-1}\small{\mathsf{cost}}^{\star}$ : Trying $\gamma=1$ gives an approximation ratio at most $\frac{1+4/e}{1+2/e}\leq 1.424$ .

•

If $\varepsilon^{2}\small{\mathsf{cost}}^{\star}\leq\small{\mathsf{open}}^{\star}<2e^{-1}\small{\mathsf{cost}}^{\star}$ : Trying $\gamma=\ln(2/(\small{\mathsf{open}}^{\star}/\small{\mathsf{cost}}^{\star}))$ gives an approximation ratio $\frac{1+2\gamma/e^{\gamma}+2/e^{\gamma}}{1+2/e^{\gamma}}$ . Recalling ${\alpha_{\mathsf{FL}}}:=\max_{x\geq 0}\big{(}1+\frac{x}{1+x}\ln\frac{2}{x}\big{)}$ , by setting $x=2/e^{\gamma}$ , we can see that this ratio is at most ${\alpha_{\mathsf{FL}}}\approx 1.463$ .

•

If $\varepsilon^{2}\small{\mathsf{cost}}^{\star}>\small{\mathsf{open}}^{\star}$ : Trying $\gamma=1/\varepsilon$ gives the total cost $(1/\varepsilon)\small{\mathsf{open}}^{\star}+(1+2/e^{1/\varepsilon})\small{\mathsf{cost}}^{\star}\leq(1+3\varepsilon)\small{\mathsf{cost}}^{\star}$ .

Trying every value of $\gamma\in[1,1/\varepsilon]$ that makes $\gamma k$ an integer will achieve an approximation ratio of ${\alpha_{\mathsf{FL}}}\approx 1.463$ .
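The constants in the case analysis can be checked numerically. The Python sketch below evaluates ${\alpha_{\mathsf{FL}}}$ by locating the maximizer of $1+\frac{x}{1+x}\ln\frac{2}{x}$ ; setting the derivative to zero gives the condition $\ln(2/x)=1+x$ , which is solved by bisection. The function names are hypothetical, and this is a numerical illustration rather than part of the algorithm.

```python
import math

def fl_ratio(x):
    """The function 1 + x/(1+x) * ln(2/x) whose maximum defines alpha_FL."""
    return 1 + x / (1 + x) * math.log(2 / x)

def alpha_fl(tol=1e-12):
    """Return (x*, alpha_FL), where x* solves ln(2/x) = 1 + x.

    h(x) = ln(2/x) - 1 - x is strictly decreasing on (0, 2), positive near 0
    and negative at 2, so bisection finds its unique root.
    """
    lo, hi = 1e-9, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.log(2 / mid) - 1 - mid > 0:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2
    return x, fl_ratio(x)

x_star, alpha = alpha_fl()
first_case = (1 + 4 / math.e) / (1 + 2 / math.e)  # ratio bound when gamma = 1
```

At the maximizer the ratio simplifies to $1+x^{\star}$ , since $\ln(2/x^{\star})=1+x^{\star}$ cancels the denominator.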

Appendix D Reduction from Label Cover to Max $k$ -Coverage

In this section, we give a reduction from Label Cover to Max $k$ -Coverage, proving Lemma 3.5.

Lemma D.1 (Restatement of Lemma 3.5).

There exist functions $a:{\mathbb{R}}^{+}\to\mathbb{N}$ and $f:{\mathbb{R}}^{+}\to{\mathbb{R}}^{+}$ such that for any $\varepsilon>0$ , there exists a polynomial-time reduction that takes a Label Cover instance $\mathcal{L}=((U\cup V,E),\Sigma_{U},\Sigma_{V},\{\pi_{e}\}_{e\in E})$ that is $U$ -regular and has the maximum $V$ -degree $d_{V}$ , and produces a Max $k$ -Coverage instance $\mathcal{I}=(\mathcal{U},\mathcal{S},k)$ such that

•

(Completeness) $\mathsf{OPT}(\mathcal{L})=1\implies\mathsf{OPT}(\mathcal{I})=w(\mathcal{U})$ .

•

(Soundness) $\mathsf{OPT}(\mathcal{L})<f(\varepsilon)\implies\mathsf{OPT}(\mathcal{I})\leq(1-1/e+\varepsilon)\cdot w(\mathcal{U})$ .

The reduction satisfies $|\mathcal{U}|\leq|V|\cdot d_{V}^{a(\varepsilon)}\cdot a(\varepsilon)^{|\Sigma_{V}|}$ , $|\mathcal{S}|=a(\varepsilon)\cdot|U|\cdot|\Sigma_{U}|$ , and $k=a(\varepsilon)\cdot|U|$ .

Proof D.2.

The high-level idea of the proof is the following: we choose some value $a=a(\varepsilon)$ . Now the set of elements consists of many disjoint hypergrids with sides of size $a$ and with $|\Sigma_{V}|$ dimensions. Indeed, there is a copy of the hypergrid $[a]^{|\Sigma_{V}|}$ associated with each $(v,i)$ pair for $v\in V,i\in[d_{v}]^{a}$ —one for each right vertex and an $a$ -subset of its left neighbors.

Now the sets: they are associated with each $u\in U$ and $j\in[a]$ and each potential label $\ell\in\Sigma_{U}$ . The sets associated with a pair $(j,u)$ have a nonempty intersection with the hypergrid of $(v,i)$ if and only if $u$ is the $i_{j}$ ’th neighbor of $v$ . Indeed, the set for $(j,u,\ell)$ contains the entire $j^{th}$ “slice” of each of these hypergrids, along the $\ell^{th}$ dimension. The idea is very clean: if there is a “good” labeling for $\mathcal{L}$ , then all these slices will be chosen in a coordinated way along the same dimension, and we will cover all the hypergrids completely. If there are no good labelings for $\mathcal{L}$ , then these slices will be chosen in an uncoordinated way along different dimensions, and then we will end up covering only a constant factor of the hypergrids. (As intuition, if $a=2$ and we did not manage to pick two slices of the hypercube along the same dimension, we cover only $\nicefrac{{3}}{{4}}$ of the cube: the hypergrids allow us to get $1-1/e$ .)

For those familiar with the exposition from [13], we are considering an $a$ -prover system where the verifier first randomly chooses a variable question $v\in V$ and each of $a$ provers gets a clause question independently sampled from $v$ ’s neighbors.

Formal construction. For each $v\in V$ , fix an arbitrary ordering of its incident edges so that the $d_{v}$ edges incident on $v$ are represented as $e_{v,1}=(\alpha_{v,1},v),\dots,e_{v,d_{v}}=(\alpha_{v,d_{v}},v)$ . Let $a=a(\varepsilon)$ be an integer that will be fixed later, and consider the hypergrid $[a]^{|\Sigma_{V}|}$ . Let $C_{j,\ell}:=\{(x_{1},\dots,x_{|\Sigma_{V}|}):x_{\ell}=j\}$ be the $j^{th}$ slice in the $\ell^{th}$ coordinate. We can describe our set system as follows.

[TABLE]

Completeness. Suppose the labeling $\sigma:(U\cup V)\to(\Sigma_{U}\cup\Sigma_{V})$ satisfies every edge of $\mathcal{L}$ . Then the $k=a|U|$ subsets

[TABLE]

cover every element in $\mathcal{U}$ ; indeed, the element $(v,i,x)$ is covered by the set $S(x_{\sigma(v)},\alpha_{v,i_{x_{\sigma(v)}}},\sigma(\alpha_{v,i_{x_{\sigma(v)}}}))$ . This proves the first claim of the lemma, namely that we have perfect completeness.

Soundness. For the sake of contradiction, assume there exists a subfamily $\mathcal{S}^{\prime}\subseteq\mathcal{S}$ such that $|\mathcal{S}^{\prime}|=k=a|U|$ and $\mathcal{S}^{\prime}$ covers elements of total weight at least $(1-1/e+\varepsilon)\cdot w(\mathcal{U})$ . Recall that each hypergrid is indexed by a pair $(v,i)$ . To simplify notation, we identify a pair $(v,i)$ with its hypergrid. We also define $(v,i)$ ’s weight $w(v,i):=1/((d_{v})^{a-1}|E|)$ , which is the weight of each of the elements in that hypergrid. The sum of all hypergrids’ weights is $\sum_{v\in V}(d_{v})^{a}w(v,i)=\sum_{v}d_{v}/|E|=1$ , and we let $\mathcal{D}$ be the distribution of $(v,i)$ ’s according to their weights. For the rest of this section, an “average hypergrid” refers to a random $(v,i)$ sampled from $\mathcal{D}$ , possibly conditioned on $(v,i)\in X$ for some subset $X$ .

Recall that each set $S\in\mathcal{S}$ intersects with one hypergrid in exactly one slice $C_{j,\ell}$ or is disjoint from it. Since the Label Cover instance $\mathcal{L}$ is $U$ -regular, the sum of the weights of the hypergrids that intersect $S(j,u,\ell)\in\mathcal{S}^{\prime}$ is

[TABLE]

which means that each set intersects with the same weighted number of hypergrids. Let $t_{v,i}$ be the number of sets in $\mathcal{S}^{\prime}$ that intersect with the hypergrid $(v,i)$ . By double counting,

[TABLE]

So each hypergrid intersects with $a$ sets from $\mathcal{S}^{\prime}$ on average.

Call a hypergrid $(v,i)$ big when $t_{v,i}>3a/\varepsilon$ , and call $(v,i)$ good if it is not big and there exist $j<j^{\prime}\in[a]$ and $\ell_{j},\ell_{j^{\prime}}\in\Sigma_{U}$ such that $\pi_{e_{v,i_{j}}}(\ell_{j})=\pi_{e_{v,i_{j^{\prime}}}}(\ell_{j^{\prime}})$ and both $S(j,\alpha_{v,i_{j}},\ell_{j})$ and $S(j^{\prime},\alpha_{v,i_{j^{\prime}}},\ell_{j^{\prime}})$ are in $\mathcal{S}^{\prime}$ . In other words, hypergrid $(v,i)$ is intersected in at least two different slices in the same coordinate. Call the remaining $(v,i)$ ’s pseudorandom.

Since the average of $t_{v,i}$ is $a$ , Markov's inequality shows that the total weight of big $(v,i)$ ’s is at most $\varepsilon/3$ . Hence elements of total weight at least $(1-1/e+2\varepsilon/3)\cdot w(\mathcal{U})$ must be covered in the good or pseudorandom hypergrids. The average value of $t_{v,i}$ for good and pseudorandom $(v,i)$ hypergrids is still at most $a$ .

We claim that the total weight of good $(v,i)$ ’s is at least $\varepsilon/3$ . Suppose not. Then elements of total weight at least $(1-1/e+\varepsilon/3)\cdot w(\mathcal{U})$ are covered in the pseudorandom hypergrids. Note that the average value of $t_{v,i}$ for the pseudorandom pairs is at most $(1+\varepsilon/3)a$ . For each of those hypergrids, since it is not good, the fraction of points covered by $a^{\prime}$ slices is exactly $(1-(1-1/a)^{a^{\prime}})$ , which is monotone and concave in $a^{\prime}$ . Therefore, the fraction of points covered in the pseudorandom cubes is at most $(1-(1-1/a)^{(1+\varepsilon/3)a})$ . Fix $a=a(\varepsilon)$ large enough so that this quantity becomes less than $(1-1/e+\varepsilon/3)$ , leading to the desired contradiction. Therefore, the total weight of good $(v,i)$ ’s is at least $\varepsilon/3$ .
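The coverage fraction used in this argument can be verified by brute force on small hypergrids. The Python sketch below counts the points of $[a]^{D}$ covered by a list of slices and compares it with $1-(1-1/a)^{a^{\prime}}$ for $a^{\prime}$ slices in pairwise distinct coordinates. The function names and the 0-indexed encoding of a slice $C_{j,\ell}$ as a pair `(j, l)` are assumptions for this illustration.

```python
from itertools import product

def covered_fraction(a, dims, slices):
    """Fraction of points x in [a]^dims lying in some slice {x : x[l] == j}.

    slices: list of (j, l) pairs, both 0-indexed.
    """
    total = covered = 0
    for x in product(range(a), repeat=dims):
        total += 1
        if any(x[l] == j for (j, l) in slices):
            covered += 1
    return covered / total

def uncoordinated_fraction(a, num_slices):
    """Closed form 1 - (1 - 1/a)^a' for slices in pairwise distinct coordinates."""
    return 1 - (1 - 1 / a) ** num_slices

# a = 2: two slices along different dimensions cover 3/4 of the square,
# while two different slices along the same dimension cover everything.
mixed = covered_fraction(2, 2, [(0, 0), (1, 1)])
aligned = covered_fraction(2, 2, [(0, 0), (1, 0)])
```

For $a^{\prime}=a$ the closed form equals $1-(1-1/a)^{a}$ , which decreases towards $1-1/e$ as $a$ grows; this is the source of the $1-1/e$ threshold in the soundness analysis.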

For $j\in[a]$ and $u\in U$, let $\Sigma(j,u):=\{\ell\in\Sigma_{U}\mid S(j,u,\ell)\in\mathcal{S}^{\prime}\}$ be the labels that correspond to $(j,u)$. We now construct a random labeling $\sigma$ for $\mathcal{L}$ as follows.

• Randomly sample $j<j^{\prime}\in[a]$ uniformly from among the $\binom{a}{2}$ unordered pairs.

• For $u\in U$, let $\sigma(u)$ be a random label from $\Sigma(j,u)$ chosen uniformly and independently (choose an arbitrary label if $\Sigma(j,u)=\varnothing$).

• For $v\in V$, uniformly sample $i\in[d_{v}]^{a}$, and let $u:=\alpha_{v,i_{j^{\prime}}}$. Let $\ell$ be a random label from $\Sigma(j^{\prime},u)$ chosen uniformly and independently, and let $\sigma(v)=\pi_{u,v}(\ell)$ (choose an arbitrary label if $\Sigma(j^{\prime},u)=\varnothing$).
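The three sampling steps above can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name, the dictionary-based encoding of $\Sigma(j,u)$, the neighbor lists and the projections, and the `default_u` / `default_v` fallbacks standing in for "an arbitrary label" are all hypothetical. Coordinates are 0-indexed, and the names used for vertices of $U$ and $V$ are assumed to be distinct.

```python
import random


def sample_labeling(a, U, V, neighbors, Sigma, pi, default_u, default_v, rng):
    """Draw one random labeling sigma following the three sampling steps.

    neighbors[v] : list of the neighbors alpha_{v,1}, ..., alpha_{v,d_v} of v in U
    Sigma[(j,u)] : list of labels l such that S(j, u, l) is in S'
    pi[(u, v)]   : dict mapping a label of u to its projection on v
    """
    # Step 1: a uniformly random unordered pair j < j'.
    j, j_prime = sorted(rng.sample(range(a), 2))

    sigma = {}
    # Step 2: label each u from Sigma(j, u), or arbitrarily if that set is empty.
    for u in U:
        choices = Sigma.get((j, u), [])
        sigma[u] = rng.choice(choices) if choices else default_u

    # Step 3: for each v, sample i in [d_v]^a; only coordinate j' is used.
    for v in V:
        i = [rng.randrange(len(neighbors[v])) for _ in range(a)]
        u = neighbors[v][i[j_prime]]
        choices = Sigma.get((j_prime, u), [])
        sigma[v] = pi[(u, v)][rng.choice(choices)] if choices else default_v
    return sigma


# Tiny hypothetical instance: a = 2, so (j, j') = (0, 1) is the only pair.
U = ["u1", "u2"]
V = ["v"]
neighbors = {"v": ["u1", "u2"]}
Sigma = {(0, "u1"): ["x"], (0, "u2"): ["x"], (1, "u1"): ["y"], (1, "u2"): ["y"]}
pi = {("u1", "v"): {"x": "p", "y": "q"}, ("u2", "v"): {"x": "p", "y": "q"}}
sigma = sample_labeling(2, U, V, neighbors, Sigma, pi, "x", "p", random.Random(0))
```

In this toy instance every label set is a singleton, so the outcome does not depend on the random seed: both vertices of $U$ receive the label `x`, and `v` receives the projection of `y`.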

Fix a good pair $(v,i)$. Given that $i$ is sampled in the above randomized strategy, with probability at least $1/\binom{a}{2}$ the pair $j<j^{\prime}$ is sampled such that $\pi_{e_{v,i_{j}}}(\Sigma(j,\alpha_{v,i_{j}}))\cap\pi_{e_{v,i_{j^{\prime}}}}(\Sigma(j^{\prime},\alpha_{v,i_{j^{\prime}}}))\neq\varnothing$. Since $(v,i)$ is not big, each of these two label sets has at most $3a/\varepsilon$ elements, so $\pi_{e_{v,i_{j}}}(\sigma(\alpha_{v,i_{j}}))=\sigma(v)$ with probability at least $(\binom{a}{2}\cdot(3a/\varepsilon)^{2})^{-1}\geq\varepsilon^{2}/5a^{3}$.

Fix a vertex $v\in V$ and let $q_{v}$ be the fraction of $i\in[d_{v}]^{a}$ such that $(v,i)$ is good. The expected fraction of the edges incident on $v$ that are satisfied by the above randomized labeling $\sigma$ is

[TABLE]

where the first equality follows from the fact that for fixed $j,j^{\prime}$, over the randomness of $i$, $\alpha_{v,i_{j}}$ and $\alpha_{v,i_{j^{\prime}}}$ are sampled uniformly and independently over the neighbors of $v$, so that $u$ in the first line can be replaced by $\alpha_{v,i_{j}}$ in the second line.

Let $\mathcal{D}_{V}$ be the distribution over $v\in V$ obtained as the marginal distribution of $v$ in $\mathcal{D}$. This implies that in $\mathcal{D}_{V}$, $v$ is sampled with probability $d_{v}/|E|$, and

[TABLE]
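To make the averaging over $\mathcal{D}_{V}$ concrete, the sketch below computes the marginal $d_{v}/|E|$ from an edge list and uses it to average per-vertex satisfied fractions. It assumes that $\mathcal{D}$ is the uniform distribution over the edges of a simple bipartite graph given as $(u,v)$ pairs; the function names and the toy instance are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def marginal_on_V(edges):
    """Marginal of v when an edge (u, v) is drawn uniformly: d_v / |E|."""
    degree = Counter(v for _, v in edges)
    return {v: Fraction(d, len(edges)) for v, d in degree.items()}


def total_satisfied_fraction(edges, satisfied_at):
    """Average the per-vertex fractions satisfied_at[v] under the marginal."""
    marginal = marginal_on_V(edges)
    return sum(p * satisfied_at[v] for v, p in marginal.items())


# Toy instance: v1 has degree 2 and v2 has degree 1, so |E| = 3.
edges = [("u1", "v1"), ("u2", "v1"), ("u1", "v2")]
marginal = marginal_on_V(edges)
total = total_satisfied_fraction(edges, {"v1": Fraction(1, 2), "v2": Fraction(1, 1)})
```

Here `marginal` assigns `v1` probability $2/3$ and `v2` probability $1/3$, so the overall satisfied fraction is $\tfrac{2}{3}\cdot\tfrac{1}{2}+\tfrac{1}{3}\cdot 1=\tfrac{2}{3}$, which equals the fraction of edges satisfied when each edge is weighted equally.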

Therefore, the total fraction of Label Cover edges satisfied by the above randomized strategy is at least

[TABLE]

Let $f(\varepsilon):=(\varepsilon/a)^{3}/15$. This choice establishes that $\mathsf{OPT}(\mathcal{I})>(1-1/e+\varepsilon)\,w(\mathcal{U})\implies\mathsf{OPT}(\mathcal{L})\geq f(\varepsilon)$, finishing the proof of the soundness claim.

Bibliography

[1] Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for $k$-means and Euclidean $k$-median by primal-dual algorithms. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 61–72, 2017. URL: https://doi.org/10.1109/FOCS.2017.15, doi:10.1109/FOCS.2017.15.
[2] Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of Euclidean $k$-means. In 31st International Symposium on Computational Geometry, SoCG 2015, June 22-25, 2015, Eindhoven, The Netherlands, pages 754–767, 2015. URL: https://doi.org/10.4230/LIPIcs.SOCG.2015.754, doi:10.4230/LIPIcs.SOCG.2015.754.
[3] Jarosław Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for $k$-median, and positive correlation in budgeted optimization. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 737–756. SIAM, 2014.
[4] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput., 40(6):1740–1766, 2011. URL: https://doi.org/10.1137/080733991, doi:10.1137/080733991.
[5] Parinya Chalermsook, Marek Cygan, Guy Kortsarz, Bundit Laekhanukit, Pasin Manurangsi, Danupon Nanongkai, and Luca Trevisan. From gap-ETH to FPT-inapproximability: Clique, dominating set, and more. In Foundations of Computer Science (FOCS), 2017 IEEE 58th Annual Symposium on, pages 743–754. IEEE, 2017.
[6] Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the $k$-median problem. J. Comput. Syst. Sci., 65(1):129–149, 2002. URL: https://doi.org/10.1006/jcss.2002.1882, doi:10.1006/jcss.2002.1882.
[7] Ke Chen. On $k$-median clustering in high dimensions. In SODA, 2006.
[8] Ke Chen. On coresets for $k$-median and $k$-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923–947, 2009.

