Hitting Sets when the Shallow Cell Complexity is Small

Sander Aarts; David B. Shmoys

arXiv:2302.11637·cs.CG·September 26, 2023


TL;DR

This paper presents a simpler algorithm for the hitting set problem in geometric set systems with small shallow cell complexity, achieving improved approximation ratios by combining linear programming relaxation with probabilistic sampling.

Contribution

It introduces a straightforward algorithm that generalizes the net-finder approach, utilizing weighted epsilon-nets and a generalized packing lemma for better approximation.

Findings

01 Achieves improved asymptotic approximation ratios.

02 Simplifies the algorithmic approach compared to existing methods.

03 Extends packing lemmas to weighted epsilon-nets.


Equations

$$\begin{aligned}\min_{y}\quad&\sum_{j:x_{j}\in X}y_{j}\\ \text{s.t.}\quad&\sum_{j:x_{j}\in X}a_{ij}\,y_{j}\geq 1,&&\forall i:R_{i}\in\mathcal{R},\\ &y_{j}\in\{0,1\},&&\forall j:x_{j}\in X.\end{aligned}\tag{1}$$

$$\forall R\in\mathcal{R}\ \text{with}\ \mu(R)\geq\epsilon\cdot\mu(X):\quad R\cap H\neq\emptyset,$$

$$\begin{aligned}\max_{\epsilon,\mu}\quad&\epsilon\\ \text{s.t.}\quad&\sum_{j:x_{j}\in X}a_{ij}\,\mu_{j}\geq\epsilon,&&\forall i:R_{i}\in\mathcal{R},\\ &\sum_{j:x_{j}\in X}\mu_{j}=1,\\ &\mu_{j}\geq 0,&&\forall j:x_{j}\in X.\end{aligned}\tag{2}$$

$$H\leftarrow\text{pick each }x\in X\text{ with probability }\min\left\{1,\ \frac{2\mu(x)}{\epsilon}\cdot\max\left\{\log\frac{1}{\gamma},\ d\log\frac{1}{\epsilon}\right\}\right\}$$

$$\mathcal{O}\left(z^{*}\cdot\max\left\{1,\log\varphi\left(\mathcal{O}\left(z^{*}\right),\mathcal{O}\left(d\right)\right)\right\}\right).$$

$$\mu(\Delta(S,R))=\mu\left((S\backslash R)\cup(R\backslash S)\right)\geq\delta,$$

$$\operatorname{card}\left(\mathcal{P}\right)\leq\frac{24d}{\delta}\cdot\varphi\left(\frac{8d}{\delta},\frac{48dk}{\delta}\right).$$

$$\operatorname{card}\left(\mathcal{P}\right)\leq 2\,\mathbb{E}\left[\operatorname{card}\left({\mathcal{P}\rvert}_{Y}\right)\right],$$

$$\mathcal{P}_{L}=\left\{R\in\mathcal{P}:M(R,U)\geq 6\cdot\frac{8dk}{\delta}\right\}.$$

$$\mathbb{P}[R\in\mathcal{P}_{L}]=\mathbb{P}\left[M(R,U)\geq 6\cdot\frac{8dk}{\delta}\right].$$

$$\mathbb{E}[M(R,U)]=\sum_{k=1}^{s}\mathbb{P}[u_{k}\in R]=\sum_{k=1}^{s}\mu(R)\leq s\cdot k\leq\frac{8dk}{\delta},$$

$$\mathbb{P}[R\in\mathcal{P}_{L}]\leq\mathbb{P}\Big[M(R,U)\geq 6\cdot\mathbb{E}[M(R,U)]\Big]\leq 1/6.$$

$$\begin{aligned}\mathbb{E}\left[\operatorname{card}\left({\mathcal{P}\rvert}_{Y}\right)\right]&\leq\sum_{R\in\mathcal{P}}\mathbb{P}[R\in\mathcal{P}_{L}]+\operatorname{card}\left(Y\right)\cdot\varphi\left(\operatorname{card}\left(Y\right),6\cdot\frac{8dk}{\delta}\right)\\&\leq\frac{1}{6}\operatorname{card}\left(\mathcal{P}\right)+\frac{8d}{\delta}\cdot\varphi\left(\frac{8d}{\delta},\frac{48dk}{\delta}\right),\end{aligned}$$

$$p\leq\frac{24d}{\beta\epsilon}\cdot\varphi\left(\frac{8d}{\beta\epsilon},\frac{96d}{\beta}\right).$$

$$\mathcal{S}^{j}=\left(R^{j}_{1},\dots,R^{j}_{n_{j}}\right).$$

$$\mu(P^{j}\cap R^{j}_{i})>\frac{\mu(P^{j})+\mu(R^{j}_{i})-\beta\epsilon}{2}.$$

$$\mu(P^{j})+\mu(R^{j}_{i})=\mu(\Delta(P^{j},R^{j}_{i}))+2\mu(P^{j}\cap R^{j}_{i})<\beta\epsilon+2\mu(P^{j}\cap R^{j}_{i}).$$

$$\mathcal{S}^{\prime j}=\left(R\in\mathcal{S}^{j}:H_{R}\text{ is a }\gamma\text{-net for }(R,{\mathcal{R}\rvert}_{R})\right).$$

$$\operatorname{len}\left(\mathcal{S}^{\prime j}\right)\leq\begin{cases}\dfrac{24d}{3/2-\beta-\gamma}\cdot\varphi\left(\dfrac{8d}{3/2-\beta-\gamma},\dfrac{48d}{3/2-\beta-\gamma}\right),&\text{if }\beta+\gamma\geq 1/2;\\[1ex]\mathcal{O}\left(1\right),&\text{otherwise.}\end{cases}$$

$$\mathcal{T}^{\prime j}=\left(S^{j}_{1},\dots,S^{j}_{n^{\prime}_{j}}\right)\ \text{with}\ S^{j}_{i}=R^{j}_{i}\cap P^{j}\ \text{for each}\ i\in\{1,\dots,n^{\prime}_{j}\}.$$

$$\mu(R^{j}_{k}\cap R^{j}_{l})<\gamma\cdot\mu(R^{j}_{k}).$$

$$\mu(S^{j}_{k}\cap S^{j}_{l})=\mu(R^{j}_{k}\cap R^{j}_{l}\cap P^{j})\leq\mu(R^{j}_{k}\cap R^{j}_{l})<\gamma\cdot\mu(R^{j}_{k}).$$

$$\begin{aligned}\mu(\Delta(S^{j}_{k},S^{j}_{l}))&=\mu(R^{j}_{k}\cap P^{j})+\mu(R^{j}_{l}\cap P^{j})-2\cdot\mu(S^{j}_{k}\cap S^{j}_{l})\\&>\frac{\mu(P^{j})+\mu(R^{j}_{k})-\beta\epsilon}{2}+\frac{\mu(P^{j})+\mu(R^{j}_{l})-\beta\epsilon}{2}-2\cdot\mu(S^{j}_{k}\cap S^{j}_{l})\\&>\frac{\mu(P^{j})+\mu(R^{j}_{k})-\beta\epsilon}{2}+\frac{\mu(P^{j})+\mu(R^{j}_{l})-\beta\epsilon}{2}-2\gamma\cdot\mu(R^{j}_{k})\\&=\mu(P^{j})-\beta\epsilon+\frac{1}{2}\mu(R^{j}_{l})+(1/2-2\gamma)\,\mu(R^{j}_{k})\\&\geq(3/2-\beta-\gamma)\cdot\mu(P^{j}),\end{aligned}$$


Taxonomy

Topics: Optimization and Packing Problems · Optimization and Search Problems · Computational Geometry and Mesh Generation

Full text

Hitting Sets when the Shallow Cell Complexity is Small

Sander Aarts (ORCID 0000-0003-1852-9116), David B. Shmoys (ORCID 0000-0003-3882-901X)

Cornell University, Ithaca NY 14580, USA

This material is based on work supported by the NSF under Grant CNS-1952063.

Abstract

The hitting set problem is a well-known NP-hard optimization problem in which, given a set of elements and a collection of subsets, the goal is to find the smallest selection of elements, such that each subset contains at least one element in the selection. Many geometric set systems enjoy improved approximation ratios, which have recently been shown to be tight with respect to the shallow cell complexity of the set system. The algorithms that exploit the cell complexity, however, tend to be involved and computationally intensive. This paper shows that a slightly improved asymptotic approximation ratio for the hitting set problem can be attained using a much simpler algorithm: solve the linear programming relaxation, take one initial random sample from the set of elements with probabilities proportional to the LP-solution, and, while there is an unhit set, take an additional sample from it proportional to the LP-solution. Our algorithm is a simple generalization of the elegant net-finder algorithm by Nabil Mustafa. To analyze this algorithm for the hitting set problem, we generalize the classic Packing Lemma, and the more recent Shallow Packing Lemma, to the setting of weighted epsilon-nets.

Keywords: Hitting set · Set cover · Approximation algorithms · Computational geometry · Shallow cell complexity · Wireless coverage

1 Introduction

The input to the hitting set problem is a finite set system: a ground set $X$ of $m$ elements, or points, and a collection $\mathcal{R}$ of $n$ subsets, or ranges, of $X$. This can also be understood as a hypergraph, with vertices $X$ and hyper-edges $\mathcal{R}$. A hitting set is a subset of elements $H\subseteq X$ such that every set $R\in\mathcal{R}$ is hit by $H$, i.e. $R\cap H\neq\emptyset$ for all $R\in\mathcal{R}$. This is a vertex cover under the hypergraph view. The set system can be encoded as a set-element incidence matrix $A\in\{0,1\}^{n\times m}$, in which the $(i,j)$-th entry $a_{ij}$ is $1$ if range $R_{i}$ contains point $x_{j}$, and $0$ otherwise. The IP of the minimum hitting set problem is

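$$\begin{aligned}\min_{y}\quad&\sum_{j:x_{j}\in X}y_{j}\\ \text{s.t.}\quad&\sum_{j:x_{j}\in X}a_{ij}\,y_{j}\geq 1,&&\forall i:R_{i}\in\mathcal{R},\\ &y_{j}\in\{0,1\},&&\forall j:x_{j}\in X,\end{aligned}\tag{1}$$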

where variable $y_{j}\in\{0,1\}$ indicates whether element $x_{j}$ is in the solution $H$ .
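As a small illustration of the integer program, the following sketch brute-forces a minimum hitting set directly from an incidence matrix. The matrix `A` and the helper names are hypothetical and chosen only for this example; exhaustive search is exponential in $m$ and is not the method studied in this paper.

```python
from itertools import combinations

def is_hitting_set(A, chosen):
    """Return True if every row (range) of A contains at least one chosen column (point)."""
    return all(any(row[j] for j in chosen) for row in A)

def min_hitting_set(A):
    """Solve the integer program by exhaustive search over column subsets, smallest first."""
    m = len(A[0])
    for size in range(m + 1):
        for chosen in combinations(range(m), size):
            if is_hitting_set(A, chosen):
                return set(chosen)
    return None  # only reached if some row contains no 1 at all

# Hypothetical instance: 3 ranges (rows) over 4 points (columns, 0-indexed).
A = [
    [1, 1, 0, 0],  # R_1 = {x_1, x_2}
    [0, 1, 1, 0],  # R_2 = {x_2, x_3}
    [0, 0, 1, 1],  # R_3 = {x_3, x_4}
]
H = min_hitting_set(A)  # H == {0, 2}, i.e. points x_1 and x_3
```

No single point hits all three ranges here, so the optimum has size two.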

Hitting sets and set covers are intimately connected; a hitting set for $A$ is a set cover of $A^{T}$. Both problems’ decision versions are NP-complete [10]. There exists an $\mathcal{O}\left(\log m\right)$-approximation algorithm, and this bound is tight unless P = NP [9, 13]. However, there are algorithms that exploit additional structure in $A$ to attain improved approximation ratios, for example when $A$ has bounded row or column sums [2, 6]. Indeed, our work is motivated by the problem of exploiting structure when covering large numbers of wireless LoRaWAN transmitters with wireless receivers. Transmitters can be viewed as points, which are considered to be covered if they are in the line of sight of a wireless receiver, which in turn drives transmission quality in LoRaWAN [22]. The area in the line of sight of a receiver roughly resembles a simple shape.

Many geometric set systems enjoy better approximation ratios via epsilon-nets, or $\epsilon$-nets. A set system is said to be geometric whenever its elements can be encoded as points in Euclidean space, and sets are derived from containment of the points in geometric shapes, such as half-spaces, balls or rectangles. (Some definitions allow for uncountably many geometric shapes in $\mathcal{R}$, e.g. all squares. However, because the number of points in $X$ is finite, there are nevertheless a finite number of unique sets induced by these shapes.) The seminal work of Brönnimann and Goodrich [3], and Even et al. [8], connects the approximability of a hitting set instance to the size of weighted $\epsilon$-nets. Given non-negative weights on the points, $\mu:X\rightarrow\mathbb{R}_{\geq 0}$, a weighted $\epsilon$-net with respect to weights $\mu$ is a subset $H\subseteq X$ that hits all $\epsilon$-heavy sets:

$$\forall R\in\mathcal{R}\ \text{with}\ \mu(R)\geq\epsilon\cdot\mu(X):\quad R\cap H\neq\emptyset,$$

where the weight of any subset $S\subseteq X$ is defined as $\mu(S)=\sum_{x\in S}\mu(x)$. Even et al. [8] reduce the problem of finding a small hitting set to finding a small $\epsilon$-net via a reformulation of the linear programming relaxation of the hitting set problem (1). The reformulated LP (2) is a program for finding the largest $\epsilon$, and corresponding weights $\mu$, subject to the constraint that an $\epsilon$-net with respect to weights $\mu$ is a hitting set.

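$$\begin{aligned}\max_{\epsilon,\mu}\quad&\epsilon\\ \text{s.t.}\quad&\sum_{j:x_{j}\in X}a_{ij}\,\mu_{j}\geq\epsilon,&&\forall i:R_{i}\in\mathcal{R},\\ &\sum_{j:x_{j}\in X}\mu_{j}=1,\\ &\mu_{j}\geq 0,&&\forall j:x_{j}\in X.\end{aligned}\tag{2}$$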

The first constraint requires that each set $R$ is $\epsilon$-heavy; the second constraint normalizes the weights. Let $(\epsilon^{*},\mu^{*})$ denote an optimal solution to LP (2), with $\mu^{*}=(\mu^{*}_{1},\dots,\mu^{*}_{m})$. Let $z^{*}$ be the optimal value of the LP relaxation of the original program (1). The first constraint ensures that an $\epsilon^{*}$-net with respect to weights $\mu^{*}$ is a hitting set. Moreover, the reciprocal optimal value $1/\epsilon^{*}$ is equal to the optimal LP value $z^{*}$ [8]. In particular, an $\epsilon^{*}$-net of size $g(1/\epsilon^{*})$ for some function $g(\cdot)$ is a hitting set of size $g(z^{*})$. Hence, to find a small hitting set it suffices to solve LP (2) and find a small $\epsilon^{*}$-net with respect to weights $\mu^{*}$.
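The identity $1/\epsilon^{*}=z^{*}$ can be checked by a rescaling argument. The derivation below is a sketch of that standard step, written in the notation above rather than quoted from [8]. Here $y\geq 0$ denotes a fractional solution of the relaxed hitting set program, and $(\epsilon,\mu)$ a solution of the reformulated LP whose constraints are that every set is $\epsilon$-heavy and the weights sum to one.

```latex
% From (\epsilon,\mu) to y: given a feasible (\epsilon,\mu) with \epsilon > 0, set y_j = \mu_j / \epsilon.
\sum_{j:x_j\in X} a_{ij}\,y_j
  = \frac{1}{\epsilon}\sum_{j:x_j\in X} a_{ij}\,\mu_j \;\ge\; 1
  \quad\text{for every } i,
\qquad
\sum_{j:x_j\in X} y_j = \frac{1}{\epsilon}.

% From y to (\epsilon,\mu): given a feasible y \ge 0 with value z = \sum_j y_j > 0,
% set \mu_j = y_j / z and \epsilon = 1/z.
\sum_{j:x_j\in X} a_{ij}\,\mu_j
  = \frac{1}{z}\sum_{j:x_j\in X} a_{ij}\,y_j \;\ge\; \frac{1}{z} = \epsilon
  \quad\text{for every } i,
\qquad
\sum_{j:x_j\in X} \mu_j = 1.

% The two maps are inverse to each other and send objective value 1/\epsilon to z,
% so maximizing \epsilon is equivalent to minimizing z, and 1/\epsilon^{*} = z^{*}.
```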

Haussler and Welzl [12] show that set systems with bounded VC-dimension admit small $\epsilon$ -nets, and develop a simple algorithm to find them. The VC-dimension is a measure of the set system’s complexity. Given a subset $S\subseteq X$ , the projection of $\mathcal{R}$ to $S$ is the set system formed by elements $S$ and sets ${\mathcal{R}\rvert}_{S}=\{R\cap S:R\in\mathcal{R}\}$ . The VC-dimension of $\mathcal{R}$ is the size of the largest subset $S\subseteq X$ such that ${\mathcal{R}\rvert}_{S}$ shatters $S$ , i.e. the largest set $S$ such that ${\mathcal{R}\rvert}_{S}$ contains all subsets of $S$ . In particular, Clarkson [7], and Haussler and Welzl [12], show that any set system with VC-dimension $d$ has a weighted $\epsilon$ -net of size $\mathcal{O}\left(\tfrac{d}{\epsilon}\log\tfrac{1}{\epsilon}\right)$ . This is remarkable, as the size is independent of both the size of $X$ and $\mathcal{R}$ . Moreover, the algorithm for finding such an $\epsilon$ -net is simple: Select a subset $H\subseteq X$ by sampling each element $x$ in $X$ independently.

Theorem 1.1 ( $\epsilon$ -net Theorem [12, 14])

Let $(X,\mathcal{R})$ be a set system with VC-dimension $d$ , and let $\mu:X\rightarrow\mathbb{R}_{\geq 0}$ be element weights with $\mu(X)=1$ . Then for any $\epsilon,\gamma\in(0,1)$ :

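$$H\leftarrow\text{pick each }x\in X\text{ with probability }\min\left\{1,\ \frac{2\mu(x)}{\epsilon}\cdot\max\left\{\log\frac{1}{\gamma},\ d\log\frac{1}{\epsilon}\right\}\right\}$$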

is a weighted $\epsilon$ -net with respect to weights $\mu$ with probability at least $1-\gamma$ .
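The sampling rule of Theorem 1.1 translates almost directly into code. The sketch below keeps each element independently with probability $\min\{1,\tfrac{2\mu(x)}{\epsilon}\max\{\log\tfrac{1}{\gamma},d\log\tfrac{1}{\epsilon}\}\}$. The function name, the dictionary representation of $\mu$, and the use of the natural logarithm are assumptions made for illustration; the theorem as stated above does not fix the base of the logarithm.

```python
import math
import random

def sample_weighted_net(mu, eps, gamma, d, rng=random):
    """Keep each element x independently with probability
    min(1, (2 * mu[x] / eps) * max(log(1/gamma), d * log(1/eps))).

    mu maps elements to non-negative weights that are assumed to sum to 1.
    """
    factor = max(math.log(1.0 / gamma), d * math.log(1.0 / eps))
    H = set()
    for x, weight in mu.items():
        p = min(1.0, 2.0 * weight / eps * factor)
        if rng.random() < p:  # rng.random() lies in [0, 1), so p == 1 always keeps x
            H.add(x)
    return H

# Hypothetical usage: two equally heavy elements; both probabilities are capped at 1.
H = sample_weighted_net({"a": 0.5, "b": 0.5}, eps=0.5, gamma=0.1, d=2,
                        rng=random.Random(0))
```

The returned set is an $\epsilon$-net only with probability at least $1-\gamma$, so a caller that needs a guaranteed net would have to verify the output and resample on failure.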

Throughout, we define $\mu(S)=\sum_{x\in S}\mu(x)$ for all subsets $S\subseteq X$ . For general set systems of VC-dimension $d$ , this bound is tight in expectation [14]. However, there are alternative ways to parameterize the complexity of set systems.

1.1 Shallow Cell Complexity

The shallow cell complexity (SCC) is a finer parameterization of the complexity of set systems [1, 4, 21]. Readers are referred to Mustafa and Varadarajan [20] for more background. A cell in a binary matrix $A$ is a collection of identical rows. A cell has depth $k$ if the number of $1$’s in any of its rows is exactly $k$, i.e., if each set in the cell contains $k$ elements. For a non-decreasing function $\varphi\left(\cdot,\cdot\right)$ we say binary matrix $A$ has shallow cell complexity (SCC) $\varphi\left(\cdot,\cdot\right)$ if, for all $1\leq k\leq l\leq m$, the number of cells of depth at most $k$ in any submatrix $A^{*}$ of $A$ of at most $l$ columns is at most $\varphi\left(l,k\right)$. A set system $(X,\mathcal{R})$ is said to have SCC $\varphi\left(l,k\right)$ if its set-element incidence matrix $A$ does. Often $\varphi\left(l,k\right)=\mathcal{O}\left(\varphi\left(l\right)k^{c}\right)$ for some constant $c>0$ and single-variable function $\varphi\left(\cdot\right)$, in which case the dependence on $k$ can be dropped and the SCC denoted by $\varphi\left(l\right)$. Examples of geometric set systems with small shallow cell complexity are discs in the plane with $\varphi\left(l,k\right)=\mathcal{O}\left(k\right)$, and axis-parallel rectangles with $\varphi\left(l,k\right)=\mathcal{O}\left(lk^{2}\right)$ [18].
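To make the definition concrete, the following sketch counts the cells of depth at most $k$ in the submatrix of $A$ induced by a chosen set of columns. The function name and the example matrix are hypothetical. Read literally, the definition also counts an all-zero projected row as a cell of depth $0$, and the sketch follows that reading.

```python
def count_shallow_cells(A, columns, k):
    """Count cells (distinct rows) of depth at most k in A restricted to `columns`."""
    # Projecting each row onto the chosen columns and deduplicating gives the cells.
    cells = {tuple(row[j] for j in columns) for row in A}
    # The depth of a cell is the number of 1's in its (common) row.
    return sum(1 for cell in cells if sum(cell) <= k)

# Hypothetical 4 x 3 incidence matrix; rows 0 and 1 are identical and form one cell.
A = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
]
all_columns_k2 = count_shallow_cells(A, [0, 1, 2], 2)  # cells (1,1,0) and (0,1,1)
two_columns_k1 = count_shallow_cells(A, [0, 1], 1)     # only cell (0,1)
```

A bound $\varphi\left(l,k\right)$ must dominate this count for every choice of at most $l$ columns, so the function above checks a single submatrix rather than the SCC itself.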

As is true for VC-dimension, there are algorithms that find hitting sets or $\epsilon$ -nets with sizes bounded in terms of the shallow cell complexity. A prominent example is the quasi-uniform sampling algorithm of Chan et al. [4]. Given non-negative weights $\mu:X\rightarrow\mathbb{R}_{\geq 0}$ , and a value $\epsilon>0$ , the algorithm finds a hitting set while maintaining an upper bound on the probability of selecting any given element.

Theorem 1.2 (Quasi-uniform sampling [4])

Suppose a set system defined by $A$ has SCC $\varphi\left(l,k\right)=\varphi\left(l\right)k^{c}$ for some $c>0$ . Then there is a randomized poly-time algorithm that returns a hitting set of expected size $\mathcal{O}\left(\max\{1,\log(\varphi\left(m\right))\}\right)$ times the LP optimum.

The algorithm attains the optimal approximation ratio with respect to the SCC. In addition, it is worth noting that this algorithm can solve the more general weighted hitting set problem, in which each element has a given weight, and the goal is to find the minimum weight hitting set. However, the sampling procedure is involved, and may require enumeration over all sets $\mathcal{R}$, of which there can be $n=\Omega(m^{c})$ for some constant $c>0$ [17].

Taking a different approach, Mustafa and colleagues [16, 17, 19] develop a net-finder for asymptotically optimal-sized unweighted $\epsilon$ -nets with respect to the SCC. The algorithm is remarkably simple: Take an initial sample from $X$ , and while there are unhit sets, choose an unhit set arbitrarily, and add $\mathcal{O}\left(1\right)$ randomly chosen elements from this set to the original sample. The algorithm assumes access to an oracle that returns an unhit set. This oracle is called at most $\mathcal{O}\left(1/\epsilon\right)$ times in expectation. While the size of the returned $\epsilon$ -net is asymptotically on par with the quasi-uniform sampling algorithm, there are large constants in the upper bound [17].

This algorithm is not directly applicable to the hitting set problem via the LP-reduction above, although it can be used via a standard reduction. The analysis of the algorithm applies only to uniform weights, and the optimal weights $\mu^{*}$ of the LP-formulation (2) are not generally uniform. Nevertheless, it is possible to reduce the problem of finding a weighted $\epsilon$-net to that of finding a uniform $\epsilon^{\prime}$-net following a standard reduction, in which an expanded instance is generated by copying each element $x_{j}\in X$ a number of times roughly proportional to its weight $\mu^{*}(x_{j})$ [3, 4]. This can generate $\Omega(m)$ copies of each element, which can have notable consequences. First, to achieve a weighted $\epsilon^{*}$-net in the original instance, one must use a smaller value $\epsilon^{\prime}$ for the expanded instance, on the order of $\mathcal{O}\left(\epsilon^{*}/m\right)$. This results in an approximation ratio of $\mathcal{O}\left(\log\varphi\left(\mathcal{O}\left(m\right)\right)\right)$. Second, generating copies can increase the number of elements from $m$ to $\Omega(m^{2})$. This can increase the runtime considerably. In particular, repeatedly sampling from sets of size $\Theta(m^{2})$ can become prohibitive on large instances such as the wireless coverage problem motivating our work.

1.2 Our Contributions

This paper generalizes the elegant net-finder algorithm of Mustafa [17] to the setting of weighted $\epsilon$-nets, in order to produce a fast and simple algorithm for the hitting set problem, which attains asymptotically optimal approximation ratios with respect to the shallow cell complexity. The algorithm enjoys a faster runtime that makes solving larger instances, such as LoRaWAN receiver placement at scale, feasible. This is achieved by combining the weighted $\epsilon$-net finder with the reduction of Even et al. [8]. In doing so, we also improve on the asymptotic approximation ratio from $\max\{1,\log\varphi\left(m\right)\}$ to $\max\{1,\mathcal{O}\left(\log\varphi\left(\mathcal{O}\left(z^{*}\right)\right)\right)\}$, where $z^{*}$ is the optimal value of the linear relaxation of the hitting set program (1). While in the worst case $z^{*}=m$, it is often the case that $z^{*}\ll m$. However, the multiplicative constants in our analysis are relatively large, matching those of Mustafa [17]. In addition to the algorithm, our analysis generalizes the classic Packing Lemma of Haussler [11], as well as the Shallow Packing Lemma of Mustafa et al. [19], to the weighted setting, which may be of independent interest.

Key to our approach are adaptations of Mustafa’s [19] Shallow Packing Lemma and Haussler’s [11] classic Packing Lemma that accommodate non-uniform weights. Our main technical contribution is to allow a notion of weighted packings. Consider any non-negative weights $\mu:X\rightarrow\mathbb{R}_{\geq 0}$ with $\sum_{x\in X}\mu(x)=1$, and extend them to element subsets via $\mu(S)=\sum_{x\in S}\mu(x)$. (Any non-negative weights $w:X\rightarrow\mathbb{R}_{\geq 0}$ with $w(X)>0$ can be normalized as $\mu(x)=w(x)/w(X)$.) A $(k,\delta)$-packing with respect to weights $\mu$ is a collection of sets $\mathcal{P}\subseteq\mathcal{R}$ in which (i) all sets $R$ in $\mathcal{P}$ are at most $k$-heavy, i.e., have bounded weight $\mu(R)\leq k$; and (ii) all pairs of sets have symmetric differences of weight at least $\delta$ (see Definition 1). Our weighted shallow packing lemma upper bounds the number of sets in $\mathcal{P}$ as a function of the SCC. Our approach accommodates weights $\mu$ by sampling elements from a distribution with probability mass proportional to the weights, rather than from a uniform distribution as in the original proofs. Moreover, our proof uses sampling with replacement rather than without replacement to simplify the analysis. While more generally applicable, our result yields the same bound on the size of $\mathcal{P}$ as in the unweighted setting. An analogous sampling approach is used in proving Theorem 1.1 [14]. Equipped with our generalized lemma, it is straightforward to adapt Mustafa’s [17] analysis to a weighted net-finder. A proof of our Weighted Packing Lemma is included in the extended online version.

2 Algorithm and Main Result

Our algorithm combines the LP-relaxation of Even et al. [8] with the generalized sampling approach of Mustafa [17]. Our procedure is summarized in Algorithm 1. The algorithm makes use of two global constants, $\beta$ and $\gamma$ . These are assumed to be positive, and to satisfy $\gamma\leq 1/4$ and $\beta+\gamma\leq 1$ .

In the while loop, $\mu^{*}(R)=\sum_{j:x_{j}\in R}\mu^{*}_{j}$ denotes the weight of set $R$ under the LP optimal weights $\mu^{*}=(\mu^{*}_{1},\dots,\mu^{*}_{m})$.

Conceptually, the algorithm is simple; it randomly selects an initial set of elements $H$ from $X$ , and proceeds to add additional random subsets of elements to $H$ until this is a hitting set. The algorithm relies on an oracle that returns an arbitrary unhit set. This oracle is treated as a black box. Our main result is twofold: we bound the expected size of the solution hitting set $H$ as a function of the cell complexity, and bound the expected number of oracle calls.
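The control flow described above can be sketched as follows. This is an illustrative outline only, not a reproduction of Algorithm 1: the sampling probabilities are taken proportional to the LP weights as stated in the text, but the multipliers `initial_rate` and `refill_rate` are hypothetical placeholders for the paper's constants (which depend on $\beta$, $\gamma$, $d$ and $\varphi$), and `first_unhit` is a naive stand-in for the black-box oracle.

```python
import random

def first_unhit(ranges, H):
    """Naive stand-in for the oracle: return some range not hit by H, or None."""
    for R in ranges:
        if not (R & H):
            return R
    return None

def weighted_net_finder(ranges, mu, eps, initial_rate, refill_rate, rng=random):
    """Sketch of the loop: initial sample, then resample inside unhit ranges.

    Assumes every range has positive weight under mu, as LP feasibility guarantees.
    Returns the hitting set and the number of unhit ranges that were processed.
    """
    # Initial sample: keep x with probability proportional to mu[x] / eps, capped at 1.
    H = {x for x in mu if rng.random() < min(1.0, initial_rate * mu[x] / eps)}
    processed = 0
    while True:
        R = first_unhit(ranges, H)
        if R is None:
            return H, processed
        processed += 1
        weight_R = sum(mu[x] for x in R)
        # Additional sample from the unhit range, proportional to mu restricted to R.
        H |= {x for x in R if rng.random() < min(1.0, refill_rate * mu[x] / weight_R)}

# Hypothetical instance with uniform weights; the rates make the run deterministic.
mu = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
ranges = [{0, 1}, {1, 2}, {2, 3}]
H, processed = weighted_net_finder(ranges, mu, eps=0.5,
                                   initial_rate=0.0, refill_rate=10.0)
```

Because the loop exits only when the oracle finds no unhit range, the output is always a hitting set; the randomness affects only its size and the number of iterations.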

Theorem 2.1

Let $A$ be a binary matrix encoding a hitting set instance with shallow cell complexity $\varphi\left(\cdot,\cdot\right)$ and $\operatorname{VC-dim}(\mathcal{R})\leq d$. Let $z^{*}$ be the LP optimal value. Then the algorithm returns a hitting set of expected size

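$$\mathcal{O}\left(z^{*}\cdot\max\left\{1,\log\varphi\left(\mathcal{O}\left(z^{*}\right),\mathcal{O}\left(d\right)\right)\right\}\right).$$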

Furthermore it makes $\mathcal{O}\left(z^{*}\right)$ oracle calls in expectation.

Note that the algorithm always returns a hitting set; the randomness is in the size of the solution and the runtime. This is in contrast with the net-finder in Theorem 1.1. Both algorithms require knowing the VC dimension $d$ ; ours must additionally know the shallow cell complexity $\varphi\left(\cdot,\cdot\right)$ . If unknown, these can be searched for using a standard doubling trick [17].

3 The Weighted Shallow Packing Lemma

The Weighted Shallow Packing Lemma is key to proving Theorem 2.1. This section formally defines weighted shallow packings, states the lemma, and proves it. To this end, fix non-negative weights $\mu$ over $X$ , and define the weight of a subset of elements $S\subseteq X$ as $\mu(S)=\sum_{j\in S}\mu_{j}$ . Assume that $\mu(X)=1$ . To contrast, let $\operatorname{card}\left({S}\right)$ denote the cardinality of any set $S$ . Note that the weights $\mu$ induce a probability distribution over the elements $X$ . Throughout, whenever an element $u$ of $X$ is randomly sampled, it is assumed to follow a distribution proportional to $\mu(\cdot)$ , in which case we say $u$ is sampled from $\mu(\cdot)$ , and denote this by $u\sim\mu(\cdot)$ . Note that an element $u\sim\mu(\cdot)$ sampled this way lies in subset $S\subseteq X$ with probability $\mu(S)$ .

The main purpose of the weighted shallow packing lemma is to bound the number of sets in a set system in terms of its shallow cell complexity. Clearly, arbitrary set systems can contain large numbers of sets. Instead, we focus on a particular kind of set system called a weighted packing. A set system is a packing if all its sets are “light”, and each pair of sets is sufficiently different from each other. Critically, we define “light” and “different” in reference to the weights.

Definition 1

Let $(X,\mathcal{P})$ be a set system with weights $\mu$ , and let $k,\delta\in(0,1)$ be constants. If all sets $S$ in $\mathcal{P}$ satisfy $\mu(S)\leq k$ , and all pairs of distinct sets $S,R$ in $\mathcal{P}$ have symmetric difference of weight at least $\delta$ , i.e.

$$\mu(\Delta(S,R))=\mu\left((S\backslash R)\cup(R\backslash S)\right)\geq\delta,$$

then we say $(X,\mathcal{P})$ is a weighted $(k,\delta)$ -packing with respect to $\mu$ .

We omit the “with respect to $\mu$ ”-statement whenever this is clear from context.
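Definition 1 can be checked mechanically on small instances. The sketch below tests both conditions, lightness and pairwise separation, for a collection of Python sets; the helper names and the example weights are hypothetical.

```python
from itertools import combinations

def weight(S, mu):
    """Weight of a subset S under element weights mu."""
    return sum(mu[x] for x in S)

def is_weighted_packing(sets, mu, k, delta):
    """Check Definition 1: every set weighs at most k, and every pair of
    distinct sets has a symmetric difference of weight at least delta."""
    light = all(weight(S, mu) <= k for S in sets)
    separated = all(weight(S ^ R, mu) >= delta for S, R in combinations(sets, 2))
    return light and separated

# Hypothetical example with uniform weights on four elements.
mu = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
sets = [{"a", "b"}, {"c", "d"}, {"a", "c"}]
ok = is_weighted_packing(sets, mu, k=0.5, delta=0.5)         # True
too_close = is_weighted_packing(sets, mu, k=0.5, delta=0.75)  # False
```

In the second call the pair $\{a,b\}$, $\{a,c\}$ has a symmetric difference of weight only $0.5$, so the collection fails to be a $(0.5,0.75)$-packing.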

The shallow packing lemma bounds the number of sets in a packing as a function of the constants $(k,\delta)$ , the VC-dimension, and the shallow cell complexity.

Lemma 1 (Weighted shallow packing lemma)

Let $(X,\mathcal{P})$ be a set system on $m$ elements, equipped with weights $\mu$, that is a $(k,\delta)$-packing with respect to $\mu$ for constants $k,\delta>0$. Assume the set system has $\operatorname{VC-dim}(\mathcal{P})\leq d$, and shallow cell complexity $\varphi\left(\cdot,\cdot\right)$. Then

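$$\operatorname{card}\left(\mathcal{P}\right)\leq\frac{24d}{\delta}\cdot\varphi\left(\frac{8d}{\delta},\frac{48dk}{\delta}\right).$$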

The proof of this lemma makes use of our weighted Packing Lemma. The unweighted Packing Lemma is a classic result by Haussler [11] that bounds the number of sets in a packing. We generalize this to nonuniform weights.

Lemma 2 (Weighted packing lemma)

Let $(X,\mathcal{P})$ be a set system with $n$ sets and $m$ elements, equipped with weights $\mu$ . Let $\operatorname{VC-dim}(\mathcal{P})\leq d$ for some integer $d\geq 1$ and assume there is a constant $\delta\in(0,1)$ such that $\mu(\Delta(S_{i},S_{k}))\geq\delta$ for all $1\leq i<k\leq n$ . Then

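$$\operatorname{card}\left(\mathcal{P}\right)\leq 2\,\mathbb{E}\left[\operatorname{card}\left({\mathcal{P}\rvert}_{Y}\right)\right],$$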

where $Y$ is the set of unique elements in a random sample $U=(u_{1},u_{2},\dots,u_{s})$ of size $s=\lceil\frac{8d}{\delta}\rceil-1$ , in which each element $u_{k}$ is sampled iid $u_{k}\sim\mu(\cdot)$ with replacement.
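The random sample in Lemma 2 is easy to generate. The following sketch draws $s=\lceil 8d/\delta\rceil-1$ elements i.i.d. from $\mu(\cdot)$ with replacement and collects the distinct elements into $Y$. The function name and the dictionary representation of $\mu$ are assumptions for illustration.

```python
import math
import random

def sample_with_replacement(mu, d, delta, rng=random):
    """Draw U = (u_1, ..., u_s) i.i.d. from mu with replacement, s = ceil(8d/delta) - 1,
    and return U together with Y, the set of distinct sampled elements."""
    s = math.ceil(8 * d / delta) - 1
    elements = list(mu)
    U = rng.choices(elements, weights=[mu[x] for x in elements], k=s)
    return U, set(U)

# Hypothetical usage with d = 1 and delta = 0.5, so s = 15.
mu = {"a": 0.5, "b": 0.25, "c": 0.25}
U, Y = sample_with_replacement(mu, d=1, delta=0.5, rng=random.Random(0))
```

Since repeated draws are allowed, $Y$ can be much smaller than $U$, which is why the lemma is stated in terms of the projection onto $Y$.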

The proof of the latter lemma is in the appendix to the extended online version of the paper; Lemma 1 is proved next.

3.1 Proof of the Weighted Shallow Packing Lemma

Proof

Fix a $(k,\delta)$-packing $\mathcal{P}$ and let $U=(u_{1},u_{2},\dots,u_{s})$ be a random sample of length $s$, in which each element is sampled $u_{k}\sim\mu(\cdot)$, $k=1,\dots,s$, independently and with replacement. The number of elements sampled is $s=\lceil\frac{8d}{\delta}\rceil-1$. Let $Y\subseteq X$ be the set of unique elements in $U$. For every set $R\in\mathcal{P}$, let $M(R,U):=\sum_{k=1}^{s}\mathbf{1}[u_{k}\in R]$ denote the number of (copies of) elements in $U$ that are in $R$. Define $\mathcal{P}_{L}\subseteq\mathcal{P}$ as the sub-collection of “large” sets in packing $\mathcal{P}$ that contain at least $6\left(\frac{8dk}{\delta}\right)$ (copies of) elements in the random sample $U$:

$$\mathcal{P}_{L}=\left\{R\in\mathcal{P}:M(R,U)\geq 6\cdot\frac{8dk}{\delta}\right\}.$$

It follows that the probability of a given range $R$ in $\mathcal{P}$ being a member of $\mathcal{P}_{L}$ is

$$\mathbb{P}[R\in\mathcal{P}_{L}]=\mathbb{P}\left[M(R,U)\geq 6\cdot\frac{8dk}{\delta}\right].$$

Our goal is to show that the collection of large sets $\mathcal{P}_{L}$ has few members in expectation. To do so, it suffices to bound the probability that a fixed set $R$ is a member of $\mathcal{P}_{L}$ . This is achieved using Markov’s inequality. Recalling that all sets $R\in\mathcal{P}$ have bounded weight $\mu(R)\leq k$ gives

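$$\mathbb{E}[M(R,U)]=\sum_{k=1}^{s}\mathbb{P}[u_{k}\in R]=\sum_{k=1}^{s}\mu(R)\leq s\cdot k\leq\frac{8dk}{\delta},$$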

where we used the fact that we sample from $\mu(\cdot)$ , which implies that $\mathbb{P}[u_{k}\in R]=\mu(R)$ . Now, Markov’s inequality bounds the probability of $R$ being in $\mathcal{P}_{L}$ :

$$\mathbb{P}[R\in\mathcal{P}_{L}]\leq\mathbb{P}\Big[M(R,U)\geq 6\cdot\mathbb{E}[M(R,U)]\Big]\leq 1/6.$$

Finally, because $\mathcal{P}_{L}\subseteq\mathcal{P}$ , we conclude that

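$$\begin{aligned}\mathbb{E}\left[\operatorname{card}\left({\mathcal{P}\rvert}_{Y}\right)\right]&\leq\sum_{R\in\mathcal{P}}\mathbb{P}[R\in\mathcal{P}_{L}]+\operatorname{card}\left(Y\right)\cdot\varphi\left(\operatorname{card}\left(Y\right),6\cdot\frac{8dk}{\delta}\right)\\&\leq\frac{1}{6}\operatorname{card}\left(\mathcal{P}\right)+\frac{8d}{\delta}\cdot\varphi\left(\frac{8d}{\delta},\frac{48dk}{\delta}\right),\end{aligned}$$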

where the second-to-last inequality uses the shallow cell complexity of $\mathcal{P}$ ; the system $(Y,{(\mathcal{P}\backslash\mathcal{P}_{L})\rvert}_{Y})$ has at most $\operatorname{card}\left({Y}\right)\leq s$ elements, and sets have depth at most $\left(6\cdot\tfrac{8dk}{\delta}\right)$ , as the system consists only of cells that are not “large”. The final inequality holds because $\mathbb{P}[R\in\mathcal{P}_{L}]\leq 1/6$ . Finally, applying Lemma 2 completes the proof. $\square$
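For completeness, the final step can be spelled out. The lines below are a worked sketch of the arithmetic, combining the bound of Lemma 2, $\operatorname{card}(\mathcal{P})\leq 2\,\mathbb{E}[\operatorname{card}({\mathcal{P}\rvert}_{Y})]$, with the bound on $\mathbb{E}[\operatorname{card}({\mathcal{P}\rvert}_{Y})]$ derived in the proof.

```latex
\operatorname{card}(\mathcal{P})
  \;\le\; 2\,\mathbb{E}\big[\operatorname{card}(\mathcal{P}\rvert_{Y})\big]
  \;\le\; \frac{1}{3}\operatorname{card}(\mathcal{P})
        + \frac{16d}{\delta}\cdot\varphi\!\left(\frac{8d}{\delta},\frac{48dk}{\delta}\right).

% Moving the card(P) term to the left-hand side and multiplying by 3/2:
\frac{2}{3}\operatorname{card}(\mathcal{P})
  \;\le\; \frac{16d}{\delta}\cdot\varphi\!\left(\frac{8d}{\delta},\frac{48dk}{\delta}\right)
\quad\Longrightarrow\quad
\operatorname{card}(\mathcal{P})
  \;\le\; \frac{24d}{\delta}\cdot\varphi\!\left(\frac{8d}{\delta},\frac{48dk}{\delta}\right).
```

This is exactly the bound claimed in Lemma 1.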

4 Proof of the Main Theorem

Equipped with the Weighted Shallow Packing Lemma, we follow a similar strategy as Mustafa [17]. We state and prove three key lemmas, and finally prove Theorem 2.1.

4.1 Key lemmas

The proof of our main theorem relies on all sets having similar weight. Let $\epsilon$ and $\mu=(\mu_{1},\dots,\mu_{m})$ be a feasible solution to the LP relaxation (2). By the constraints of the LP, each set $R\in\mathcal{R}$ has weight $\mu(R)=\sum_{j:x_{j}\in R}\mu_{j}\geq\epsilon$. Partition the collection of sets $\mathcal{R}$ into groups $\ell=0,1,\dots,\lceil\log(1/\epsilon)\rceil$ of sets of similar weight; set $R$ belongs to group $\ell$ if and only if $2^{\ell}\epsilon\leq\mu(R)<2^{\ell+1}\epsilon$. Because the algorithm exclusively takes independent samples, we can view one run of the algorithm as multiple parallel, independent runs on each group of sets. All our bounds scale on the order $\mathcal{O}\left(1/(2^{\ell}\epsilon)\right)$, so summing over the groups gives a final bound on the order of $\mathcal{O}\left(1/\epsilon\right)$. Hence, we assume henceforth that all sets $R\in\mathcal{R}$ have weight $\epsilon\leq\mu\left(R\right)\leq 2\epsilon$.

The key idea of the proof is to amortize the elements added from each processed unhit set throughout the run of the algorithm. We say a set is processed each time it is flagged as unhit by the oracle, and a sample is taken from it. We bound the total number of elements sampled using weighted $(k,\delta)$ -packings on two levels. The first-level packing is an arbitrary maximal packing $\mathcal{P}$ of sets in $\mathcal{R}$ . There are a bounded number of sets in $\mathcal{P}$ . Next, each processed set $R_{i}$ is assigned to a set in the first-level packing $\mathcal{P}$ . For a fixed set $P^{j}$ in the first-level packing, given that it has been assigned processed sets, we show that the collection of sets $R_{i}$ assigned to $P^{j}$ forms a second-level packing. Each second-level packing also has a bounded number of sets. Finally, by bounding the probability that a set in the first-level packing has any sets assigned to it, the total expected number of times the algorithm processes a set is bounded. Note that the assignments are only a tool for analysis; they need not be computed by the algorithm.

We begin by defining the first-level packing. Fix a maximal $(2\epsilon,\beta\epsilon)$ -packing $\mathcal{P}=\{P^{1},\dots,P^{p}\}$ , where $p$ denotes the number of sets in the packing. The Shallow Packing Lemma 1 upper bounds the number of sets in the packing by

$$p\leq\frac{24d}{\beta\epsilon}\cdot\varphi\left(\frac{8d}{\beta\epsilon},\frac{96d}{\beta}\right).$$
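This bound is Lemma 1 instantiated with the first-level packing parameters. As a quick check of the arithmetic, substituting $k=2\epsilon$ and $\delta=\beta\epsilon$ into the three quantities that appear in Lemma 1 gives:

```latex
k = 2\epsilon,\quad \delta = \beta\epsilon:
\qquad
\frac{24d}{\delta} = \frac{24d}{\beta\epsilon},
\qquad
\frac{8d}{\delta} = \frac{8d}{\beta\epsilon},
\qquad
\frac{48dk}{\delta} = \frac{48d\cdot 2\epsilon}{\beta\epsilon} = \frac{96d}{\beta}.
```

Note that the depth argument $96d/\beta$ no longer depends on $\epsilon$.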

Now, suppose the algorithm runs for $T$ steps, processing sets $(R_{1},\dots,R_{T})$ in sequence. One given set may be processed multiple times. Denote the sets of sampled elements $H_{R_{1}},\dots,H_{R_{T}}$ . The processed sets $R_{i}$ are assigned to sets $P^{j}$ in the first-level packing $\mathcal{P}$ as follows. Arbitrarily assign each set $R_{i}$ to any index $j\in\{1,2,\dots,p\}$ satisfying $\mu\left(\Delta(R_{i},P^{j})\right)<\beta\epsilon$ . Such an index $j$ exists because $\mathcal{P}$ is a maximal $(2\epsilon,\beta\epsilon)$ -packing. It may be the case that $R_{i}=P^{j}$ . The next task is to bound the number of sets $R_{i}$ assigned to any set $P^{j}$ in the first-level packing.
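Although the assignment is only an analysis device, it can be illustrated in code. The sketch below builds a maximal packing greedily and then assigns a processed set to the first packing set within symmetric-difference weight $\beta\epsilon$. The greedy construction, the function names, and the example values of $\epsilon$ and $\beta$ are assumptions for illustration; the paper only requires that some maximal packing is fixed, and all ranges are assumed to satisfy $\mu(R)\leq 2\epsilon$ as in the standing assumption above.

```python
def weight(S, mu):
    """Weight of a subset S under element weights mu."""
    return sum(mu[x] for x in S)

def greedy_maximal_packing(ranges, mu, delta):
    """Scan the ranges once, keeping a range only if it is delta-separated from
    every range kept so far. The result is maximal: every range is within
    symmetric-difference weight < delta of some kept range."""
    packing = []
    for R in ranges:
        if all(weight(R ^ P, mu) >= delta for P in packing):
            packing.append(R)
    return packing

def assign_to_packing(R, packing, mu, delta):
    """Return an index j with mu(symmetric difference of R and P^j) < delta, or None."""
    for j, P in enumerate(packing):
        if weight(R ^ P, mu) < delta:
            return j
    return None

# Hypothetical instance: uniform weights, eps = 0.5 and beta = 0.75.
mu = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
ranges = [{0, 1}, {0, 1, 2}, {2, 3}]
eps, beta = 0.5, 0.75
packing = greedy_maximal_packing(ranges, mu, beta * eps)
assignment = [assign_to_packing(R, packing, mu, beta * eps) for R in ranges]
```

Here the range $\{0,1,2\}$ differs from $\{0,1\}$ by weight $0.25<\beta\epsilon$, so it is left out of the packing and assigned to index $0$.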

Let $n_{j}$ denote the number of processed sets in $(R_{1},\dots,R_{T})$ assigned to $P^{j}\in\mathcal{P}$ . For now, condition on first-level packing set $P^{j}$ having at least one set assigned to it, i.e. $n_{j}\geq 1$ . We study the probability of this event later. Relabel the sets and consider them in the order in which they were processed by the algorithm,

$$\mathcal{S}^{j}=\left(R^{j}_{1},\dots,R^{j}_{n_{j}}\right).$$

Claim

For all $j\in\{1,2,\dots,p\}$ , $i\in\{1,2,\dots,n_{j}\}$ we have

$$\mu(P^{j}\cap R^{j}_{i})>\frac{\mu(P^{j})+\mu(R^{j}_{i})-\beta\epsilon}{2}.$$

Proof. Fix $j\in\{1,2,\dots,p\}$ . For all $i\in\{1,2,\dots,n_{j}\}$ we have

[TABLE]

The first equality follows from straightforward accounting, and the second from the definition of symmetric difference. The inequality follows from the manner in which set $R^{j}_{i}$ is matched to the packing set $P^{j}$ . Finally, a simple rearrangement of terms yields the result. $\square$

This proves that the intersection of each set $R^{j}_{i}$ with its corresponding first-level packing set $P^{j}$ is heavy. This lets us define a second-level packing using the intersections $R^{j}_{i}\cap P^{j}$ .

Rather than directly bounding the number of processed sets assigned to a first-level packing set, it is easier to first bound the length of a random subsequence of the assigned sets $\mathcal{S}^{j}$. For any $j\in\{1,\dots,p\}$, define $\mathcal{S}^{\prime j}$ as the subsequence of processed sets $R$ in $\mathcal{S}^{j}$ whose corresponding samples $H_{R}$ form $\gamma$-nets for the system $(R,{\mathcal{R}\rvert}_{R})$:

[TABLE]

We proceed by bounding the length of the subsequence $\mathcal{S}^{\prime j}$ , and by choosing $\gamma$ so as to make it likely for a set $R$ in $\mathcal{S}^{j}$ to be in $\mathcal{S}^{\prime j}$ , using the $\epsilon$ -net Theorem 1.1. We use this to upper bound the expected number of sets in $\mathcal{S}^{j}$ . Let $\operatorname{len}\left({\mathcal{S}}\right)$ denote the length of a sequence $\mathcal{S}$ .

The following claim bounds the length of the subsequence above.

Claim

For any $j\in\{1,2,\dots,p\}$ :

[TABLE]

Proof. Let $n^{\prime}_{j}=\operatorname{len}\left({\mathcal{S}^{\prime j}}\right)$ and relabel the sets so that $\mathcal{S}^{\prime j}=\left(R^{j}_{1},\dots,R^{j}_{n^{\prime}_{j}}\right)$ . Now consider an auxiliary sequence of sets based on intersecting the entries $R^{j}_{i}$ in $\mathcal{S}^{\prime j}$ with $P^{j}$ :

[TABLE]

This sequence of sets is used to generate a second-level packing. To do this, consider two distinct set-indices $1\leq k<l\leq n^{\prime}_{j}$ . The points $H_{R^{j}_{k}}$ are added before set $R^{j}_{l}$ is considered, so $H_{R^{j}_{k}}$ is a $\gamma$ -net for $\left(R^{j}_{k},{\mathcal{R}\rvert}_{R^{j}_{k}}\right)$ , whereas the set $R^{j}_{l}$ – because it is subsequently considered by the algorithm – is not hit by this net. This implies that the intersection of $R^{j}_{k}$ and $R^{j}_{l}$ is of bounded weight, as it would be hit by the $\gamma$ -net otherwise:

[TABLE]

This implies that the weight of the intersection of $S^{j}_{k}$ and $S^{j}_{l}$ is bounded above:

[TABLE]

The fact that sets in $\mathcal{T}^{\prime j}$ have pairwise intersections of small weight implies that their symmetric differences are heavy:

[TABLE]

where the first inequality uses Eq. 6, the second Eq. 8, and the last exploits the fact that sets $R^{j}_{k}$ , $R^{j}_{l}$ , and $P^{j}$ are each of measure at least $\epsilon$ and at most $2\epsilon$ , and that $\gamma\leq 1/4$ . Thus, depending on the constants, the sequence $\mathcal{T}^{\prime j}$ may form a weighted packing.

Finally, reviewing two cases for the constants $\beta$ and $\gamma$ makes the above more precise. First, if $\beta+\gamma<1/2$, the inequality above implies that the symmetric difference of $S^{j}_{k}$ and $S^{j}_{l}$ is strictly larger than $\mu\left(P^{j}\right)$. This cannot be the case, as both sets are subsets of $P^{j}$. Thus, when $\beta+\gamma<1/2$, the sequence $\mathcal{S}^{\prime j}$ cannot contain two distinct indices, which implies $\operatorname{len}\left({\mathcal{S}^{\prime j}}\right)\leq 1$. Second, if $\beta+\gamma\geq 1/2$, the sets in $\mathcal{T}^{\prime j}$ form a $\left(\mu\left(P^{j}\right),(3/2-\beta-\gamma)\mu\left(P^{j}\right)\right)$-packing over $P^{j}$: all sets have measure at most $\mu\left(P^{j}\right)$, and every symmetric difference has measure at least $(3/2-\beta-\gamma)\mu\left(P^{j}\right)$. This is our second-level packing. Now, the Shallow Packing Lemma 1 implies:

[TABLE]

where we have used the fact that $\varphi\left(\cdot,\cdot\right)$ is non-decreasing and that $\mu(P^{j})\leq 1$ . $\square$

We can now bound the length of the full sequence of sets assigned to the packing set $P^{j}$. Taking expectations sidesteps any dependencies in the sequences. For instance, a set $R$ can only be in $\mathcal{S}^{j}$ if previous samples failed to hit it. However, for each fixed set $R\in\mathcal{R}$, the probability of the sampled points $H_{R}$ forming a $\gamma$-net for $(R,{\mathcal{R}\rvert}_{R})$ is independent of previous sampling. Indeed, by Theorem 1.1, the probability that $H_{R}$ is a $\gamma$-net is at least $1-\gamma\geq 1/2$.

Lemma 3 (Mustafa, Lemma 5 [17])

[TABLE]

Proof. We use a simple application of linearity of expectation, and Theorem 1.1:

[TABLE]

where we drop the conditioning on $n_{j}\geq 1$ because the event that a particular sample $H_{R}$ is a $\gamma$ -net is independent of the number of previous samples. On the other hand, Eq. 7 upper bounds the size of $\operatorname{len}\left({\mathcal{S}^{\prime j}}\right)$ . Piecing these together yields the inequality:

[TABLE]

$\square$

Thus far we have conditioned on a set in the first-level packing being assigned at least one processed set. We now bound the probability of this being the case. Later, this probability is used to compute the expected number of processed sets assigned to a first-level packing set.

Lemma 4

Let $H_{0}$ be the initial sample taken by the algorithm. Then for any $j\in\{1,\dots,p\}$ :

[TABLE]

Proof. Fix an index $j\in\{1,\dots,p\}$ . Suppose that $n_{j}\geq 1$ . By Eq. 6, for any $i\in\{1,\dots,n_{j}\}$ :

[TABLE]

The second inequality above follows from the assumption that all sets have weights within a factor 2 of each other. The above implies that, if $H_{0}$ is a $\left(\frac{3}{4}-\frac{\beta}{2}\right)$ -net for $\left(P^{j},{\mathcal{R}\rvert}_{P^{j}}\right)$ , then any $R\in\mathcal{S}^{j}$ would be hit by $H_{0}$ . In other words, $n_{j}\geq 1$ only if $H_{0}$ is not a $\left(\frac{3}{4}-\frac{\beta}{2}\right)$ -net for $\left(P^{j},{\mathcal{R}\rvert}_{P^{j}}\right)$ :

[TABLE]

Because $\tfrac{\mu_{j}}{\epsilon}\geq\tfrac{\mu_{j}}{\mu(R)}$, the initial sample includes each element with sufficient probability to apply Theorem 1.1 to the RHS above, completing the proof. $\square$

4.2 Proof of Theorem 2.1

Proof of Theorem 2.1. At this stage, the analysis closely follows Mustafa’s [17]. Clearly, the algorithm proceeds until $H$ is an $\epsilon^{*}$ -net with respect to measure $\mu^{*}$ , i.e., a hitting set. It suffices to bound the expected size of the hitting set $H$ , as well as the expected number of oracle calls. These quantities are related, since the number of points added depends on the number of times a set is processed.

First, consider the expected size of the hitting set. There are two contributions to the set: the initial sample $H_{0}$ , and the samples from the processed sets $H_{R_{1}},\dots,H_{R_{T}}$ . We bound the expected size of the initial sample first.

Claim

The expected size of the initial sample, $\mathbb{E}[\operatorname{card}\left({H_{0}}\right)]$ is bounded by

[TABLE]

This follows by summing the probability of sampling $x$ for each $x\in X$ . An analogous result is used for the number of points added during the processing of a set $R\in\mathcal{R}$ , provided it is processed:

Claim

For any fixed set $R\in\mathcal{R}$ , conditional on being processed, the expected number of points added each time it is processed is

[TABLE]

This bound applies irrespective of whether or not a set was processed previously.

The number of points added during processing, and the number of oracle calls, can be bounded together. Recalling that $R_{1},\dots,R_{T}$ are the processed sets, and using the claim above, the number of added elements is at most

[TABLE]

Thus, it suffices to bound the expected number of oracle calls $\mathbb{E}[T]$ . This is where we employ both the first-, and second-level packings. In particular

[TABLE]

The terms (i), (ii) and (iii) are bounded using Eq. 5, Lemma 3, and Lemma 4, respectively. In addition, using $\tfrac{3}{2}-\beta-\gamma\geq\tfrac{1}{2}\geq\max\{\beta\epsilon,\beta/2\}$ :

(i) $p\leq\frac{24d}{\beta\epsilon}\varphi\left(\frac{8d}{\beta\epsilon},\frac{24d}{\beta}\right)$;

(ii) $\mathbb{E}\left[\operatorname{len}\left({\mathcal{S}^{j}}\right)\big{\rvert}n_{j}\geq 1\right]\leq\frac{48d}{3/2-\beta-\gamma}\varphi\left(\frac{8d}{3/2-\beta-\gamma},\frac{24d}{3/2-\beta-\gamma}\right)\leq\frac{48d}{3/2-\beta-\gamma}\varphi\left(\frac{8d}{\beta\epsilon},\frac{24d}{\beta}\right)$;

(iii) $\mathbb{P}[n_{j}\geq 1]\leq\left(d^{2}\varphi\left(\frac{8d}{\beta\epsilon},\frac{24d}{\beta}\right)^{2}\right)^{-1}$.

Combining the right-hand-side terms, we obtain the bound

[TABLE]

This is minimized by choosing a small $\gamma$ , e.g. $\gamma=1/100$ , and setting $\beta=3/4$ .

Finally, summing over the $\ell$ groups of sets, and adding the expected number of initial samples to the expected number of added points, completes the proof. Note that Eq. 13 also bounds the expected number of oracle calls made during the run of the algorithm. $\square$
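For orientation, the sample-and-repair loop whose cost the proof bounds can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact procedure: the LP solution `y` is taken as an input rather than computed, the oracle is a brute-force scan, and the multipliers `c0` and `c1` are hypothetical stand-ins for the sampling rates fixed in the algorithm's definition. The sketch also does not split the sets into the $\ell$ weight groups used in the analysis.

```python
import random
from typing import Dict, FrozenSet, Hashable, List, Optional, Set


def find_unhit(sets: List[FrozenSet], hitting: Set) -> Optional[FrozenSet]:
    """Brute-force oracle: return some set not hit by `hitting`, or None."""
    for r in sets:
        if not (r & hitting):
            return r
    return None


def sample_hitting_set(
    sets: List[FrozenSet],
    y: Dict[Hashable, float],
    c0: float,
    c1: float,
    rng: random.Random,
) -> Set:
    """Initial sample proportional to y, then resample inside each unhit set.

    c0 and c1 are hypothetical rate multipliers (assumptions of this sketch).
    """
    # Initial sample H_0: include x independently with probability min(1, c0 * y_x).
    hitting = {x for x, yx in y.items() if rng.random() < min(1.0, c0 * yx)}

    r = find_unhit(sets, hitting)
    while r is not None:
        weight_r = sum(y[x] for x in r)
        if weight_r <= 0:
            # An LP-feasible y gives every set positive weight; guard against misuse.
            raise ValueError("unhit set has zero LP weight")
        # Process r: sample its elements with probability proportional to y_x / y(r).
        hitting |= {x for x in r if rng.random() < min(1.0, c1 * y[x] / weight_r)}
        r = find_unhit(sets, hitting)
    return hitting
```

Each pass through the loop corresponds to one oracle call and one processed set, so the loop count is the quantity $T$ bounded in the proof.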

Appendix 0.A Proof of the Weighted Packing Lemma

This section proves the Weighted Packing Lemma 2. Our proof closely follows the original in Haussler [11]. The reader is referred to Matoušek [15] (Sec 5.3) for an excellent treatment of the unweighted proof. We begin by restating the weighted lemma.

Lemma 5 (Weighted Packing Lemma)

Let $(X,\mathcal{P})$ be a set system with $m$ elements and $n$ sets $\mathcal{P}=\{S_{1},\dots,S_{n}\}$ , equipped with weights $\mu:X\rightarrow\mathbb{R}_{\geq 0}$ with $\mu(X)>0$ . Let $\operatorname{VC-dim}(\mathcal{P})\leq d$ for some integer $d\geq 1$ , and let $\delta>0$ be a constant such that $\mu(\Delta(S_{i},S_{j}))\geq\delta$ for all $1\leq i<j\leq n$ . Then

[TABLE]

where $Y$ is the set of unique elements in a random sample $U=(u_{1},u_{2},\dots,u_{s})$ of size $s=\lceil\frac{8d}{\delta}\rceil-1$, in which each element $u_{k}$ is sampled iid $u_{k}\sim\mu(\cdot)$ with replacement.

The proof strategy is the following. First, we consider a random sample $U=(u_{1},\dots,u_{s})$ sampled from $\mu(\cdot)$ with replacement. The sample $U$ may contain repeated elements; let $Y\subseteq X$ denote the set of unique elements in $U$. Next, we generate a unit-distance graph, a weighted graph that depends on the random set $Y$, and derive three claims about the total weight of its edges: (i) an upper bound on the total weight, (ii) a partial lower bound, and (iii) a complete lower bound. Combining the bounds completes the proof. The main differences between our approach and that of Chazelle, Haussler, and Mustafa are twofold [5, 11, 16]. Firstly, we permit non-uniform weights $\mu$ as opposed to weighing each set by its cardinality, and we sample from a probability distribution proportional to $\mu$. Secondly, we use sampling with replacement as opposed to without replacement. This makes the analysis more straightforward under non-uniform sampling using $\mu$.
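The sampling step of this strategy is simple enough to sketch directly. The function name is hypothetical, and the sketch assumes $\mu$ is given as a dictionary of non-negative weights with positive total. `random.choices` draws with replacement and normalizes the weights itself, so $\mu$ need not sum to one here.

```python
import math
import random
from typing import Dict, Hashable, List, Set, Tuple


def sample_with_replacement(
    mu: Dict[Hashable, float], d: int, delta: float, rng: random.Random
) -> Tuple[List, Set]:
    """Draw U = (u_1, ..., u_s) iid proportional to mu and return (U, Y).

    s = ceil(8d / delta) - 1 as in the lemma statement; Y is the set of
    unique elements appearing in U.
    """
    elements = list(mu)
    weights = [mu[x] for x in elements]
    s = math.ceil(8 * d / delta) - 1
    u = rng.choices(elements, weights=weights, k=s)  # sampling with replacement
    return u, set(u)
```

Because the draws are independent and identically distributed, any reordering of `u` has the same distribution, which is the symmetry the lower-bound argument relies on.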

A weighted unit-distance graph over the sampled set-system takes a central stage in the proof. In this graph, sets are viewed as vertices, and edges are drawn between any two sets at unit-distance of each other.

Definition 2 (Unit distance graph)

Let $(X,\mathcal{P})$ be a set system. The unit distance graph of $(X,\mathcal{P})$ is a graph $G=(\mathcal{P},E_{\mathcal{P}})$ with vertex set $\mathcal{P}$ and edges

[TABLE]

In other words, edges represent pairs of sets that differ on exactly one element. The following lemma connects the number of edges with the VC-dimension.

Lemma 6 (Haussler [11])

Fix a set system $(X,\mathcal{P})$ with $\operatorname{VC-dim}(\mathcal{P})=d$ . Let $G=(\mathcal{P},E_{\mathcal{P}})$ be its unit distance graph. Then $\operatorname{card}\left({E_{\mathcal{P}}}\right)\leq d\operatorname{card}\left({\mathcal{P}}\right)$ .
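As a small illustration of Definition 2 and Lemma 6, the sketch below builds the unit distance graph of a finite set system by brute force. The function name is hypothetical and the quadratic scan is chosen for clarity, not efficiency.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple


def unit_distance_edges(
    sets: Iterable[FrozenSet],
) -> List[Tuple[FrozenSet, FrozenSet]]:
    """Return all pairs of distinct sets whose symmetric difference has size 1."""
    distinct = list(dict.fromkeys(sets))  # drop duplicates, keep order
    return [(a, b) for a, b in combinations(distinct, 2) if len(a ^ b) == 1]


# The power set of {0, 1} has VC-dimension 2 and four sets; its unit
# distance graph is a 4-cycle, well within the bound d * card(P) = 8.
power_set = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]
edges = unit_distance_edges(power_set)
```

A nested family such as $\emptyset\subset\{0\}\subset\{0,1\}\subset\{0,1,2\}$ has VC-dimension 1 and yields a path with three edges, again consistent with the lemma.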

We use this lemma to bound the total edge weight in our unit-distance graph. But first, we define our particular unit-distance graph and its edge weights.

We construct a graph that depends on the random set $Y\subseteq X$ . Consider the projection of $\mathcal{P}$ to $Y$ , denoted ${\mathcal{P}\rvert}_{Y}$ . Let $G_{Y}=({\mathcal{P}\rvert}_{Y},E_{{\mathcal{P}\rvert}_{Y}})$ be a unit distance graph over the projected system. For each vertex $S^{\prime}\in{\mathcal{P}\rvert}_{Y}$ , define the vertex weight as the number of sets $S\in\mathcal{P}$ that are projected to $S^{\prime}$ in ${\mathcal{P}\rvert}_{Y}$ , that is

[TABLE]

Moreover, define the weight of an edge $\{S^{\prime}_{i},S^{\prime}_{j}\}\in E_{{\mathcal{P}\rvert}_{Y}}$ as the minimum over the weights of its two vertices, $w\left(\{S^{\prime}_{i},S^{\prime}_{j}\}\right)=\min\{w(S^{\prime}_{i}),w(S^{\prime}_{j})\}$ . Finally, let the total edge weight be $W=\sum_{e\in E_{{\mathcal{P}\rvert}_{Y}}}w(e)$ . Note that the weights are random variables because they depend on the random selection $Y$ . We proceed by bounding the total edge weight. This will allow us to bound the size of the packing in a way that “looks like a magician’s trick” [15].
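The construction of the weighted graph $G_{Y}$ can be sketched as below. The function name is hypothetical, and the sets of $\mathcal{P}$ are assumed to be distinct, as they are in a packing.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple


def projected_edge_weight(
    sets: List[FrozenSet], y: Iterable
) -> Tuple[Counter, int]:
    """Project the sets onto Y and return (vertex weights, total edge weight W).

    The weight of a projected set S' is the number of original sets mapping
    to it; an edge joins two projected sets that differ in exactly one
    element and weighs the minimum of its two vertex weights.
    """
    y_frozen = frozenset(y)
    vertex_weight = Counter(s & y_frozen for s in sets)
    total = 0
    for a, b in combinations(vertex_weight, 2):
        if len(a ^ b) == 1:
            total += min(vertex_weight[a], vertex_weight[b])
    return vertex_weight, total
```

For example, projecting $\{0,1\}$, $\{0,2\}$, $\{0\}$, $\{1,2\}$ onto $Y=\{0,1\}$ gives vertices $\{0,1\}$, $\{0\}$, $\{1\}$ with weights 1, 2, 1. The edges are $\{\{0,1\},\{0\}\}$ and $\{\{0,1\},\{1\}\}$, each of weight 1, so $W=2$.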

First, we find an upper bound for the total edge weight. This is naturally also an upper bound on the expected total edge weight.

Claim

The total edge weight is upper-bounded by $W\leq 2d\operatorname{card}\left({\mathcal{P}}\right)$ .

Proof. This is the proof of Haussler and Chazelle [11, 5]. Lemma 6 implies that:

[TABLE]

Hence there exists a vertex $S^{\prime}$ in ${\mathcal{P}\rvert}_{Y}$ with degree at most $2d$ . Each edge incident to $S^{\prime}$ has weight at most $w(S^{\prime})$ , so the vertex $S^{\prime}$ is responsible for edges of total weight at most $2dw(S^{\prime})$ . Applying this inductively proves the bound on the total weight.

[TABLE]

$\square$

Next we derive a lower bound. It suffices to derive the bound for a reduced problem. Let $U^{\prime}$ be the subsequence of sample $U=(u_{1},\dots,u_{s})$ containing all but the last element $u_{s}$. Similarly, let $Y^{\prime}$ be the set of unique elements in the subsample $U^{\prime}$, just as $Y$ denotes the unique elements in the full sample $U$. Now, when the final element $u_{s}$ is added to $U^{\prime}$, some vertices in ${\mathcal{P}\rvert}_{Y}$ may form an edge due to $u_{s}$. This occurs exactly when the last sampled element $u_{s}$ falls in the symmetric difference of two sets that were previously equal in the projection ${\mathcal{P}\rvert}_{Y^{\prime}}$. Let $W_{s}$ be the sum of the weights of the edges due to element $u_{s}$. In other words, the weight $W_{s}$ is the weight of the edges generated by adding a random element $u_{s}\sim\mu(\cdot)$ to the random sequence $U^{\prime}=(u_{1},\dots,u_{s-1})$. Because the samples are iid, given a sequence, the ordering of the random elements in the sequence is uniform over all permutations, so we can apply symmetry of expectation:

[TABLE]

Hence, it suffices to derive a lower bound on $\mathbb{E}[W_{s}]$.

Following Haussler [15], an intermediate step is to lower bound the expectation of the weight $W_{s}$ due to the $s$th element $u_{s}$ conditional on the previous $s-1$ elements $U^{\prime}$. The lower bound follows from considering pairs of vertices that lack edges in $E_{{\mathcal{P}\rvert}_{Y^{\prime}}}$ but that may share an edge after adding $u_{s}$. This happens exactly when the new element $u_{s}$ falls in their symmetric difference. This event occurs with probability at least $\delta$, because we assume $\mu(\Delta(S,R))\geq\delta$ for all pairs $S\neq R$ in our packing $\mathcal{P}$. This argument uses the fact that we sample elements proportional to the weight $\mu$.

Claim

For any set $Y^{\prime}\subseteq X$ generated by a fixed sequence $u^{\prime}=(u_{1},\dots,u_{s-1})$ , with $u_{k}\in X$ for $k=1,\dots,s-1$ , it holds that

[TABLE]

where $\mathbb{E}_{\mu}$ denotes expectation over a single random element $u_{s}\sim\mu(\cdot)$.

Proof. Fix a set $Y^{\prime}\subseteq X$ corresponding to the unique elements in the partial selection $u^{\prime}=(u_{1},\dots,u_{s-1})$ . Consider an arbitrary set $Q$ in the projection ${\mathcal{P}\rvert}_{Y^{\prime}}$ . There may be many sets in $\mathcal{P}$ that map to $Q$ in ${\mathcal{P}\rvert}_{Y^{\prime}}$ . Let $\mathcal{P}_{Q}$ be the collection of these sets, and let $b$ denote the number of such sets. Note that for any pair of sets $S_{i},S_{j}\in\mathcal{P}_{Q}$ , $Y^{\prime}$ cannot contain any element in the symmetric difference $\Delta(S_{i},S_{j})$ , or else the two sets would not map to the same $Q$ . However, when an additional element $u_{s}$ is sampled and added to $Y^{\prime}$ (with the possibility that $u_{s}$ is already in $Y^{\prime}$ ) the collection $\mathcal{P}_{Q}$ is partitioned into two groups: (i) sets that contain $u_{s}$ , and (ii) sets that do not contain $u_{s}$ . Let $b_{1}$ and $b_{2}$ denote the number of sets in these groups, respectively, with $b=b_{1}+b_{2}$ . These two groups are at a unit-distance in ${\mathcal{P}\rvert}_{Y}$ . The weight of the resulting edges in the unit-distance graph is exactly $\min\{b_{1},b_{2}\}$ .

By adding up the expected weights due to each pair we get $\mathbb{E}[W_{s}]$. For every pair $S^{\prime}_{1},S^{\prime}_{2}\in\mathcal{P}_{Q}$, the probability that $u_{s}$ hits $\Delta(S^{\prime}_{1},S^{\prime}_{2})$ is $\mu(\Delta(S^{\prime}_{1},S^{\prime}_{2}))\geq\delta$. Thus, the expected contribution of each pair to the product $b_{1}b_{2}$ is at least $\delta$. Note that the sum $b_{1}+b_{2}=b$ depends on $Y^{\prime}$, but is independent of $u_{s}$. Hence

[TABLE]

The first inequality follows from the fact that $\min\{b_{1},b_{2}\}\geq b_{1}b_{2}/b$. The first equality uses the fact that $\mathcal{P}_{Q}$ is partitioned into two groups at unit distance from each other, so $b_{1}b_{2}$ is the total number of pairs of sets split between the two groups. Taking the above inequality and summing over all vertices $Q\in{\mathcal{P}\rvert}_{Y^{\prime}}$ gives the result

[TABLE]

$\square$

By using the claim above it is now straightforward to produce a lower bound on the expected total edge weight.

Claim

$\mathbb{E}[W]\geq 4dn-4d\mathbb{E}[\operatorname{card}\left({{\mathcal{P}\rvert}_{Y^{\prime}}}\right)]$ .

Proof. We employ the reduction from above as well as the partial lower bound. Let $X^{k}$ denote the $k$-fold Cartesian product of the element set $X$. For any length-$(s-1)$ sequence $u^{\prime}=(u_{1},\dots,u_{s-1})$ in $X^{s-1}$, let $Y^{\prime}(u^{\prime})\subseteq X$ be the set of unique elements in $u^{\prime}$. Analogously, let $Y^{\prime}(U^{\prime})$ denote the random set of unique elements in a random sequence $U^{\prime}$ in $X^{s-1}$. It then follows that:

[TABLE]

where the first inequality uses the partial lower bound, and the last equality follows from the definition of the random set $Y^{\prime}(U^{\prime})$. $\square$

Finally, all the pieces are in place to prove the Packing Lemma. Using the lower and upper bounds, it follows that the cardinality of $\mathcal{P}$ is bounded above by twice the expected size of ${\mathcal{P}\rvert}_{Y^{\prime}}$:

[TABLE]

This yields the statement of the weighted packing lemma, completing the proof. $\square$

Bibliography

[1] Aronov, B., Ezra, E., Sharir, M.: Small-size $\varepsilon$-nets for axis-parallel rectangles and boxes. In: Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing. pp. 639–648 (2009)
[2] Bar-Yehuda, R., Even, S.: A linear-time approximation algorithm for the weighted vertex cover problem. Journal of Algorithms 2(2), 198–203 (1981)
[3] Brönnimann, H., Goodrich, M.T.: Almost Optimal Set Covers in Finite VC-Dimension. Discrete Comput. Geom. 14, 263–279 (1995)
[4] Chan, T.M., Grant, E., Könemann, J., Sharpe, M.: Weighted capacitated, priority, and geometric set cover via improved quasi-uniform sampling. In: Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 1576–1585. SODA ’12, Society for Industrial and Applied Mathematics, USA (2012)
[5] Chazelle, B.: A note on Haussler’s packing lemma (1992)
[6] Chvatal, V.: A greedy heuristic for the set-covering problem. Mathematics of Operations Research 4(3), 233–235 (1979)
[7] Clarkson, K.L.: A randomized algorithm for closest-point queries. SIAM Journal on Computing 17(4), 830–847 (1988)
[8] Even, G., Rawitz, D., Shahar, S.M.: Hitting sets when the VC-dimension is small. Information Processing Letters 95(2), 358–362 (2005)
