Adversarially Robust Submodular Maximization under Knapsack Constraints

Dmitrii Avdiukhin; Slobodan Mitrović; Grigory Yaroslavtsev; Samson Zhou

arXiv:1905.02367·cs.DS·May 8, 2019

TL;DR

This paper introduces the first scalable adversarially robust algorithms for monotone submodular maximization under knapsack constraints, demonstrating strong empirical performance on social network and recommendation datasets.

Contribution

It presents novel scalable algorithms for robust submodular maximization under multiple knapsack constraints, with theoretical guarantees and practical effectiveness.

Findings

1. The single-knapsack algorithm outputs a robust summary of near-optimal size, up to polylogarithmic factors.

2. Strong empirical performance on social network and recommendation datasets.

3. The algorithms give the best objective value on a majority of inputs when compared with natural robustifications of existing non-robust algorithms.

Tables

Table 1: Sizes of robust summaries produced by the algorithms ($K=10$).

Algorithm	ml-20, 1 knapsack	fb, 1 knapsack	twitter, 1 knapsack	ml-20, 2 knapsacks	fb, 2 knapsacks	twitter, 2 knapsacks
AlgMult	641	378	401	1350	2745	4208
MarginalRatio	641	377	402	1350	2745	4209
Multidimensional	87	18	435	72	22	4221
Greedy	647	393	493	-	-	-

Equations

$$\mathcal{OPT}(V)=\operatorname*{argmax}_{S\subseteq V:\ {\textbf{C}}{\textbf{x}}_{S}\leq{\textbf{b}}}f(S).$$

$$\mathcal{OPT}(V\setminus E)=\operatorname*{argmax}_{S\subseteq\mathcal{S}\setminus E:\ {\textbf{C}}{\textbf{x}}_{S}\leq{\textbf{b}}}f(S)$$

$$f_{X}(Z)=\sum_{x\in X}\max_{z\in Z}\langle v_{z},v_{x}\rangle,$$

$$c(x)=1+0.5\times(\mathrm{bad}(x)-\mathrm{good}(x)+t),$$

$$f(Z_{\tau})\geq\left(1-\frac{1}{e}\right)\left(1-\frac{1}{2\ell}\right)\tau.$$

$$c(B_{i^{*},j^{*}}\cap E)\leq\frac{2^{i^{*}}\cdot I}{t_{i^{*}}\cdot w\lceil K/2^{i^{*}}\rceil+4\ell I}<\frac{2^{i^{*}}}{4\ell}.$$

$$f(B_{i^{*},j^{*}}\setminus E)\geq\left(c(B_{i^{*},j^{*}})-\frac{t_{i^{*}}}{4\ell}\right)\frac{\tau}{c(B_{i^{*},j^{*}})}\geq\left(1-\frac{1}{2\ell}\right)\tau.$$

$$f\left(E_{i}\,\middle|\,A_{i-1}\right)<\frac{\tau}{2^{i-1}}c(E_{i}),$$

$$f\left(E_{i}\,\middle|\,A_{i-1}\right)\leq\sum_{e\in E_{i}}f\left(e\,\middle|\,A_{i-1}\right).$$

$$f\left(E_{\ell}\,\middle|\,\bigcup_{j=0}^{\ell-1}(A_{j}\setminus E_{j})\right)\leq\sum_{j=1}^{\ell}\frac{\tau}{2^{j-1}}c(E_{j}).$$

$$f\left(E_{i}\,\middle|\,\bigcup_{j=0}^{i-1}(A_{j}\setminus E_{j})\right)\leq\sum_{j=1}^{i}\frac{\tau}{2^{j-1}}c(E_{j}).$$

$$f(E_{1})\leq\frac{\tau}{2^{0}}=\sum_{j=1}^{1}\frac{\tau}{2^{j-1}}c(E_{j}),$$

$$f\left(E_{i}\,\middle|\,\bigcup_{j=0}^{i-1}(A_{j}\setminus E_{j})\right)\leq f\left(E_{i}\cup\bigcup_{j=0}^{i-1}(A_{j}\setminus E_{j})\right)+f\left(E_{i-1}\,\middle|\,\bigcup_{j=0}^{i-2}(A_{j}\setminus E_{j})\right)-f\left(E_{i-1}\cup\bigcup_{j=0}^{i-1}(A_{j}\setminus E_{j})\right).$$

$$f(A\cup B)-f(A\cup(B\setminus R))\leq f\left(R\,\middle|\,A\right).$$

$$f(Z_{\tau})\geq\frac{1}{15}\left(1-\frac{1}{e}\right)\left(f(B_{\ell,r})-\frac{\tau}{2}\right),$$

$$f\left(\bigcup_{i=0}^{\ell}(A_{i}\setminus E_{i})\right)\geq f(B_{\ell,r})-\sum_{i=1}^{\ell}\frac{\tau}{2^{i-1}}c(E_{i}).$$

$$c(E_{i})\leq\frac{2c(\tilde{E}_{i})}{w\lceil K/2^{i}\rceil+\frac{\beta\cdot c(\tilde{E}_{i})}{2^{i}}}\leq\frac{2c(\tilde{E}_{i})}{w\lceil K/2^{i}\rceil+\frac{\beta\cdot c(\tilde{E}_{i})}{2^{i}}}.$$

$$\sum_{i=1}^{\ell}\frac{\tau}{2^{i-1}}c(E_{i})\leq\left(\sum_{i=1}^{\ell}\frac{\tau}{2^{i-1}}\cdot\frac{2^{i+1}c(\tilde{E}_{i})}{wK+\beta\cdot c(\tilde{E}_{i})}\right).$$

$$\sum_{i=1}^{\ell}\frac{\tau}{2^{i-1}}c(E_{i})\leq\left(\sum_{i=1}^{\ell}\frac{4\tau\alpha_{i}}{w+\beta\alpha_{i}}\right).$$

$$\left(\sum_{i=1}^{\ell}\frac{4\tau\alpha_{i}}{w+\beta\alpha_{i}}\right)\leq\ell\,\frac{4\tau}{\beta}\leq\frac{\tau}{2}.$$

$$f\left(\bigcup_{i=0}^{\ell}(A_{i}\setminus E_{i})\right)\geq f(B_{\ell,r})-\frac{\tau}{2}.$$

$$f(Z_{\tau})\geq\left(1-\frac{1}{e}\right)\left(f(\mathcal{OPT})-f(B_{\ell,r})-\tau\right),$$

$$f\left(e\,\middle|\,B_{\ell,r}\right)<\frac{\tau}{K}\cdot c(e)$$

$$f(Y)\geq f(\mathcal{OPT}(K,V\setminus E))-f(B_{\ell,r})-\tau.$$

$$f(Z)\geq\left(\frac{2(1-1/e)\zeta}{32\zeta+3}-\epsilon\right)f(\mathcal{OPT}(V\setminus E)).$$

$$f(Z_{\tau})\geq\frac{1}{1+2d}\left(1-\frac{1}{e}\right)\left(1-\frac{1}{2\ell}\right)\tau.$$

$$I=c_{a}\left(\bigcup_{j=1}^{n_{i^{*}}}(B_{i^{*},j}\cap E)\right).$$

$$c_{a}(B_{i^{*},j^{*}}\cap E)\leq\frac{2^{i^{*}}\cdot I}{2^{i^{*}}\cdot w\lceil K/2^{i^{*}}\rceil+(4\ell)I}.$$

$$f\left(E_{i}\,\middle|\,A_{i-1}\right)<\frac{\tau}{2^{i-1}(1+2d)}c_{a}(E_{i}).$$

Taxonomy

Topics: Complexity and Algorithms in Graphs · Adversarial Robustness in Machine Learning · Cryptography and Data Security

Full text

Dmitrii Avdiukhin, Indiana University

Slobodan Mitrović, MIT

Grigory Yaroslavtsev, Indiana University & The Alan Turing Institute

Samson Zhou, Indiana University

Abstract

We propose the first adversarially robust algorithm for monotone submodular maximization under single and multiple knapsack constraints with scalable implementations in distributed and streaming settings. For a single knapsack constraint, our algorithm outputs a robust summary of almost optimal (up to polylogarithmic factors) size, from which a constant-factor approximation to the optimal solution can be constructed. For multiple knapsack constraints, our approximation is within a constant-factor of the best known non-robust solution.

We evaluate the performance of our algorithms by comparison to natural robustifications of existing non-robust algorithms under two objectives: 1) dominating set for large social network graphs from Facebook and Twitter collected by the Stanford Network Analysis Project (SNAP), 2) movie recommendations on a dataset from MovieLens. Experimental results show that our algorithms give the best objective for a majority of the inputs and show strong performance even compared to offline algorithms that are given the set of removals in advance.

1 Introduction

Submodular maximization has a wide range of applications in data science, machine learning and optimization, including data summarization, personalized recommendation, feature selection and clustering under various constraints, e.g. budget, diversity, fairness and privacy among others. Constrained submodular optimization has been studied since the seminal work of [32]. It has recently attracted a lot of interest in various large-scale computation settings, including distributed [11, 28], streaming [3, 33, 1], and adaptive [15, 5, 4, 14, 13, 10] due to its applications in recommendation systems [24, 12], exemplar-based clustering [16], and document summarization [26, 38, 35]. For monotone functions, constrained submodular optimization has been studied extensively under numerous constraints such as cardinality [3, 6], knapsack [18], matchings [9], and matroids [8].

With the increase in the volume of data, the task of designing low-memory streaming and low-communication distributed algorithms for monotone submodular maximization has received significant attention. A series of results [3, 33, 1] culminated in a single-pass algorithm over a random-order stream that achieves close to a $(1-1/e)$-approximation for monotone submodular maximization under a cardinality constraint. This approximation almost matches the guarantee of a celebrated result [32]. Also in the context of streaming, [23, 18, 39] studied monotone submodular maximization under $d$ knapsack constraints, resulting in a single-pass algorithm that provides a $(1/(1+2d))$-approximation. Another line of work [30, 23, 11, 28] focused on submodular maximization in the distributed setting. In particular, [28] developed $2$-round and $(1/\epsilon)$-round MapReduce algorithms that provide $1/2$ and $1-1/e-\epsilon$ approximations, respectively, for monotone submodular maximization under a cardinality constraint.

In this paper, we focus on the robust version of this classic problem [21, 34, 31, 20]. Consider a situation where a set of recommendations (or advertisements) is constructed for a new user. It is standard to model this as a monotone submodular maximization problem under knapsack constraints, which allow incorporation of various restrictions on available budget, screen space, user preferences, privacy and fairness, etc. However, new users are likely to find some of the recommended items familiar, annoying or otherwise undesirable. Hence, it is advisable to build recommendations in such a way that even if the user later decides to dismiss some of the recommended items, one can quickly compute a new high-quality set of recommended items without solving the entire problem from scratch. We refer to this property as “adversarial robustness” since the removals are allowed to be completely arbitrary (e.g. might depend on the algorithm’s suggestions).

1.1 Adversarially Robust Monotone Submodular Maximization

Let $V$ be a finite domain consisting of elements $e_{1},\dots,e_{|V|}$. For a set function $f\colon 2^{V}\to\mathbb{R}^{\geq 0}$, we use $f\left(e\,\middle|\,S\right)$ to denote the marginal gain of an element $e$ given a set $S\subseteq V$, i.e., $f\left(e\,\middle|\,S\right)=f(S\cup\{e\})-f(S)$. A set function $f$ is submodular if for every $S\subseteq T\subseteq V$ and every $e\in V$ it holds that $f\left(e\,\middle|\,T\right)\leq f\left(e\,\middle|\,S\right)$. A set function $f$ is monotone if for every $S\subseteq T\subseteq V$ it holds that $f(T)\geq f(S)$. Intuitively, adding an element never decreases the utility, but the gain from an element diminishes as the set it is added to grows.
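To make these definitions concrete, the following minimal sketch checks them by brute force on a toy coverage function. The instance `covers` and the helper names `coverage`, `marginal_gain`, `subsets` and `is_monotone_submodular` are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations

# Hypothetical toy instance: each element of V covers a few ground items.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}}


def coverage(S):
    """f(S) = number of ground items covered by the elements of S."""
    covered = set()
    for e in S:
        covered |= covers[e]
    return len(covered)


def marginal_gain(f, e, S):
    """f(e | S) = f(S ∪ {e}) - f(S)."""
    S = frozenset(S)
    return f(S | {e}) - f(S)


def subsets(V):
    """Enumerate all subsets of V (exponential; only for tiny examples)."""
    V = list(V)
    for r in range(len(V) + 1):
        for combo in combinations(V, r):
            yield frozenset(combo)


def is_monotone_submodular(f, V):
    """Brute-force check of both definitions over all pairs S ⊆ T ⊆ V."""
    V = list(V)
    all_subsets = list(subsets(V))
    for S in all_subsets:
        for T in all_subsets:
            if not S <= T:
                continue
            if f(T) < f(S):  # monotonicity violated
                return False
            for e in V:
                if marginal_gain(f, e, T) > marginal_gain(f, e, S):
                    return False  # diminishing returns violated
    return True
```

On this instance the gain of `"b"` shrinks from 2 (given the empty set) to 1 (given `{"a"}`) to 0 (given `{"a", "c"}`), which is the diminishing-returns behaviour described above.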

For a set $S\subseteq V$ we use the notation ${\textbf{x}}_{S}$ to denote the 0-1 indicator vector of $S$. We use ${\textbf{C}}\in\mathbb{R}^{d\times|V|}$ to denote a matrix with positive entries and ${\textbf{b}}\in\mathbb{R}^{d}$ to denote a vector with positive entries. Here, ${\textbf{C}}$ and ${\textbf{b}}$ should be interpreted as knapsack constraints, where a set $S$ satisfies these constraints if and only if ${\textbf{C}}{\textbf{x}}_{S}\leq{\textbf{b}}$.

Problem 1.1 (MSM under knapsack constraints)

In the monotone submodular maximization (MSM) problem subject to $d$ knapsack constraints, we are given a monotone submodular set function $f\colon 2^{V}\to\mathbb{R}^{\geq 0}$ and are required to output:

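$$\mathcal{OPT}(V)=\operatorname*{argmax}_{S\subseteq V:\ {\textbf{C}}{\textbf{x}}_{S}\leq{\textbf{b}}}f(S).$$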

Since the constraints are scaling-invariant, one can rescale each row ${\textbf{C}}_{i}$ by multiplying it (and the corresponding entry in ${\textbf{b}}$) by $b_{1}/b_{i}$ so that all entries in ${\textbf{b}}$ are the same and equal to $b_{1}$. One can further rescale ${\textbf{C}}$ and ${\textbf{b}}$ by the smallest entry in ${\textbf{C}}$ (or some lower bound on it), so that $\min_{i,j}C_{i,j}\geq 1$. We assume such rescaling below and let $K=b_{i}$ for all $i$. In the case of one constraint ($d=1$), we further simplify the notation and set $c(e_{i})=C_{1,i}$ and $K=b_{1}$ and refer to $c(e_{i})$ simply as the cost of the $i$-th item.
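The two rescaling steps can be sketched as follows. This is a minimal illustration that assumes the constraints are given as a list of rows with positive entries; the function names `normalize_knapsacks` and `feasible` are hypothetical, and exact rational arithmetic is used only to keep the example free of rounding issues.

```python
from fractions import Fraction


def normalize_knapsacks(C, b):
    """Rescale C x <= b so that all budgets equal K and min C[i][j] >= 1.

    C is a list of d rows (one per knapsack), b a list of d positive budgets.
    Returns the rescaled rows and the common budget K.
    """
    # Step 1: multiply row i (and b[i]) by b[0] / b[i]; every budget becomes b[0].
    rows = [[Fraction(c) * b[0] / b_i for c in row] for row, b_i in zip(C, b)]
    K = Fraction(b[0])
    # Step 2: divide all costs and the budget by the smallest cost entry.
    smallest = min(min(row) for row in rows)
    rows = [[c / smallest for c in row] for row in rows]
    return rows, K / smallest


def feasible(rows, budgets, S):
    """Check C x_S <= b for a set S of column indices."""
    return all(sum(row[e] for e in S) <= B for row, B in zip(rows, budgets))
```

For example, with `C = [[2, 4], [1, 3]]` and `b = [5, 6]` the sketch returns rows `[[12/5, 24/5], [1, 3]]` and `K = 6`, and every set is feasible for the rescaled instance exactly when it is feasible for the original one.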

An important role in our algorithms is played by the marginal density of an item. Formally, for a set $S\subseteq V$ , an element $e$ and a cost function $c\colon V\to\mathbb{R}^{\geq 0}$ we define the marginal density of $e$ with respect to $S$ under the cost function $c$ as: $\rho(e|S)=\frac{f\left(e\,\middle|\,S\right)}{c(e)}$ . For multiple dimensions, we will specifically define the cost function $c(\cdot)$ .
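As a small illustration of marginal density, the sketch below compares a cheap item with a small gain against an expensive item with a large gain. The toy data `covers` and `cost`, and the function names, are assumptions made for this example only.

```python
# Hypothetical toy data: items covered by each element, and element costs.
covers = {"big": {1, 2, 3, 4}, "small": {5, 6}}
cost = {"big": 8.0, "small": 1.0}


def f(S):
    """Coverage objective: number of distinct items covered by S."""
    covered = set()
    for e in S:
        covered |= covers[e]
    return len(covered)


def marginal_density(f, cost, e, S):
    """rho(e | S) = f(e | S) / c(e) for a single cost function c."""
    S = frozenset(S)
    return (f(S | {e}) - f(S)) / cost[e]
```

Here `"big"` has the larger marginal gain (4 versus 2) but the smaller density (0.5 versus 2.0), so a density threshold would favour `"small"`.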

Motivated by applications to personalized recommendation systems, we consider the adversarially robust version of the above problem. In the adversarially robust monotone submodular maximization (ARMSM) problem the goal is to produce a small “adversarially robust” summary $\mathcal{S}\subseteq V$ . Here “adversarial robustness” means that for any set $E$ of cardinality at most $m$ , which might be later removed, one should be able to compute a good approximation for the residual monotone submodular maximization problem over $V\setminus E$ based only on $\mathcal{S}$ . In this paper, we propose a study of ARMSM under knapsack constraints:

Problem 1.2 (ARMSM under knapsack constraints)

An algorithm $\mathcal{A}$ solves the adversarially robust monotone submodular maximization problem ARMSM $(m,K)$ subject to $d$ knapsack constraints if it produces a summary $\mathcal{S}\subseteq V$ such that:

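$$\mathcal{OPT}(V\setminus E)=\operatorname*{argmax}_{S\subseteq\mathcal{S}\setminus E:\ {\textbf{C}}{\textbf{x}}_{S}\leq{\textbf{b}}}f(S)$$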

for any set of removals $E$ of cardinality at most $m$ . $\mathcal{A}$ gives an $\alpha$ -approximation if there exists a set $Z\subseteq\mathcal{S}$ with ${\textbf{C}}{\textbf{x}}_{Z}\leq{\textbf{b}}$ such that $f(Z)\geq\alpha f(\mathcal{OPT}(V\setminus E))$ .

The main goal of an adversarially robust algorithm is to minimize the size of the resulting summary. We remark that the above robustness model is very strong. In particular, the set of removals $E$ does not have to be fixed in advance and might depend on the summary $\mathcal{S}$ produced by the algorithm. Hence, we choose to refer to it as adversarial robustness in order to avoid confusion with other notions of robustness known in the literature [2, 36].

1.2 Our Theoretical Results

Streaming algorithms.

We first consider the ARMSM problem in the streaming setting. A streaming algorithm is given the vector ${\textbf{b}}$ of knapsack budget bounds upfront. Then, the elements of the ground set $e_{1},\dots,e_{|V|}$ arrive in an arbitrary order. When an element $e_{i}$ arrives, the algorithm sees the corresponding column ${\textbf{C}}_{*,i}$, which lists the $d$ costs associated with this item. The algorithm only sees each element once and is required to use only a small amount of space throughout the stream. At the end of the stream, an adversarially chosen set of removals $E$ is revealed and the goal is to solve ARMSM over $V\setminus E$. The key objective of the streaming algorithm is to minimize the amount of space used while providing a good approximation for ARMSM for any $E$.

Our first set of results gives adversarially robust algorithms for the ARMSM problem under one knapsack constraint:

Theorem 1.3 (ARMSM under one knapsack constraint)

For the ARMSM $(m,K)$ problem under one knapsack constraint, there exists an algorithm that gives a constant-factor approximation with a summary consisting of $\tilde{O}(K+m)$ elements of the ground set (Theorem 3.1).

We also show that if the total cost of removed items is at most $M$ then there is an algorithm with summary size $\tilde{O}(K+M)$ and improved approximation guarantee. For ARMSM under a single knapsack constraint, our bounds are tight up to polylogarithmic factors, since an optimal solution may contain $K$ items of unit cost, and an adversary can remove up to $m$ items of any set. Hence, storing $\Omega(K+m)$ elements is necessary to obtain a constant factor approximation.

For the ARMSM problem under $d$ knapsack constraints, we give an algorithm with the following guarantee:

Theorem 1.4 (ARMSM under $d$ knapsack constraints)

For the ARMSM $(m,K)$ problem under $d$ knapsack constraints, there exists an algorithm that gives an $\Omega(\frac{1}{d})$ -approximation with a summary of size $\tilde{O}(K+m)$ (Theorem 3.2).

Distributed algorithms.

We also consider the ARMSM problem in the distributed setting. Here, our aim is to collect a robust set $\mathcal{S}$ of elements while distributing the work to a number of machines, minimizing the memory requirement per machine and the number of rounds in which the machines need to communicate with each other. As in the streaming setting, a set of removals $E$ is revealed only after $\mathcal{S}$ is constructed. We obtain a $2$-round algorithm that matches our result for streaming in terms of approximation guarantees.

Theorem 1.5 (Distributed ARMSM)

For the ARMSM $(m,K)$ problem on a dataset of size $n$ under $d$ knapsack constraints, there exists an algorithm that gives an $\Omega(\frac{1}{d})$ -approximation with a summary of size $\tilde{O}(K+m)$ . If oracle access to $f$ is given, this algorithm can be implemented in two distributed rounds using $\tilde{O}((m+K)\sqrt{n})$ words of space per machine (Theorem 4.1).

1.3 Empirical Evaluations

We evaluate the performance of our algorithms on both single knapsack and multiple knapsack constraints by comparison to natural generalizations of existing algorithms. We implement the algorithms for the objective of dominating set for large social network graphs from Facebook and Twitter collected by the Stanford Network Analysis Project (SNAP), and for the objective of coverage on a large dataset from MovieLens. We compare the objectives on the sets output as well as the total number of elements collected by each algorithm.

Our results show that our algorithms provide the best objective for a majority of the inputs. In fact, our streaming algorithms perform just as well as the standard offline algorithms, even when the offline algorithms know in advance which elements will be removed. Our results also indicate that the number of elements collected by our algorithms does not appear to correlate with the total number of elements, which is an attractive property for streaming algorithms. In fact, most of the baseline algorithms collect roughly the same number of elements for the robust summary, ensuring fair comparison. For more details, see Section 5.

1.4 Previous Work

The special case of ARMSM $(m,K)$ with one constraint and equal costs for all elements is referred to as robust submodular maximization under the cardinality constraint. If at most $k$ elements can be selected, we refer to this problem as ARMSM $(m,k)$ . The study of this problem was initiated by Krause et al. [21]. The first (non-streaming) constant-factor approximation for this problem was given by Orlin et al. [34] for $m=o(\sqrt{k})$ . This was further extended by [7] who give algorithms for $m=o(k)$ . In these works, the size of the summary is restricted to contain at most $k$ elements and hence by design only $m<k$ removals can be handled.

Recently the focus has shifted to handling larger numbers of removals and so there has been increased interest in studying ARMSM $(m,k)$ with summaries of size greater than $k$. [29] solve this problem with summary size $O(k\cdot m)$, which was improved by [31] to $\tilde{O}(m+k)$. Moreover, their algorithms are applicable to arbitrarily ordered streams. A different setup was considered by [20], who assume that $E$ is chosen independently of the choice of a robust summary and give algorithms with summary size $\tilde{O}(m+k)$, but obtain better approximation guarantees than [31].

To the best of our knowledge, little is known about the general ARMSM $(m,K)$ problem considered here, which asks for robustness under single or multiple knapsack constraints.

2 Techniques

Our general approach is to find a set $\mathcal{S}$ at the end of the stream such that, when a set $E$ of items is removed, running an offline algorithm, Offline, on the set $Z:=\mathcal{S}\setminus E$ produces a good approximation to the value of the optimal solution of the entire stream. Since Offline on input $Z$ is known to produce a good approximation to the optimal solution of constrained submodular maximization on input $Z$ (see Theorem 2.1), it suffices to show that $f(\mathcal{OPT}(Z))$ is a good approximation to $f(\mathcal{OPT})$, where we use $\mathcal{OPT}$ to denote $\mathcal{OPT}(V\setminus E)$.

Theorem 2.1

[37, 22] There exists an algorithm Offline that gives a $(1-1/e)$-approximation for the monotone submodular maximization problem subject to $d$ knapsack constraints in polynomial time.

We assume that we have a good guess for $f(\mathcal{OPT})$ by making a number of exponentially increasing guesses $\tau$ . Our algorithms start with the partitions-and-buckets approach from [7, 31] for robust submodular maximization under cardinality constraints. Specifically, our algorithms create a number of partitions and also create a number of buckets for each partition, where the number of buckets is chosen to be “robust” to the removal of items at the end of the stream. An element in the stream is added to the first possible bucket in which its marginal density exceeds a certain threshold, which is governed by the partition. The thresholds are exponentially decreasing across the partitions, so that the number of partitions is logarithmic in $K$ .

At a high level, our algorithms overcome several potential pitfalls. The first challenge we face is the issue of buckets being populated with items of small cost whose marginal density surpasses the threshold. These small items prevent large items (for example, items of cost $K$) whose marginal density also surpasses the threshold from being added to any bucket. If the optimal solution consists of a single large item, then the approximation guarantee could potentially be as bad as $\frac{1}{K}$. Thus, we give each bucket double the capacity and create an additional partition level with a smaller threshold to compensate.

The second challenge we face is relating the items in various partitions. Although we would like to argue that an item $e$ in a bucket in a certain partition $i$ does not have overwhelmingly large marginal gain, the most natural way to prove this would be to claim that $e$ would have been placed in a previous partition less than $i$ because the ratio is overwhelmingly large. However, this is no longer true because items in partition $i$ can have up to cost $2^{i}$ and any non-empty bucket in previous partitions does not have enough capacity. Surprisingly, for the purposes of analysis, it suffices to prohibit any item in a bucket from using more than a certain fraction of the capacity. That is, any item added to a bucket $B_{i,j}$ , which has capacity $2^{i+1}$ , must have cost at most $2^{i-1}$ .

2.1 Robustness to the Removal of $m$ Items

We now describe AlgNum, which outputs a solution of cost $K$ on a single knapsack constraint and is robust against the removal of up to $m$ items. We would like to use an averaging argument to show that some “saturated” bucket $B_{i^{*},j}$ in a partition cannot have too much intersection with the elements $E$ that are removed at the end of the stream. However, the removal of up to $m$ items at the end of the stream may cause the removal of cost up to $mK$. But then the averaging argument fails unless the number of buckets in each partition also increases by a factor of $K$, which unfortunately gives an additional factor of $K$ in the space of the algorithm.

Instead, the key idea is to dynamically allocate a number of new buckets, depending on the total cost of the current items in the buckets of a partition. The goal is to maintain enough buckets to guarantee that a certain number of elements can be added to a partition, regardless of their cost. Therefore, the total number of buckets is not large unless the stored items have large cost, in which case the number of items is relatively low anyway. To do this, we maintain counters $s_{i}$ that allocate a new bucket to partition $i$ each time they exceed $\min\{2^{i},K\}$. Each time an item $e$ is added to partition $i$, the counter $s_{i}$ is increased in proportion to the cost of the item, $c(e)$. The creation of new buckets is allowed until a certain number of items have been collected by the partition. Intuitively, algorithms robust to the removal of $m$ items, such as AlgNum, should strive to output at the end of the stream a set with a certain number of items, whereas algorithms robust to the removal of items with a certain cost $M$ should strive to output a set with a certain cost.

At the end, we run a procedure Prune to further bound the number of elements output by the algorithm. Prune simply reorders the elements stored by AlgNum by cost of the elements, and again runs AlgNum on the sorted set of elements as an input stream. Since the items with smaller cost arrive first, this ensures that we cannot have too many items of large cost.
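The structure of Prune described above can be sketched in a few lines. This is only an outline of the reorder-and-rerun idea: `stream_algorithm` is a placeholder for AlgNum, which is not reproduced here, and `keep_first_two` is a deliberately trivial, hypothetical stand-in used to make the example runnable.

```python
def prune(stored, cost, stream_algorithm):
    """Re-run a streaming routine on the stored elements, cheapest first."""
    ordered = sorted(stored, key=lambda e: cost[e])
    return stream_algorithm(ordered)


def keep_first_two(stream):
    """Hypothetical stand-in for AlgNum: keeps the first two elements it sees."""
    kept = []
    for e in stream:
        if len(kept) < 2:
            kept.append(e)
    return kept
```

With `cost = {"x": 5, "y": 1, "z": 3}`, calling `prune(["x", "y", "z"], cost, keep_first_two)` feeds the elements in the order `y, z, x`, so the cheaper items are considered before the expensive one.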

3 Streaming Algorithms

We now warm up by providing the first streaming algorithm for the ARMSM $(m,K)$ problem under a single knapsack constraint. We later show how to build on these ideas to obtain robustness subject to multiple knapsack constraints.

3.1 Single Knapsack Constraint

We describe our algorithm AlgNum, which is used to produce a summary consisting of $\tilde{O}(K+m)$ items. Recall that we use $\mathcal{OPT}$ to denote $\mathcal{OPT}(V\setminus E)$. In order to simplify presentation, we assume that we have a good estimate $\tau^{*}$ for $f(\mathcal{OPT})$, such that $\tau^{*}\leq f(\mathcal{OPT})\leq(1+\epsilon)\tau^{*}$. (Footnote 1: This assumption can be removed using standard techniques (see e.g. Appendix E of [31]) by maintaining $\frac{1}{\epsilon}\log K$ guesses to find such a $\tau^{*}$.) To simplify presentation, we further assume that $K$ is a power of two and hence let $K=2^{\ell}$ (see Algorithm 1 for how rounding is handled).
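The guessing step can be illustrated with a geometric grid of candidate values. The sketch below assumes that a lower bound `lo` and an upper bound `hi` on $f(\mathcal{OPT})$ are available; these bounds and the function names are assumptions for illustration, and the paper's own procedure (Appendix E of [31]) may differ in how the range is obtained.

```python
def threshold_guesses(lo, hi, eps):
    """Geometric guesses lo, lo(1+eps), lo(1+eps)^2, ... up to hi."""
    guesses = []
    tau = lo
    while tau <= hi:
        guesses.append(tau)
        tau *= 1 + eps
    return guesses


def bracketing_guess(guesses, opt_value, eps):
    """Return a guess tau with tau <= opt_value <= (1 + eps) * tau, if any."""
    for tau in guesses:
        if tau <= opt_value <= (1 + eps) * tau:
            return tau
    return None
```

Because consecutive guesses differ by a factor of $1+\epsilon$, any value in `[lo, hi]` is bracketed by some guess, and the number of guesses grows only logarithmically in `hi / lo`.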

AlgNum creates $\ell$ partitions $B_{1},\dots,B_{\ell}$ where the $i$ -th partition initially consists of $n_{i}=O(\ell(\frac{m}{2^{i}}+1))$ buckets of capacity $2^{i+1}$ each. We refer to the $j$ -th bucket in the $i$ -th partition as $B_{i,j}$ . When processing the stream, each element $e$ is added to the first possible bucket $B_{i,j}$ in the first possible partition $i$ such that the bucket has enough capacity remaining and the marginal density $\rho(e|B_{i,j})$ exceeds a certain threshold $\tau/2^{i}$ for this partition. Note that the thresholds exponentially decrease across the partitions while capacities of the buckets exponentially increase.
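A minimal sketch of this insertion rule is given below, under several stated assumptions: the number of buckets per partition is fixed (dynamic allocation is omitted), the set of partition indices is supplied by the caller, the threshold test uses `>=`, and the per-item cost cap $2^{i-1}$ from Section 2 is applied. The names `try_insert`, `covers` and `cost` are hypothetical; Algorithm 1 of the paper, which is not reproduced here, is the authoritative description.

```python
# Hypothetical toy instance with a coverage objective.
covers = {"a": {1, 2, 3, 4}, "b": {4, 5}, "c": {6}}
cost = {"a": 1, "b": 2, "c": 4}


def f(S):
    covered = set()
    for e in S:
        covered |= covers[e]
    return len(covered)


def try_insert(partitions, e, f, cost, tau):
    """Place e in the first admissible bucket; return (i, j) or None.

    partitions maps a partition index i to a list of buckets (lists of elements).
    Bucket capacity is 2^(i+1) and the density threshold is tau / 2^i.
    """
    for i in sorted(partitions):
        capacity = 2 ** (i + 1)
        threshold = tau / 2 ** i
        if cost[e] > 2 ** (i - 1):  # item too large for this partition
            continue
        for j, bucket in enumerate(partitions[i]):
            used = sum(cost[x] for x in bucket)
            if used + cost[e] > capacity:  # not enough remaining capacity
                continue
            B = frozenset(bucket)
            density = (f(B | {e}) - f(B)) / cost[e]
            if density >= threshold:  # assumption: ties count as passing
                bucket.append(e)
                return (i, j)
    return None


# Three partitions with two buckets each, and a guess tau = 8.
partitions = {i: [[] for _ in range(2)] for i in (1, 2, 3)}
placements = [try_insert(partitions, e, f, cost, tau=8) for e in ("a", "b", "c")]
```

In this run `"a"` lands in partition 1, `"b"` is too costly for partition 1 and too sparse for partition 2 so it lands in partition 3, and `"c"` is rejected everywhere because its density 1/4 is below every threshold it is eligible for.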

Our goal is to maintain enough buckets to guarantee that a certain number of elements can be added to a partition, regardless of their cost. To dynamically allocate a number of new buckets, AlgNum keeps counters $s_{i}$ that create a new bucket whenever they exceed $2^{i}$, after which the value of the counter is lowered. The counter $s_{i}$ is increased in proportion to the cost $c(e)$ of an item each time an item $e$ is added to partition $i$. This process continues until a certain number of items are in the partition. Finally, we run the procedure Prune to further bound the number of elements output by the algorithm. See Figure 1 for an illustration of the data structure.
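The counter mechanism can be sketched as a small class. The text does not fix the proportionality constant, the amount by which the counter is lowered, or the item cap, so this sketch assumes a constant of 1, lowers the counter by the trigger value, and takes the cap `max_items` as a parameter; the class name `BucketCounter` is hypothetical.

```python
class BucketCounter:
    """Cost counter s_i for partition i that triggers new bucket allocation."""

    def __init__(self, i, K, max_items):
        self.limit = min(2 ** i, K)  # allocation trigger
        self.max_items = max_items   # assumed cap on items in the partition
        self.s = 0.0                 # current counter value
        self.items = 0               # items added to the partition so far

    def record(self, item_cost):
        """Register an added item; return how many new buckets to allocate."""
        if self.items >= self.max_items:
            # Enough items collected: no further buckets are created.
            self.items += 1
            return 0
        self.items += 1
        self.s += item_cost  # assumption: proportionality constant 1
        new_buckets = 0
        while self.s > self.limit:
            self.s -= self.limit  # assumption: lower the counter by the trigger
            new_buckets += 1
        return new_buckets
```

With `BucketCounter(2, 16, 3)` the trigger is 4, so items of cost 3, 3 and 9 allocate 0, 1 and 2 new buckets respectively, and any later item allocates none because the cap of 3 items has been reached.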

By using Prune on the output of AlgNum, we have the following result, whose proof formally appears in Appendix A.

Theorem 3.1

There exists an algorithm that outputs a set $\mathcal{S}$ with $\tilde{O}(K+m)$ elements such that, for any set $E$ of at most $m$ removed items, one can compute from $\mathcal{S}$ a set $Z\subseteq V\setminus E$ with cost at most $K$ such that $f(Z)$ is a constant-factor approximation to $f(\mathcal{OPT})$.

In fact, if the $m$ items that are removed have total cost at most $M$, we can provide a better guarantee in terms of both approximation and number of elements stored (see Appendix D).

3.2 $d$ Knapsack Constraints

We now consider the ARMSM $(m,K)$ problem under $d$ knapsack constraints. Recall that AlgNum relies on guessing the correct threshold and then using a streaming framework that adds elements whose marginal gain surpasses the threshold. In the case where there are $d$ knapsack constraints, a natural approach would be to have parallel instances that guess thresholds for each constraint, and then pick the instance with the best set. This would certainly work, but since there would be $O\left(\log K\right)$ guesses for each constraint, the total number of parallel instances would be $O\left(\log^{d}K\right)$ , which is unacceptable for large values of $d$ and $K$ . On the other hand, it seems reasonable to believe that the space usage can be improved, at the expense of the approximation guarantee, by maintaining a smaller number of parallel instances. In that case, marginal gain to cost ratio is not well-defined, since there is a separate cost for each knapsack, so what would be the right quantity to consider?

Recall the standard normalization for multiple knapsack constraints discussed in Section 1.1. We define the largest cost of an item to be the maximum cost of the item across all knapsacks, after the normalization. It has been previously shown that the correct quantity to consider for the streaming model is the marginal gain of an item divided by its largest cost [39]. Namely, if the ratio of the marginal gain to the largest cost of an item exceeds the corresponding threshold, and the item fits into a bucket without violating any of the knapsack constraints, then we choose to add the item to the first such bucket. Since the threshold now compares the marginal gain to the largest cost, a natural question is what quantity should be used for the dynamic allocation of the buckets. Recall that the previous goal of AlgNum was to maintain a specific number of items, so that it would be robust against the removal of $m$ items. Thus, we would like to allocate a new bucket for a partition whenever the capacity of the bucket with respect to some knapsack becomes saturated. Hence, AlgMult maintains a series of counters $s_{i,a}$ for partition $i$ and knapsack $a$, where $1\leq a\leq d$. Whenever one of these counters exceeds $K$, we create a new bucket entirely in partition $i$, and lower $s_{i,a}$ accordingly.
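The admission test for $d$ knapsacks can be sketched as follows. The bucket capacity is passed in as a parameter because its exact value for AlgMult is not stated in this section, the threshold test uses `>=`, and the toy data and function names are assumptions for illustration.

```python
# Hypothetical normalized instance: d = 2 knapsacks, 3 elements (columns).
C = [[1.0, 4.0, 2.0],
     [2.0, 1.0, 2.0]]
weights = [4.0, 4.0, 1.0]


def f(S):
    """A modular (hence monotone submodular) objective for the example."""
    return sum(weights[e] for e in S)


def largest_cost(C, e):
    """Maximum cost of element e across all knapsacks."""
    return max(row[e] for row in C)


def fits(C, capacity, bucket, e):
    """True if adding e keeps the bucket within capacity in every knapsack."""
    return all(sum(row[x] for x in bucket) + row[e] <= capacity for row in C)


def should_add(f, C, capacity, threshold, bucket, e):
    """Gain over largest cost must reach the threshold, and e must fit."""
    B = frozenset(bucket)
    gain = f(B | {e}) - f(B)
    return fits(C, capacity, bucket, e) and gain / largest_cost(C, e) >= threshold
```

For instance, with capacity 4 and threshold 1.5, element 0 is admitted to an empty bucket (ratio 4/2), element 1 is rejected from an empty bucket (ratio 4/4), and element 1 does not fit next to element 0 because the first knapsack would carry cost 5.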

By using Prune on the output of AlgNum, we have the following result, whose proof formally appears in Appendix B. As in Section 3.1, we do not attempt to optimize parameters here, but observe that the number of elements stored is independent of $d$ .

Theorem 3.2

For the ARMSM $(m,K)$ problem under $d$ knapsack constraints, there exists an algorithm that outputs a set $\mathcal{S}$ of size $\tilde{O}(K+m)$ , from which one can compute a set $Z\subseteq V\setminus E$ with cost at most $K$ such that $f(Z)$ is an $\Omega\left(\frac{1}{d}\right)$ -approximation to $f(\mathcal{OPT})$ .

4 Distributed Algorithm

In this section, we give a distributed algorithm for the ARMSM $(m,K)$ problem under $d$ knapsack constraints (see Definition 1.2). We use a variant of the MapReduce model of [19], in which we consider an input set $V$ of size $n=|V|$ that is distributed across $\tilde{O}((K+m)\sqrt{n})$ machines. For some parameters $m$ and $K$ that are known across all machines, we permit each machine to have $\tilde{O}((K+m)\sqrt{n})$ memory. The machines communicate with each other in a number of synchronous rounds to perform computation. In each round, each machine receives some input of size $\tilde{O}((K+m)\sqrt{n})$ , on which the machine performs some local computation. The machine then communicates some output to other machines at the start of the next round. We require that the total input and output message size is $\tilde{O}((K+m)\sqrt{n})$ per machine. We assume that each machine has access to an oracle that computes $f$ . Then our main result in the distributed model is the following.

Theorem 4.1

For the ARMSM $(m,K)$ problem under $d$ knapsack constraints, there exists a two-round distributed algorithm that outputs a set $\mathcal{S}$ , from which one can compute a set $Z\subseteq V\setminus E$ with cost at most $K$ and $f(Z)$ is a $\Omega\left(\frac{1}{d}\right)$ -factor approximation to $f(\mathcal{OPT})$ . Moreover, each machine uses space $\tilde{O}\left(\sqrt{n\cdot\frac{1}{\epsilon}(K^{2}+mK)}\right)$ .

The analysis of our distributed algorithm is based on the analysis for our streaming algorithms, along with a recent work by [28]. We generalize their result to obtain a distributed algorithm that constructs a robust summary equivalent to that constructed by AlgMult.

In our algorithm and proofs, we use $L$ to denote an upper bound on the number of elements collected by AlgMult. Let $\mathcal{B}$ be the data structure of sets $B_{i,j}$ maintained by AlgMult. We use ${\textsc{AlgMult}}_{\mathcal{B},W}$ to refer to the invocation of AlgMult with the following changes:

• The buckets $B_{i,j}$ are initialized by $\mathcal{B}$ and the loop on line 6 of AlgMult is ignored.

• In place of $V$ , the ground set $W$ is used.

Our distributed algorithm is explicitly given in Algorithm 6 and uses subroutine PartitionAndSample, which is given in Algorithm 5.

We formally prove Theorem 4.1 in Appendix C by first showing that the approximation guarantee is the same as Theorem 3.2.

Lemma 4.2

There exists a distributed algorithm that outputs a set $Z$ so that $f(Z)$ has the same approximation guarantee as stated by Theorem 3.2.

We can also bound the total number of elements sent to the central machine, using a proof similar to [28].

Lemma 4.3

Let $L$ be an upper bound on the number of elements collected by AlgMult. With probability $1-e^{-\Omega(L)}$ , the number of elements sent to the central machine $C$ is at most $\sqrt{nL}$ .

5 Experiments

In this section, we provide empirical evaluation of our algorithms for ARMSM under both single knapsack and multiple knapsack constraints. As no prior work exists in this setting we use the most natural generalizations of standard non-robust algorithms for comparison. We test our most general algorithm AlgMult against such algorithms while measuring the number of elements collected and the quality of the resulting approximation. The aim of our evaluations is to address the following points:

1. How does AlgMult compare to “robustified” generalizations of other submodular maximization algorithms?

2. How well does AlgMult perform on real datasets compared to our theoretical worst-case guarantees?

3. How many elements does AlgMult collect?

4. Does the performance of AlgMult degrade as the number of elements $m$ removed at the end of the stream increases?

Implementation is available at https://github.com/KDD2019SubmodularKnapsack/KDD2019SubmodularKnapsack.

Robustification.

Although there are no existing ARMSM algorithms for knapsack constraints, we propose the following modification to existing algorithms to ensure a fair comparison. Given a submodular maximization algorithm $\mathcal{A}$ , we consider its robustified version by allowing the algorithm to collect extra elements to obtain its own robust summary. To achieve this, we increase the knapsack capacity by some multiplicative factor, which is selected in such way that all algorithms collect approximately the same number of elements.
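One way to select such a multiplicative factor can be sketched as below. The text does not specify how the factor is chosen, so the grid search here is an assumption, and `pick_capacity_factor` and `toy_collect` are hypothetical names. The `collect` argument stands in for running a baseline with a given knapsack capacity and returning its summary.

```python
from typing import Callable, List, Sequence


def pick_capacity_factor(
    collect: Callable[[float], List[int]],
    capacity: float,
    target_size: int,
    factors: Sequence[float],
) -> float:
    """Return the factor whose summary size is closest to target_size."""
    return min(factors, key=lambda c: abs(len(collect(c * capacity)) - target_size))


# Toy stand-in for a baseline: unit-cost items, keep as many as fit.
def toy_collect(cap: float) -> List[int]:
    return list(range(100))[: int(cap)]


factor = pick_capacity_factor(
    toy_collect, capacity=10, target_size=38, factors=[1, 2, 3, 4, 5]
)
```

A grid is used rather than a binary search because the number of collected elements need not be monotone in the capacity for a general streaming rule.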

5.1 Baselines

We compare AlgMult to the following algorithms.

Robustified MarginalRatio.

This algorithm corresponds to a robustified version of Algorithm $2$ from [18], which accepts any element whose marginal density with respect to the stored elements exceeds a certain threshold. Note that while the algorithm is for a single knapsack constraint, it can be trivially extended to multiple knapsack constraints by checking that the thresholding condition holds for all dimensions. This marginal density thresholding algorithm is a natural generalization to knapsack constraints of the streaming algorithm Sieve [3] which gives the best theoretical guarantee under the cardinality constraint.

Robustified offline Greedy.

This algorithm builds its summary by iteratively adding to it an element with the largest marginal density. Observe that Greedy is an offline algorithm, which is a more powerful model. However, Greedy is a single knapsack algorithm, so we use it only as a baseline for single knapsack constraints. While there exists a Greedy algorithm [27] under multiple knapsack constraints, it requires $O\left(n^{5}\right)$ running time, which makes it infeasible on large datasets.

Robustified Multidimensional.

This is a robustified version of the streaming algorithm for submodular maximization with multiple knapsack constraints from [39].

5.2 Objectives and Datasets

We evaluate the algorithms on two submodular objective functions:

Dominating set.

We use graphs ego-Facebook (4K vertices, 81K edges) and ego-Twitter (88K vertices, 1.8M edges) from the SNAP database [25]. For a graph $G(V,E)$ and $Z\subseteq V$ , we let $f(Z)=\frac{|Z\cup N(Z)|}{|V|}$ , where $N(Z)$ is the set of all neighbors of $Z$ . For each knapsack constraint, the cost of each element is drawn from the uniform distribution $\mathcal{U}(1,3)$ and all knapsack capacities are set to $10$ .
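The dominating set objective can be evaluated directly from adjacency sets. The following is a small sketch on a toy graph; the function name `dominating_value` and the graph `ADJ` are hypothetical, and the graph is assumed to be undirected.

```python
from typing import Dict, Iterable, Set


def dominating_value(adj: Dict[int, Set[int]], Z: Iterable[int]) -> float:
    """f(Z) = |Z ∪ N(Z)| / |V| for a graph given as adjacency sets."""
    chosen = set(Z)
    covered = set(chosen)
    for v in chosen:
        covered |= adj[v]
    return len(covered) / len(adj)


# Toy path graph 0 - 1 - 2 - 3 (hypothetical data, not from SNAP).
ADJ = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
value = dominating_value(ADJ, [1])  # vertex 1 covers {0, 1, 2}
```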

Movie recommendation.

Modeling the scenario of movie recommendations we analyze a dataset of movie ratings (in the range $[1,5]$ ) assigned by users. For each movie $x$ we use a vector $v_{x}$ of normalized ratings: if user $u$ did not rate movie $x$ , then set $v_{x,u}=0$ , otherwise set $v_{x,u}=r_{x,u}-r_{avg}$ , where $r_{avg}$ denotes the average of all known ratings. Then, the similarity between two movies $x_{1}$ and $x_{2}$ can be defined as the dot product $\langle v_{x_{1}},v_{x_{2}}\rangle$ of their vectors.
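The construction of the rating vectors and the similarity can be sketched as follows. We read $r_{avg}$ as the average over all known ratings in the dataset, which is an interpretation of the text; the names `movie_vectors`, `similarity`, and the toy data `RATINGS` are hypothetical.

```python
from typing import Dict, List, Tuple

Rating = Tuple[str, str, float]  # (user, movie, rating in [1, 5])


def movie_vectors(ratings: List[Rating]) -> Dict[str, Dict[str, float]]:
    """Sparse vectors v_x: v_{x,u} = r_{x,u} - r_avg for rated pairs, 0 otherwise."""
    r_avg = sum(r for _, _, r in ratings) / len(ratings)
    vectors: Dict[str, Dict[str, float]] = {}
    for user, movie, r in ratings:
        vectors.setdefault(movie, {})[user] = r - r_avg
    return vectors


def similarity(v1: Dict[str, float], v2: Dict[str, float]) -> float:
    """Dot product <v1, v2>; users missing from either vector contribute 0."""
    return sum(val * v2[u] for u, val in v1.items() if u in v2)


RATINGS = [("u1", "x1", 5.0), ("u1", "x2", 4.0), ("u2", "x1", 1.0), ("u2", "x2", 2.0)]
vecs = movie_vectors(RATINGS)
sim = similarity(vecs["x1"], vecs["x2"])
```

Storing only the rated entries keeps the vectors sparse, since unrated pairs are zero and do not affect the dot product.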

In the case of movie recommendation the goal is to select a representative subset of movies. The domain of our objective is the set of all movies. For a subset of movies $X$ we consider a parameterized objective function $f_{X}$ :

[TABLE]

where $Z$ is a subset of movies. This captures how representative $Z$ is of the set $X$ . In our experiments, we model the situation of making recommendations to some user, so we pick $X$ to be the set of movies rated by that user (we select the user uniformly at random). Hence the maximizer of $f_{X}(Z)$ corresponds to a subset of movies that represents the user’s rated set $X$ well.

We use the ml-20 MovieLens dataset [17], containing $27\,278$ movies and $20\,000\,263$ ratings. Knapsack constraints model limited demand for movies of a certain type (e.g. not too many action movies, not too many fantasy movies, etc). In the data each movie is labeled by several genres and each knapsack constraint is described by sets of “good” and “bad” genres. Movies with more “good” genres and fewer “bad” genres have lower cost, allowing the algorithm to choose more such movies. If there are at most $t$ genres describing “good” and “bad” sets then we set the cost of a movie $x$ to be linear in the range $[1,t+1]$ :

[TABLE]

where $good(x)$ and $bad(x)$ are the numbers of good and bad genres that movie $x$ is labeled with.

For the experiments under one knapsack constraint, we define the good set of movies as $good={\left\{\text{Comedy},\ \text{Horror}\right\}}$ , and the bad set of movies as $bad={\left\{\text{Adventure},\ \text{Action}\right\}}$ . For experiments under two knapsack constraints, we define the second constraint by an additional set of good movies $good={\left\{\text{Drama},\ \text{Romance}\right\}}$ , and an additional set of bad movies $bad={\left\{\text{Sci-Fi},\ \text{Fantasy}\right\}}$ . All knapsack constraint bounds are set to $10$ , limiting the total number of recommended movies.

5.3 Experimental Evaluation and Results

We compare AlgMult against the three baselines described in Section 5.1. First, we obtain robust summaries for AlgMult and for each of the baselines. Second, we adversarially remove elements from these summaries. Finally, we run Offline on the remaining elements in the summaries and compare the values of objective functions on the resulting sets.

Adversarial removals.

To ensure a fair comparison, we use the same set of removed elements for all algorithms. This is done by removing the union of sets recommended by all algorithms and then continuing in a recursive fashion if more removals are required.

We define the removal process formally as follows. For an algorithm $\mathcal{A}$ , let $S_{\mathcal{A}}$ be the robust summary output by $\mathcal{A}$ . We let $R_{1}=\cup_{\mathcal{A}}{\textsc{Offline}}{(S_{\mathcal{A}})}$ , where the union is taken over all four algorithms $\mathcal{A}$ tested. That is, $R_{1}$ is the union of the best elements selected using AlgMult, Greedy, Multidimensional, and MarginalRatio. This typically already gives a good choice of removals. If more removals are required, we define $R_{k+1}=\cup_{\mathcal{A}}{\textsc{Offline}}(S_{\mathcal{A}}\setminus\cup_{i=1}^{k}R_{i})$ . That is, we recursively remove the union of the elements in the optimal sets across all the algorithms and we repeat this process until $R_{k}$ is empty.
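The recursive removal process can be sketched as below. The `offline` argument is a stand-in for Offline, and `removal_rounds`, `toy_offline`, and the toy summaries are hypothetical; the sketch only illustrates how the sets $R_{1},R_{2},\ldots$ are formed.

```python
from typing import Callable, Dict, Hashable, List, Set

Item = Hashable


def removal_rounds(
    summaries: Dict[str, Set[Item]],
    offline: Callable[[Set[Item]], Set[Item]],
    max_rounds: int,
) -> List[Set[Item]]:
    """Return [R_1, R_2, ...], where R_{k+1} is the union over algorithms of
    offline(S_A minus all earlier R_i). Stops early once a round is empty."""
    removed: Set[Item] = set()
    rounds: List[Set[Item]] = []
    for _ in range(max_rounds):
        r_k: Set[Item] = set()
        for summary in summaries.values():
            r_k |= offline(summary - removed)
        if not r_k:
            break
        rounds.append(r_k)
        removed |= r_k
    return rounds


# Toy stand-in for Offline: pick the single largest remaining item.
def toy_offline(items: Set[int]) -> Set[int]:
    return {max(items)} if items else set()


SUMMARIES = {"AlgMult": {1, 5, 9}, "Greedy": {2, 9}, "MarginalRatio": {3, 7}}
rounds = removal_rounds(SUMMARIES, toy_offline, max_rounds=10)
```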

Evaluation.

For different numbers of removed elements, we compare the values that are produced by the offline algorithm on robust summaries, i.e. ${\textsc{Offline}}{(S_{\mathcal{A}}\setminus\bigcup_{i=1}^{k}{R_{i}})}$ generated by the four algorithms. Since $f(\mathcal{OPT})$ is NP-hard to compute, we compare the performance of each algorithm with upper bounds on $f(\mathcal{OPT})$ to estimate the approximation given by the algorithms. For a single knapsack constraint, the best known upper bound can be computed from Greedy and for multiple knapsack constraints from Multidimensional [39].

Results.

The results of our experiments are shown in Figure 2. For each algorithm, we plot the ratio of its objective to an upper bound on the optimal solution, which is obtained as previously discussed. Figures 2a, 2b, and 2c show experimental results for $1$ -knapsack constraints using Greedy as the offline algorithm with approximation factor $1-e^{-c(S)/K}$ , where $K$ is the knapsack constraint and $c(S)$ is the cost of the resulting set. For many instances, $c(S)$ is close to $K$ , so this value is close to $1-\frac{1}{e}\approx 0.63$ . Figures 2d, 2e, and 2f show experimental results for $2$ -knapsack constraints.

Our evaluations suggest that AlgMult provides the best possible approximation factor for a majority of inputs. Except for the first iterations in Figure 2d, AlgMult outperforms the other algorithms, and achieves roughly the same approximation guarantee as the offline algorithm that knows the items to be removed in advance. In fact, the advantage of AlgMult becomes more noticeable as larger numbers of elements are removed.

Since the baseline algorithms, other than Greedy, require an estimate of $f(\mathcal{OPT})$ , we try several such estimations. The non-monotone behavior of the ratio of MarginalRatio to $f(\mathcal{OPT})$ in Figure 2f occurs since MarginalRatio performs better when estimation is close to the true objective. We emphasize the fact that all algorithms, including AlgMult, use the same $f(\mathcal{OPT})$ estimations. It is possible to obtain a more monotone behavior by trying more estimations, but doing so will require collecting more elements.

To evaluate memory consumption, we also report the number of elements collected by each algorithm. These results are presented in Table 1 and show that the algorithms for $2$ -knapsack constraints collect noticeably more elements than those performing maximization under a single knapsack constraint. The size of the robust summary output by AlgMult does not appear to correlate with the total number of elements, and in the case of ego-Twitter, it collects only $5\%$ of the vertices.

Recall that we allow the baseline algorithms to collect extra elements by increasing their knapsack capacity to ensure fair comparison. Hence, almost all the algorithms collect similar numbers of elements for each setup, as shown in Table 1. Note, however, that for some experimental setups Multidimensional collects significantly fewer elements than the other algorithms. This phenomenon persists even if the knapsack capacity is unbounded.

In our empirical evaluations, the number of collected elements did not seem to depend on the number of removed items $m$ . One possible reason for this phenomenon is that the algorithms were not executed with small guesses for the optimal objective. As a result, when the number of removed elements is large, the optimal objective is below the threshold considered by the algorithm, and therefore more elements are not collected because the threshold is set too high. However, it is natural that with sufficiently bad guesses for the optimal objective, any thresholding algorithm will be forced to meaninglessly collect a large number of elements.

6 Conclusion

We have given the first streaming and distributed algorithms for adversarially robust monotone submodular maximization subject to single and multiple knapsack constraints. Our algorithms are based on a novel data structure which dynamically allocates new space depending on the elements stored so far and perform well on large scale data sets, even compared to offline algorithms that know in advance which elements will be removed.

For future work, it is natural to ask whether our framework can be scaled to larger datasets for some specific classes of objectives, e.g., is it possible to ensure adversarial robustness with sketching methods for coverage objectives [6]? It would also be interesting to understand the limits on approximation that can be achieved with adversarial robustness and summary size only $\tilde{O}(K+m)$ . Finally, an interesting open question is whether it is possible to do adversarially robust non-monotone submodular maximization.

Appendix A Missing Proofs from Section 3.1

We call a bucket saturated if the cost of its items is at least half of its capacity.

Definition A.1

A bucket $B_{i,j}$ is saturated if $c(B_{i,j})\geq 2^{i}$ .

We break the analysis into the following three cases:

1. At least half of the buckets in some partition are saturated (Lemma A.2).

2. More than half of the buckets in all partitions are not saturated, but there exists some bucket in the last partition $B_{\ell}$ which is a good estimate of $f(\mathcal{OPT})$ (Lemma A.7).

3. More than half of the buckets in all partitions are not saturated and no bucket in the last partition $B_{\ell}$ is a good estimate of $f(\mathcal{OPT})$ (Lemma A.8).

We first show that if at least half of the buckets in some partition are saturated, some saturated bucket in this partition cannot be affected too much by the removal of elements $E$ at the end of the stream. Hence, this saturated bucket contains a set of elements $Z$ such that $f(Z)$ is a good approximation to $\tau$ , which in turn is a good approximation to $f(\mathcal{OPT})$ .

Let $S_{\tau}:=\{B_{i,j}\}_{i,j}$ denote the data structure output by AlgNum when run with parameter $\tau$ (i.e. for $\tau^{*}=\frac{32(1-\frac{1}{2\ell})+3}{2}\tau$ ). Let $Z_{\tau}:={\textsc{Offline}}(\bigcup_{B_{i,j}\in S_{\tau}}B_{i,j}\setminus E)$ .

Lemma A.2

Let $\ell={\left\lceil\log K\right\rceil}$ and $\tau>0$ . If there exists a partition $B_{i^{*}}$ in $S_{\tau}$ with at least half of its buckets saturated, then for the set $Z_{\tau}$ it holds that:

$$f(Z_{\tau})\geq\left(1-\frac{1}{e}\right)\left(1-\frac{1}{2\ell}\right)\tau.$$

**Proof : ** Let $B_{i^{*}}$ be a partition in $S_{\tau}$ with at least half of its buckets saturated. Let $B_{i^{*},j^{*}}$ be a saturated bucket that minimizes the cost of removed items, i.e. $c(B_{i^{*},j}\cap E)$ , among all saturated buckets in this partition. Let $I$ be the cost of all items in $E$ which are in partition $i^{*}$ , i.e. $I=c\left(E\cap\bigcup_{j=1}^{n_{i^{*}}}B_{i^{*},j}\right)$ . Then the total capacity of all buckets in partition $i^{*}$ is at least $2^{i^{*}+1}\cdot w{\left\lceil K/2^{i^{*}}\right\rceil}+8\ell I$ . Thus, the total number of buckets in partition $i^{*}$ is at least $\frac{2^{i^{*}+1}\cdot w{\left\lceil K/2^{i^{*}}\right\rceil}+8\ell I}{2^{i^{*}}}$ . Since at least half of its buckets are saturated, the total number of saturated buckets in partition $i^{*}$ is at least $\frac{2^{i^{*}}\cdot w{\left\lceil K/2^{i^{*}}\right\rceil}+4\ell I}{2^{i^{*}}}$ . By an averaging argument:

[TABLE]

Thus, $c(B_{i^{*},j}\setminus E)\geq c(B_{i^{*},j^{*}})-\frac{t_{i*}}{4\ell}$ . Since $B_{i^{*},j^{*}}$ is saturated by definition, then $2^{i^{*}-1}\leq c(B_{i^{*},j^{*}})\leq 2^{i^{*}}$ , and hence the marginal density of each element exceeds a threshold of $\frac{\tau}{2^{i^{*}}}\geq\frac{\tau}{c(B_{i^{*},j^{*}})}$ so that:

[TABLE]

Hence by Theorem 2.1, running Offline on $B_{i^{*},j^{*}}\setminus E$ gives value at least $\left(1-\frac{1}{e}\right)\left(1-\frac{1}{2\ell}\right)\tau$ . $\Box$

Before considering the other cases, we need the following technical lemmas. Lemma A.3 bounds the value of the removed elements, while Lemma A.4 allows us to relate the elements removed in a bucket $B_{\ell,r}$ of the last partition with the elements removed in previous partitions.

Lemma A.3

Given a bucket $A_{i-1}$ from partition $i-1$ that is not saturated, then the loss in bucket $A_{i}$ induced by the removals is at most

$$f\left(E_{i}\,\middle|\,A_{i-1}\right)\leq\frac{\tau}{2^{i-1}}c(E_{i}),$$

where $E_{i}:=A_{i}\cap E$ denotes the elements that are removed from $A_{i}$ .

**Proof : ** By submodularity,

$$f\left(E_{i}\,\middle|\,A_{i-1}\right)\leq\sum_{e\in E_{i}}f\left(e\,\middle|\,A_{i-1}\right).\tag{1}$$

For each $e\in E_{i}$ , either $\frac{f(e)}{c(e)}<\frac{\tau}{2^{i-1}}$ or $\frac{f(e)}{c(e)}\geq\frac{\tau}{2^{i-1}}$ . In the first case, $f\left(e\,\middle|\,A_{i-1}\right)\leq f(e)<\frac{\tau}{2^{i-1}}\cdot c(e)$ . In the second case, since $e\in A_{i}$ it must hold that $c(e)\leq 2^{i-1}$ . On the other hand, $e\notin A_{i-1}$ but $c(A_{i-1})<2^{i-1}$ because $A_{i-1}$ is not saturated. Thus, $f\left(e\,\middle|\,A_{i-1}\right)<\frac{\tau}{2^{i-1}}\cdot c(e)$ or else the algorithm would have added $e$ to $A_{i-1}$ . Hence, $f\left(e\,\middle|\,A_{i-1}\right)<\frac{\tau}{2^{i-1}}\cdot c(e)$ for all $e\in E_{i}$ and so by Equation 1, $f\left(E_{i}\,\middle|\,A_{i-1}\right)<\frac{\tau}{2^{i-1}}c(E_{i})$ . $\Box$

Lemma A.4

Suppose that there exists some bucket in every partition that is not saturated. Let $\ell={\left\lceil\log K\right\rceil}$ . For every partition $i$ , let $A_{i}$ denote a bucket with $c(A_{i})<2^{i}$ and let $E_{i}:=A_{i}\cap E$ denote the elements that are removed from $A_{i}$ . The loss in the bucket $B_{\ell,{r}}$ induced by the removals, given the remaining elements in the previous buckets, is at most

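$$f\left(E_{\ell}\,\middle|\,\bigcup_{j=0}^{\ell-1}\left(A_{j}\setminus E_{j}\right)\right)\leq\sum_{j=1}^{\ell}\frac{\tau}{2^{j-1}}c(E_{j}).$$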

**Proof : ** We show by induction that for any $i\geq 1$ the following holds

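$$f\left(E_{i}\,\middle|\,\bigcup_{j=0}^{i-1}\left(A_{j}\setminus E_{j}\right)\right)\leq\sum_{j=1}^{i}\frac{\tau}{2^{j-1}}c(E_{j}),\tag{2}$$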

so that the claim follows by setting $i=\ell$ .

Base case $i=1$ .

Since $c(A_{0})<2^{0}=1$ and each item is normalized to have cost at least $1$ , it follows that both $A_{0}$ and $E_{0}$ are empty. Thus

[TABLE]

where the first inequality holds by Lemma A.3.

Inductive step $i>1$ .

Assuming Equation 2 holds for $i-1$ where $i>1$ , we now show that it also holds for $i$ . By submodularity, $f\left(E_{i-1}\,\middle|\,\bigcup_{j=0}^{i-2}\left(A_{j}\setminus E_{j}\right)\right)\geq f\left(E_{i-1}\,\middle|\,\bigcup_{j=0}^{i-1}\left(A_{j}\setminus E_{j}\right)\right)$ . It follows that $f\left(E_{i}\,\middle|\,\bigcup_{j=0}^{i-1}\left(A_{j}\setminus E_{j}\right)\right)\leq f\left(E_{i}\,\middle|\,\bigcup_{j=0}^{i-1}\left(A_{j}\setminus E_{j}\right)\right)+f\left(E_{i-1}\,\middle|\,\bigcup_{j=0}^{i-2}\left(A_{j}\setminus E_{j}\right)\right)-f\left(E_{i-1}\,\middle|\,\bigcup_{j=0}^{i-1}\left(A_{j}\setminus E_{j}\right)\right)$ , by adding $f\left(E_{i}\,\middle|\,\bigcup_{j=0}^{i-1}\left(A_{j}\setminus E_{j}\right)\right)$ to both sides. Since $E_{i}$ and $\bigcup_{j=0}^{i-1}\left(A_{j}\setminus E_{j}\right)$ are disjoint, then

[TABLE]

We can bound the first term $f\left(E_{i}\cup\bigcup_{j=0}^{i-1}\left(A_{j}\setminus E_{j}\right)\right)$ by at most $f\left(E_{i}\cup A_{i-1}\cup\bigcup_{j=0}^{i-2}\left(A_{j}\setminus E_{j}\right)\right)$ , by monotonicity. Note that the third term $f\left(E_{i-1}\cup\bigcup_{j=0}^{i-1}\left(A_{j}\setminus E_{j}\right)\right)$ equals $f\left(E_{i-1}\cup A_{i-1}\cup\bigcup_{j=0}^{i-2}\left(A_{j}\setminus E_{j}\right)\right)$ , which is at least $f\left(A_{i-1}\cup\bigcup_{j=0}^{i-2}\left(A_{j}\setminus E_{j}\right)\right)$ , because $E_{i-1}\cup\left(A_{i-1}\setminus E_{i-1}\right)=E_{i-1}\cup A_{i-1}$ .

Hence, Equation A gives $f\left(E_{i}\,\middle|\,\bigcup_{j=0}^{i-1}\left(A_{j}\setminus E_{j}\right)\right)$ is at most the sum $f\left(E_{i}\,\middle|\,A_{i-1}\cup\bigcup_{j=0}^{i-2}\left(A_{j}\setminus E_{j}\right)\right)+f\left(E_{i-1}\,\middle|\,\bigcup_{j=0}^{i-2}\left(A_{j}\setminus E_{j}\right)\right)$ . Then by submodularity, $f\left(E_{i}\,\middle|\,\bigcup_{j=0}^{i-1}\left(A_{j}\setminus E_{j}\right)\right)$ is at most the sum $f\left(E_{i}\,\middle|\,A_{i-1}\right)+f\left(E_{i-1}\,\middle|\,\bigcup_{j=0}^{i-2}\left(A_{j}\setminus E_{j}\right)\right)$ . Since $c(A_{i-1})<2^{i-1}$ , Lemma A.3 implies $f\left(E_{i}\,\middle|\,\bigcup_{j=0}^{i-1}\left(A_{j}\setminus E_{j}\right)\right)\leq\frac{\tau}{2^{i-1}}c(E_{i})+f\left(E_{i-1}\,\middle|\,\bigcup_{j=0}^{i-2}\left(A_{j}\setminus E_{j}\right)\right)$ . By the inductive hypothesis, $f\left(E_{i}\,\middle|\,\bigcup_{j=0}^{i-1}\left(A_{j}\setminus E_{j}\right)\right)\leq\frac{\tau}{2^{i-1}}c(E_{i})+\sum_{j=1}^{i-1}\frac{\tau}{2^{j-1}}c(E_{j})=\sum_{j=1}^{i}\frac{\tau}{2^{j-1}}c(E_{j})$ . $\Box$

We require the following structural lemma from [31].

Lemma A.5

[31] For any monotone, non-negative submodular function $f$ on a ground set $V$ , and any sets $A,B,R\subseteq V$ , we have

[TABLE]

We also require the following structural lemma relating sets of large size to a large number of sets of small size.

Lemma A.6

Let $S$ be a set of size $\alpha K$ for some integer $\alpha\geq 1$ such that no item in $S$ has size more than $K$ . Then the items of $S$ can be partitioned into $2\alpha-1$ sets, each with size at most $K$ .

**Proof : ** Let $S_{1},\ldots,S_{i}$ be sets that partition $S$ with the minimal cardinality $i$ . Note that $c(S_{1})+c(S_{2})>K$ , or else the elements of set $S_{1}$ and $S_{2}$ can be combined into a single set, contradicting the definition of $i$ . Similarly, $c(S_{2j-1})+c(S_{2j})>K$ for each integer $j>0$ . On the other hand, $c(S)=\alpha K$ , so $j<\alpha$ and there can be at most $2\alpha-1$ sets. $\Box$
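The bound of Lemma A.6 can be illustrated with first-fit packing. First-fit is not the minimal partition used in the proof, but it shares the property the proof relies on: an item opens a new set only if it did not fit into any earlier set, so any two sets have total size more than $K$ . The name `first_fit` and the toy sizes are hypothetical.

```python
import math
from typing import List


def first_fit(sizes: List[int], K: int) -> List[List[int]]:
    """Place each item into the first bin where it fits; every bin has total <= K."""
    bins: List[List[int]] = []
    loads: List[int] = []
    for s in sizes:
        assert s <= K, "each item must have size at most K"
        for b, load in enumerate(loads):
            if load + s <= K:
                bins[b].append(s)
                loads[b] += s
                break
        else:
            # No existing bin can take the item, so open a new one.
            bins.append([s])
            loads.append(s)
    return bins


K = 10
sizes = [6, 6, 6, 6, 6]  # total size 30 = 3K, so alpha = 3
bins = first_fit(sizes, K)
alpha = math.ceil(sum(sizes) / K)
```

In this example no two items fit together, so the packing uses $2\alpha-1=5$ sets, which shows that the bound of the lemma can be attained.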

We can finally consider the second case of our analysis, where more than half of the buckets in all partitions are not saturated, but there exists some bucket $B_{\ell,{r}}$ in the last partition such that $f(B_{\ell,{r}})$ is a good estimate of $f(\mathcal{OPT})$ .

Lemma A.7

Let $\ell={\left\lceil\log K\right\rceil}$ and $\tau>0$ . If no partition in $S_{\tau}$ has at least half of its buckets saturated, then:

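$$f(Z_{\tau})\geq\frac{1}{15}\left(1-\frac{1}{e}\right)\left(f\left(B_{\ell,{r}}\right)-\frac{\tau}{2}\right),$$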

where $B_{\ell,{r}}$ is any bucket in the last partition that is not saturated.

**Proof : ** Let $\ell={\left\lceil\log K\right\rceil}$ and let $A_{i}$ be the bucket in partition $i$ with $c(A_{i})<2^{i}$ that minimizes $c(A_{i}\cap E)$ . We denote $E_{i}:=A_{i}\cap E$ . By setting $B=B_{\ell,{r}}$ , $R=E_{\ell}$ and $A=\bigcup_{i=0}^{\ell-1}(A_{i}\setminus E_{i})$ in Lemma A.5, it follows that $f\left(\bigcup_{i=0}^{\ell}(A_{i}\setminus E_{i})\right)\geq f\left(B_{\ell,{r}}\right)-f\left(E_{\ell}\bigg{|}\bigcup_{i=0}^{\ell-1}(A_{i}\setminus E_{i})\right)$ . Hence, applying Lemma A.4 with the observation that each $c(A_{i})<2^{i}$ , then

[TABLE]

Let $\tilde{E}_{i}$ denote the subset of $E$ that intersects buckets of partition $i$ . For the sake of presentation, we denote $\beta:=8\ell$ . Then the total space in partition $i$ is at least $2^{i}\cdot w{\left\lceil K/2^{i}\right\rceil}+\beta\cdot c(\tilde{E}_{i})$ . Similarly, the number of buckets in partition $i$ is at least $w{\left\lceil K/2^{i}\right\rceil}+\frac{\beta\cdot c(\tilde{E}_{i})}{2^{i}}$ , of which at least half are not saturated. Since $A_{i}$ is defined to be the bucket of partition $i$ that minimizes $c(E_{i})$ among all the buckets that are not saturated, then by an averaging argument

[TABLE]

Therefore,

[TABLE]

Define $\alpha_{i}:=\frac{c(\tilde{E}_{i})}{K}$ for $1\leq i\leq\ell$ . Then $\sum_{i=1}^{\ell}\frac{\tau}{2^{i-1}}c(E_{i})\leq\left(\sum_{i=1}^{\ell}\frac{4\tau\alpha_{i}K}{wK+\beta\alpha_{i}K}\right)$ . Hence,

[TABLE]

Note that since $\sum_{i=1}^{\ell}c(\tilde{E}_{i})\leq c(E)\leq mK$ , then $\sum_{i=1}^{\ell}\alpha_{i}\leq m$ . Moreover, defining the function $f(x)$ by $f(x):=\frac{4x}{w+\beta x}$ , where $w={\left\lceil\frac{4{\left\lceil\log K\right\rceil}m}{K}\right\rceil}>0$ , we see that $f(x)$ is concave. Thus by Jensen’s inequality and setting $\alpha=\frac{m}{\ell}$ , it follows that $\left(\sum_{i=1}^{\ell}\frac{4\tau\alpha_{i}}{w+\beta\alpha_{i}}\right)\leq\ell\cdot\frac{4\tau\alpha}{w+\beta\alpha}$ . Since $\beta=8\ell$ , then

[TABLE]

Plugging Equation 5 and Equation 6 into Equation 4,

[TABLE]

We can also bound the cost of the elements in $\bigcup_{i=0}^{\ell}(A_{i}\setminus E_{i})$ : $\sum_{i=0}^{\ell}c(A_{i}\setminus E_{i})\leq\sum_{i=0}^{\ell}c(A_{i})\leq 2K+2K+\sum_{i=0}^{\lceil\log{K}\rceil-2}{2\cdot 2^{i}}\leq 8K$ . Hence, the optimal value of $f$ over subsets of $S_{\tau}\setminus E$ with cost at most $8K$ , which we denote $f\left(\mathcal{OPT}(8K,S_{\tau}\setminus E)\right)$ , is at least $f\left(B_{\ell,{r}}\right)-\frac{\tau}{2}$ . By Lemma A.6, any set with size $8K$ whose items have size at most $K$ can be partitioned into $15$ sets, each with size at most $K$ . Therefore by Theorem 2.1, $f(Z_{\tau})=f({\textsc{Offline}}(K,S_{\tau}\setminus E))\geq\left(1-\frac{1}{e}\right)f\left(\mathcal{OPT}(K,S_{\tau}\setminus E)\right)$ . By submodularity, $f(Z_{\tau})\geq\left(1-\frac{1}{e}\right)\left(\frac{1}{15}f\left(\mathcal{OPT}(8K,S_{\tau}\setminus E)\right)\right)$ and thus, $f(Z_{\tau})\geq\frac{1}{15}\left(1-\frac{1}{e}\right)\left(f\left(B_{\ell,{r}}\right)-\frac{\tau}{2}\right)$ . $\Box$
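The cost bound used in the proof above can be checked numerically. The sketch below evaluates the expression $2K+2K+\sum_{i=0}^{\lceil\log K\rceil-2}2\cdot 2^{i}$ for a range of integer values of $K$ , assuming the logarithm is base $2$ ; the name `cost_bound` is hypothetical.

```python
def cost_bound(K: int) -> int:
    """Evaluate 2K + 2K + sum_{i=0}^{ceil(log2 K) - 2} 2 * 2^i for integer K >= 1."""
    ell = (K - 1).bit_length()  # equals ceil(log2 K) for integers K >= 1
    return 4 * K + sum(2 * 2 ** i for i in range(ell - 1))


# Largest value of cost_bound(K) / K over a range of capacities.
worst_ratio = max(cost_bound(K) / K for K in range(1, 10_001))
```

The integer `bit_length` computation is used instead of a floating-point logarithm to avoid rounding issues at powers of two.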

Finally, we consider the third case of our analysis, where more than half of the buckets in all partitions are not saturated and no bucket $B_{\ell,{r}}$ in the last partition produces a value $f(B_{\ell,{r}})$ that is a good estimate of $f(\mathcal{OPT})$ .

Lemma A.8

Let $\ell={\left\lceil\log K\right\rceil}$ and $\tau>0$ . If no partition in $S_{\tau}$ has at least half of its buckets saturated, then:

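$$f(Z_{\tau})\geq\left(1-\frac{1}{e}\right)\left(f(\mathcal{OPT}(K,V\setminus E))-f(B_{\ell,{r}})-\tau\right),$$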

where $B_{\ell,{r}}$ is any bucket in the last partition that is not saturated.

**Proof : ** Let $Y$ be the set that contains all elements from $\mathcal{OPT}(K,V\setminus E)$ that are in buckets of $S_{\tau}$ with higher priority than $B_{\ell,{r}}$ and let $X:=\mathcal{OPT}(K,V\setminus E)\setminus Y$ . For each $e\in X$ ,

$$f\left(e\,\middle|\,B_{\ell,{r}}\right)\leq\frac{\tau}{K}c(e),\tag{7}$$

due to the fact that $B_{\ell,{r}}$ is the bucket in the last partition and is not saturated.

Since $f(\mathcal{OPT}(K,V\setminus E))=f(X\cup Y)$ , then by submodularity, $f(Y)\geq f(\mathcal{OPT}(K,V\setminus E))-f(X)$ . Then by monotonicity, $f(Y)\geq f(\mathcal{OPT}(K,V\setminus E))-f\left(X\,\middle|\,B_{\ell,{r}}\right)-f\left(B_{\ell,{r}}\right)$ . By submodularity, $f(Y)\geq f(\mathcal{OPT}(K,V\setminus E))-f\left(B_{\ell,{r}}\right)-\sum_{e\in X}f\left(e\,\middle|\,B_{\ell,{r}}\right)$ . Then by Equation 7, $f(Y)\geq f(\mathcal{OPT}(K,V\setminus E))-f\left(B_{\ell,{r}}\right)-\frac{\tau}{K}c(X)$ . Since $c(X)\leq K$ , then

$$f(Y)\geq f(\mathcal{OPT}(K,V\setminus E))-f\left(B_{\ell,{r}}\right)-\tau.\tag{8}$$

Therefore, by Theorem 2.1, $f(Z_{\tau})=f({\textsc{Offline}}(K,S_{\tau}\setminus E))\geq\left(1-1/e\right)f(\mathcal{OPT}(K,S_{\tau}\setminus E))$ . Since $Y\subseteq(S_{\tau}\setminus E)$ , then $f(Z_{\tau})\geq\left(1-1/e\right)f(\mathcal{OPT}(K,Y))$ . The cost of $Y$ is at most $K$ , so $f(Z_{\tau})\geq\left(1-1/e\right)f(Y)$ . Hence by Equation 8, $f(Z_{\tau})\geq\left(1-1/e\right)\left(f(\mathcal{OPT}(K,V\setminus E))-f(B_{\ell,{r}})-\tau\right)$ , as desired. $\Box$

Since the above three lemmas hold for every $\tau>0$ we can pick its value to give the desired approximation guarantee and space complexity bound. This gives Theorem 3.1, which corresponds to the first part of Theorem 1.3.

Theorem A.9

Let $\ell={\left\lceil\log K\right\rceil}$ and $\zeta=1-\frac{1}{2\ell}$ . There exists an algorithm that outputs a set $\mathcal{S}$ with $\tilde{O}(K^{2}+mK)$ elements such that, for any set $E$ of at most $m$ removed items, one can compute from $\mathcal{S}$ a set $Z\subseteq V\setminus E$ with cost at most $K$ and

$$f(Z)\geq\left(\frac{2\,(1-1/e)\,\zeta}{32\zeta+3}-\epsilon\right)f(\mathcal{OPT}).$$

**Proof : ** Fix any value of $\tau>0$ and consider the bounds of Lemma A.7 and Lemma A.8 as functions of $f(B_{\ell,{r}})$ . Note that the first bound increases and the second bound decreases as a function of this parameter. Since we can always pick the better bound, the worst value for $f(B_{\ell,{r}})$ is when the two bounds are equal, i.e. $\frac{1}{15}\left(1-\frac{1}{e}\right)\left(f(B_{\ell,{r}})-\frac{\tau}{2}\right)=\left(1-\frac{1}{e}\right)\left(f(\mathcal{OPT})-f(B_{\ell,{r}})-\tau\right)$ . This occurs when $f(B_{\ell,{r}})=\frac{15f(\mathcal{OPT})}{16}-\frac{29\tau}{32}$ . Hence, the best of these two bounds is always at least $\frac{1}{16}\left(1-\frac{1}{e}\right)(f(\mathcal{OPT})-\frac{3\tau}{2})$ .

Note that this is a decreasing function of $\tau$ while the inequality of Lemma A.2 is an increasing function of $\tau$ , so we pick $\tau$ to make sure that the minimum of these two bounds is large. The optimal value of $\tau$ is $\tau=\frac{2}{32\zeta+3}\cdot f(\mathcal{OPT})$ in which case the two bounds are equal. Hence, $f(Z)\geq\frac{2\ (1-1/e)\zeta}{32\zeta+3}\cdot f(\mathcal{OPT})$ . By making guesses $\tau^{*}$ for $f(\mathcal{OPT})$ by increasing powers of $(1+\epsilon)$ , we can obtain a $(1+\epsilon)$ approximation of $\tau$ , giving a $\left(\frac{2\ (1-1/e)\zeta}{32\zeta+3}-\epsilon\right)$ approximation for $f(\mathcal{OPT})$ .
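The balancing of the three bounds above can be verified with exact rational arithmetic. The sketch below drops the common factor $\left(1-\frac{1}{e}\right)$ from all three bounds and checks that they coincide at the stated values of $f(B_{\ell,{r}})$ and $\tau$ ; the function name `balanced_bounds` and the choice $\ell=4$ are hypothetical.

```python
from fractions import Fraction


def balanced_bounds(opt: Fraction, ell: int):
    """Return (f_B, tau, ratio), dropping the common (1 - 1/e) factor throughout."""
    zeta = 1 - Fraction(1, 2 * ell)
    tau = Fraction(2) / (32 * zeta + 3) * opt
    f_b = Fraction(15, 16) * opt - Fraction(29, 32) * tau
    lemma_a7 = Fraction(1, 15) * (f_b - tau / 2)  # bound of Lemma A.7
    lemma_a8 = opt - f_b - tau                    # bound of Lemma A.8
    lemma_a2 = zeta * tau                         # bound of Lemma A.2
    assert lemma_a7 == lemma_a8 == lemma_a2
    return f_b, tau, lemma_a2 / opt


f_b, tau, ratio = balanced_bounds(Fraction(1), ell=4)
```

With $\ell=4$ we have $\zeta=\frac{7}{8}$ , so the resulting ratio equals $\frac{2\zeta}{32\zeta+3}=\frac{7}{124}$ before the factor $\left(1-\frac{1}{e}\right)$ and the loss of $\epsilon$ are applied.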

Finally, we give the bound on the number of elements returned. The number of buckets is dynamically updated until $\sum_{j=1}^{n_{i}}|B_{i,j}|<10w\cdot 2^{i}$ . Hence at most $80\cdot 2^{i}\ell w$ new buckets have been created for partition $i$ . Then the total number of elements in each partition is at most $\sum_{i=0}^{\ell}\left(10\ell w+8\ell+80\cdot 2^{i}\ell w\right)2^{i}=O\left(K^{2}\ell w\right)=O\left(K^{2}\log K+mK\log^{2}K\right)$ since $w={\left\lceil\frac{4\ell m}{K}\right\rceil}$ . Since there are $\ell=O\left(\log K\right)$ partitions, the total number of elements is $O\left(K^{2}\log^{2}K+mK\log^{3}K\right)$ for each guess of $\tau$ . Assuming $O\left(f(\mathcal{OPT})\right)=O\left(\log K\right)$ , then the total number of guesses for $\tau$ is $O\left(\frac{1}{\epsilon}\log K\right)$ , so the total number of elements is $O\left(\frac{1}{\epsilon}(K^{2}\log^{3}K+mK\log^{4}K)\right)$ . $\Box$

We now show that Prune reduces the total number of elements output, while maintaining a constant factor approximation.

Lemma A.10

Suppose AlgNum outputs a set $\mathcal{S}$ from which one can compute a set $Z\subseteq V\setminus E$ with cost at most $K$ and $f(Z)$ is an $r$ -approximation to $f(\mathcal{OPT})$ . Then Prune outputs a set $T$ of size $O\left(\frac{1}{\epsilon}(K\log^{3}K+m\log^{4}K)\right)$ , from which one can compute a set $W\subseteq V\setminus E$ with cost at most $K$ and $f(W)$ is an $r^{2}$ -approximation to $f(\mathcal{OPT})$ .

**Proof : ** Since Prune runs an instance of AlgNum on $\mathcal{S}$ , Prune provides an $r$ -approximation to $f(\mathcal{OPT}(K,\mathcal{S}\setminus E))$ , which is itself an $r$ -approximation to $f(\mathcal{OPT})$ . Thus, $f(W)$ is an $r^{2}$ -approximation to $f(\mathcal{OPT})$ .

It remains to bound the number of elements in $T$ . Consider the state of AlgNum on the set $S$ sorted by size. Note that no new buckets are created in partition $i$ once the total number of items in the partition reaches $10w\cdot 2^{i}$ . Let $u_{i}$ be the first time at which partition $i$ contains $10w\cdot 2^{i}$ items; call the elements placed in partition $i$ before time $u_{i}$ “old” and the elements placed in partition $i$ after time $u_{i}$ “new”. Observe that each old element increments $s_{i}$ by an $8\ell$ multiple of its cost. Since each new element costs at least as much as each old element, $80\ell w\cdot 2^{i}$ new elements will cost at least $8\ell$ times the cost of the old elements, which fills the additional space allocated by the old elements.

Hence, the total number of elements in each partition is at most $\sum_{i=0}^{\ell}O\left(\ell w\cdot 2^{i}\right)=O\left(K\ell w\right)=O\left(K\log K+m\log^{2}K\right)$ since $w={\left\lceil\frac{4\ell m}{K}\right\rceil}$ . Since there are $\ell=O\left(\log K\right)$ partitions, the total number of elements for each guess of $\tau$ is $O\left(K\log^{2}K+m\log^{3}K\right)$ . Assuming $O\left(f(\mathcal{OPT})\right)=O\left(\log K\right)$ , then the total number of guesses for $\tau$ is $O\left(\frac{1}{\epsilon}\log K\right)$ , so the total number of elements is $O\left(\frac{1}{\epsilon}(K\log^{3}K+m\log^{4}K)\right)$ . $\Box$

Together, Theorem A.9 and Lemma A.10 give the proof of Theorem 3.1. A similar approach can be used to prove Theorem 3.2.

Appendix B Missing Proofs from Section 3.2

For a specific knapsack $a$ , we call a bucket $B_{i,j}$ saturated with respect to knapsack $a$ if $c_{a}(B_{i,j})\geq\min(2^{i},K)$ . As before, we use $S_{\tau}:=\{B_{i,j}\}_{i,j}$ to denote the data structure output by AlgNum when run with parameter $\tau$ (i.e. for $\tau^{*}=\frac{\tau}{4}$ ) and $Z_{\tau}:={\textsc{Offline}}(\bigcup_{B_{i,j}\in S_{\tau}}B_{i,j}\setminus E)$ , where $E$ is the set of elements that are removed at the end of the stream. Finally, recall that $\ell={\left\lceil\log K\right\rceil}$ .
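The saturation condition and the "at least half of the buckets" test used by the lemmas below can be written down directly. This is a minimal sketch under assumed data layouts: `costs[e]` is a tuple of the $d$ knapsack costs of element `e`, and `partitions[i]` is the list of buckets (sets of elements) of partition $i$. These names are illustrative and do not come from the paper.

```python
def bucket_cost(bucket, costs, a):
    """c_a(B): total cost of a bucket with respect to knapsack a."""
    return sum(costs[e][a] for e in bucket)


def is_saturated(bucket, costs, a, i, K):
    """A bucket of partition i is saturated w.r.t. knapsack a if c_a(B) >= min(2^i, K)."""
    return bucket_cost(bucket, costs, a) >= min(2 ** i, K)


def has_half_saturated_partition(partitions, costs, a, K):
    """True if some partition has at least half of its buckets saturated w.r.t. a."""
    for i, buckets in enumerate(partitions):
        saturated = sum(is_saturated(b, costs, a, i, K) for b in buckets)
        if buckets and 2 * saturated >= len(buckets):
            return True
    return False
```

Lemma B.1 covers the case where this test succeeds for some knapsack, while Lemma B.4 and Lemma B.5 cover the case where it fails for every knapsack.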

Lemma B.1

Let $\tau>0$ . For a fixed knapsack $a$ , if there exists a partition in $S_{\tau}$ with at least half of its buckets saturated with respect to knapsack $a$ , then

$f(Z_{\tau})\geq\frac{1}{1+2d}\left(1-\frac{1}{e}\right)\left(1-\frac{1}{2\ell}\right)\tau$ .

**Proof : ** Let $a$ be a fixed knapsack and $i^{*}$ be a partition in $S_{\tau}$ with at least half of its buckets saturated with respect to knapsack $a$ . Let $B_{i^{*},j^{*}}$ be a saturated bucket that minimizes $c_{a}(B_{i^{*},j^{*}}\cap E)$ . Let $I$ be the cost of the items of $E$ with respect to knapsack $a$ that are in partition $i^{*}$ ,

$I:=\sum_{j}c_{a}(B_{i^{*},j}\cap E)$ .

Then the total space in partition $i^{*}$ is at least $2^{i^{*}+1}\cdot w{\left\lceil K/2^{i^{*}}\right\rceil}+(8\ell)I$ , so the total number of buckets is at least $\frac{2^{i^{*}+1}\cdot w{\left\lceil K/2^{i^{*}}\right\rceil}+(8\ell)I}{2^{i^{*}}}$ . Since at least half of its buckets are saturated with respect to knapsack $a$ , the total number of saturated buckets is at least $\frac{2^{i^{*}}\cdot w{\left\lceil K/2^{i^{*}}\right\rceil}+(4\ell)I}{2^{i^{*}}}$ .

By an averaging argument, the cost of the elements in $B_{i^{*},j^{*}}$ with respect to knapsack $a$ that are removed by $E$ is at most

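$c_{a}(B_{i^{*},j^{*}}\cap E)\leq\frac{2^{i^{*}}\cdot I}{2^{i^{*}}\cdot w{\left\lceil K/2^{i^{*}}\right\rceil}+(4\ell)I}$ .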

Thus, $c_{a}(B_{i^{*},j^{*}}\setminus E)$ is at least $c_{a}(B_{i^{*},j^{*}})-\frac{2^{i^{*}}\cdot I}{2^{i^{*}}\cdot w{\left\lceil K/2^{i^{*}}\right\rceil}+(4\ell)I}$ .

Note that if $B_{i^{*},j^{*}}$ is saturated with respect to knapsack $a$ , then the marginal density of each element exceeds a threshold of $\frac{\tau}{(1+2d)\cdot c_{a}(B_{i^{*},j^{*}})}$ so that $f(B_{i^{*},j^{*}}\setminus E)\geq\left(c_{a}(B_{i^{*},j^{*}})-\frac{2^{i^{*}}\cdot I}{2^{i^{*}}\cdot w{\left\lceil K/2^{i^{*}}\right\rceil}+(4\ell)I}\right)\cdot\frac{\tau}{(1+2d)\cdot c_{a}(B_{i^{*},j^{*}})}$ . Since $\frac{2^{i^{*}}\cdot I}{2^{i^{*}}\cdot w{\left\lceil K/2^{i^{*}}\right\rceil}+(4\ell)I}<\frac{2^{i^{*}}}{4\ell}$ and $c_{a}(B_{i^{*},j^{*}})\geq 2^{i^{*}-1}\geq 1$ for a saturated bucket $B_{i^{*},j^{*}}$ , it follows that $f(B_{i^{*},j^{*}}\setminus E)\geq\left(1-\frac{1}{2\ell}\right)\frac{\tau}{1+2d}$ . Hence by Theorem 2.1, running Offline on $B_{i^{*},j^{*}}\setminus E$ produces a $\frac{1}{1+2d}\left(1-\frac{1}{e}\right)\left(1-\frac{1}{2\ell}\right)$ approximation. $\Box$
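The averaging step in this proof can be checked numerically. The sketch below uses exact rationals and hypothetical parameter values; it only verifies that, however large the removed cost $I$ in partition $i^{*}$ is, the least-damaged saturated bucket keeps at least a $1-\frac{1}{2\ell}$ fraction of its cost. The function names are illustrative.

```python
from fractions import Fraction


def removed_cost_bound(i_star, w, K, ell, I):
    """Averaging bound on c_a(B ∩ E) for the least-damaged saturated bucket."""
    two_i = 2 ** i_star
    ceil_ratio = -(-K // two_i)          # integer ceiling of K / 2^{i*}
    return Fraction(two_i * I, two_i * w * ceil_ratio + 4 * ell * I)


def surviving_fraction(i_star, w, K, ell, I, bucket_cost):
    """Lower bound on c_a(B \\ E) / c_a(B) for a saturated bucket of the given cost."""
    return 1 - removed_cost_bound(i_star, w, K, ell, I) / bucket_cost


# Hypothetical parameters: K = 16, ell = 4, w = 2, partition i* = 3,
# and the smallest cost the proof allows for a saturated bucket, 2^{i* - 1} = 4.
fractions_kept = [surviving_fraction(3, 2, 16, 4, I, 4) for I in range(50)]
```

The bound is independent of $I$ because every removed unit of cost also adds $8\ell$ units of space, and hence new buckets, to the partition.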

The following lemma corresponds to Lemma A.3, using the threshold of AlgMult.

Lemma B.2

Let $E_{i}:=A_{i}\cap E$ denote the elements that are removed from a bucket $A_{i}$ in partition $i>0$ . Given a bucket $A_{i-1}$ from partition $i-1$ that is not saturated and any knapsack $a$ , then the loss in bucket $A_{i}$ induced by the removals is at most

[TABLE]

The following lemma corresponds to Lemma A.4, using Lemma B.2 and the threshold of AlgMult.

Lemma B.3

Suppose that there exists some bucket in every partition that is not saturated with respect to some particular knapsack $a$ . For every partition $i$ , let $A_{i}$ denote a bucket with $c_{a}(A_{i})<\min\{2^{i},K\}$ and let $E_{i}:=A_{i}\cap E$ denote the elements that are removed from $A_{i}$ . The loss in the bucket $B_{\ell,{r}}$ induced by the removals, given the remaining elements in the previous buckets, is at most $f\left(E_{\ell}\,\middle|\,\bigcup_{j=0}^{\ell-1}\left(A_{j}\setminus E_{j}\right)\right)\leq\sum_{j=1}^{\ell}\frac{\tau}{2^{j-1}(1+2d)}c_{a}(E_{j})$ .

The following lemma corresponds to Lemma A.7, using Lemma B.3, the threshold of AlgMult, and the observation that an optimal solution considering only a particular knapsack constraint is at least as good as an optimal solution considering additional other knapsack constraints. However, the $16d$ factor in the denominator results from the bound $c(A_{i})\leq d\cdot 2^{i}$ on the cost of a bucket, which is due to the definition $c(e)=\max_{1\leq a\leq d}c_{a}(e)$ .
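To see where the factor $d$ can come from, the sketch below compares the per-knapsack costs of a bucket with its aggregated cost, assuming $c(A)$ is the sum of $c(e)=\max_{a}c_{a}(e)$ over the elements of $A$. The toy costs and function names are hypothetical; the example is chosen so that each knapsack cost is at most $2^{i}$ while the aggregated cost reaches $d\cdot 2^{i}$.

```python
def per_knapsack_cost(bucket, costs, a):
    """c_a(A): cost of the bucket with respect to knapsack a."""
    return sum(costs[e][a] for e in bucket)


def aggregated_cost(bucket, costs):
    """c(A): sum over elements of c(e) = max_a c_a(e)."""
    return sum(max(costs[e]) for e in bucket)


# Three elements, each expensive in a different knapsack (d = 3, i = 1).
costs = {"p": (2, 0, 0), "q": (0, 2, 0), "r": (0, 0, 2)}
bucket = {"p", "q", "r"}
d, i = 3, 1
```

In general $c(A)\leq\sum_{a}c_{a}(A)$, so a bucket whose cost in every knapsack is at most $2^{i}$ has aggregated cost at most $d\cdot 2^{i}$, and the example shows this can be tight.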

Lemma B.4

Let $\tau>0$ . If no partition in $S_{\tau}$ has at least half of its buckets saturated with respect to any knapsack $a$ with $1\leq a\leq d$ , then $f(Z_{\tau})\geq\frac{1}{16d}\left(1-\frac{1}{e}\right)\cdot\left(f\left(B_{\ell,{r}}\right)-\frac{\tau}{2(1+2d)}\right)$ , where $B_{\ell,{r}}$ is any bucket in the last partition that is not saturated.

The following lemma is similar to Lemma A.8 and follows by the same proof, with the observation that $c(X)\leq dK$ .

Lemma B.5

Let $\tau>0$ . If no partition in $S_{\tau}$ has at least half of its buckets saturated with respect to any knapsack $a$ with $1\leq a\leq d$ , then $f(Z_{\tau})\geq\left(1-\frac{1}{e}\right)\cdot\left(f(\mathcal{OPT}(K,V\setminus E))-f(B_{\ell,{r}})-\frac{d\tau}{1+2d}\right)$ , where $B_{\ell,{r}}$ is any bucket in the last partition that is not saturated.

We now prove Theorem 3.2.

**Proof of Theorem 3.2: ** The $r$ -approximation guarantee follows from Lemma B.1, Lemma B.4 and Lemma B.5, when $f(B)=\frac{f(\mathcal{OPT})}{2}$ , and $\tau=\frac{f(\mathcal{OPT})}{4}$ . The $r^{2}$ -approximation guarantee and space bounds follow from Lemma A.10 with Prune using AlgMult instead of AlgNum. $\Box$

Appendix C Missing Proofs from Section 4

We first prove Lemma 4.2.

**Proof of Lemma 4.2: ** We will show that $S$ returned by line 14 of Algorithm 6 equals the output of AlgMult run on a stream of $V$ such that:

• $F$ is a prefix of this stream.

• Elements of $F\cup R$ appear in the same order in the stream as they appear in Algorithm 6.

• The order of the remaining elements is arbitrary.

Let $\mathcal{B}_{{\textsc{AlgMult}}}$ be the structure of sets $B_{i,j}$ and their content after AlgMult is executed on this stream.

Next, recall that AlgMult never removes any element from any $B_{i,j}$ . Also, recall that $\mathcal{B}_{0}$ is obtained by executing AlgMult on $F$ . Hence, since $F$ is a prefix of the stream, for each $(i,j)$ the content of $B_{i,j}$ in $\mathcal{B}_{0}$ is a subset of the content of $B_{i,j}$ in $\mathcal{B}_{{\textsc{AlgMult}}}$ . Furthermore, $f$ is a submodular function, so if an element $e$ is not added to $B_{i,j}$ due to its small marginal gain, $e$ will not be added to a superset of $B_{i,j}$ either. This implies that after $F$ is processed, no element $e$ that is not added to $R_{i}$ on line 8 of Algorithm 6 can ever be added to any $B_{i,j}$ (regardless of the ordering of the elements of $V\setminus F$ ). Therefore, the only relevant elements in the rest of the stream are those in $R$ .

The proof now follows from the fact that Theorem 3.2 holds regardless of ordering of the stream. $\Box$
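The submodularity step in this proof (an element rejected against a bucket is also rejected against any superset of that bucket) can be illustrated on a toy coverage objective. This is a sketch only: the coverage function, the sets, and the threshold are hypothetical, and the actual acceptance rule of AlgMult compares marginal density against its own threshold.

```python
def coverage(sets, chosen):
    """f(S): number of ground items covered by the chosen elements (monotone submodular)."""
    covered = set()
    for e in chosen:
        covered |= sets[e]
    return len(covered)


def marginal_gain(sets, e, chosen):
    """f(e | S) = f(S + e) - f(S)."""
    return coverage(sets, chosen | {e}) - coverage(sets, chosen)


sets = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "e": {1, 3, 5}}
small_bucket = {"a"}                 # bucket content after the prefix F
large_bucket = {"a", "b", "c"}       # a superset reached later in the stream

gain_small = marginal_gain(sets, "e", small_bucket)
gain_large = marginal_gain(sets, "e", large_bucket)
threshold = 3                        # hypothetical acceptance threshold
rejected_early = gain_small < threshold
rejected_late = gain_large < threshold
```

Because the cost of `e` is fixed, the same monotonicity holds for the ratio of marginal gain to cost, which is why filtering against $\mathcal{B}_{0}$ is safe.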

To prove Lemma 4.3, we need the following result for submartingales.

Theorem C.1 (Azuma’s Inequality)

Suppose $X_{0},X_{1},\ldots,X_{n}$ is a submartingale and $|X_{i}-X_{i+1}|\leq c_{i}$ . Then

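$\Pr\left[X_{n}-X_{0}\leq-t\right]\leq\exp\left(\frac{-t^{2}}{2\sum_{i=0}^{n-1}c_{i}^{2}}\right)$ for every $t>0$ .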

**Proof of Lemma 4.3: ** Observe that the expected number of elements in $S$ is $4\sqrt{nL}$ so that $|S|<3\sqrt{nL}$ occurs only with probability at most $e^{-\Omega(\sqrt{nL})}\leq e^{-\Omega(L)}$ by standard Chernoff bounds. Thus, $|S|\geq 3\sqrt{nL}$ with high probability. Let $N_{S}$ denote the total number of elements $e$ that are added to $\mathcal{B}_{0}$ by ${\textsc{AlgMult}}_{\mathcal{B}_{0},\{e\}}(d,m,K,\tau)$ , so that exactly $N_{S}+|S|$ elements are sent to $C$ in round two.

Suppose we split the sample set $S$ into $3L$ pieces of size $\sqrt{n/L}$ and process each piece sequentially. Suppose further that before some piece, there are at least $\sqrt{nL}$ remaining elements that would be added to $\mathcal{B}_{0}$ as described above. Then an additional element is added to $\mathcal{B}_{0}$ with probability at least $1-\left(1-\sqrt{\frac{L}{n}}\right)^{\sqrt{\frac{n}{L}}}>1/2$ , conditioned on any previous actions of the algorithm, since each piece can be sampled independently and thus we can use a martingale argument to bound the number of elements selected in $S$ .

Let $X_{i}$ be the indicator random variable for the event that at least one element is selected from the $i$ -th piece so that $\mathbb{E}[X_{i}\mid X_{1},\ldots,X_{i-1}]\geq 1/2$ . Let $Y_{i}=\sum_{j=1}^{i}(X_{j}-1/2)$ so that the sequence $Y_{1},Y_{2},\ldots$ is a submartingale, i.e., $\mathbb{E}[Y_{i}\mid Y_{1},\ldots,Y_{i-1}]\geq Y_{i-1}$ , and $|Y_{i}-Y_{i-1}|\leq 1$ . Therefore, $\Pr[Y_{3L}<-\frac{1}{2}L]<e^{-\Omega(L)}$ by Azuma’s inequality (Theorem C.1). Hence with probability $1-e^{-\Omega(L)}$ , $\sum_{j=1}^{3L}X_{j}=Y_{3L}+\frac{3}{2}L\geq L$ and $\mathcal{B}_{0}$ includes at least $L$ elements overall, in which case nothing is sent to the central machine. Otherwise, the number of remaining elements added to $\mathcal{B}_{0}$ is less than $\sqrt{nL}$ . $\Box$
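The per-piece success probability used in this proof can be evaluated numerically. The sketch below reads $\sqrt{L/n}$ as the chance that one sampled element is among the at least $\sqrt{nL}$ elements that would still be added, and $\sqrt{n/L}$ as the piece size; this reading and the function names are assumptions made for illustration.

```python
import math


def piece_hit_probability(n, L):
    """1 - (1 - sqrt(L/n))^(sqrt(n/L)): chance that a piece contributes an element."""
    p = math.sqrt(L / n)            # assumed per-sample hit probability
    size = math.sqrt(n / L)         # number of samples in one piece
    return 1 - (1 - p) ** size


def guaranteed_hits(L, y_final):
    """Sum of the 3L indicators given Y_{3L} = sum_j (X_j - 1/2)."""
    return y_final + 1.5 * L


probabilities = {
    (n, L): piece_hit_probability(n, L)
    for n in (10**2, 10**4, 10**6)
    for L in (1, 4, 25)
}
```

Every value is above $1-1/e$, hence above $1/2$, and whenever $Y_{3L}\geq-\frac{1}{2}L$ the number of contributing pieces is at least $L$.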

Appendix D Robust to removal of size $M$

In this section, we consider the ARMSM $(m,K)$ problem under a single knapsack constraint, when the $m$ items have cost at most $M$ . In contrast to AlgNum, we no longer need a dynamic allocation of new buckets, so having a fixed number of buckets for each partition suffices. We give our algorithm in full in AlgSize.

To show that the optimal solution of $Z$ output by AlgSize is a good approximation to the optimal solution of the entire stream, we call a bucket $B_{i,j}$ saturated if $c(B_{i,j})\geq\min\{2^{i},K\}$ and break the analysis into the following three cases:

1. At least half of the buckets in some partition are saturated (Lemma D.1)

2. More than half of the buckets in all partitions are not saturated, but there exists some bucket in the last partition that is a good estimate of $f(\mathcal{OPT})$ (Lemma D.2)

3. More than half of the buckets in all partitions are not saturated and no bucket in the last partition is a good estimate of $f(\mathcal{OPT})$ (Lemma D.3)

In the first case, if at least half of the buckets in some partition are saturated, we argue through an averaging argument that some saturated bucket $B_{i,j}$ in this partition cannot overlap too much in cost with the elements $E$ that are removed at the end of the stream, giving a lower bound on $c(B_{i,j}\setminus E)$ . Since elements can only be added to this bucket if the ratio of their marginal gain to their size exceeds a certain threshold, we conclude that $f(B_{i,j}\setminus E)$ is at least the product of $c(B_{i,j}\setminus E)$ and this threshold, which gives a good approximation to $f(\mathcal{OPT})$ .

In the second case, if there exists some bucket $B_{\ell,{r}}$ in the last partition that is a good estimate of $f(\mathcal{OPT})$ , we first use a technical lemma to show that the optimal solution on $Z$ is at least $f(B_{\ell,{r}})$ minus the value of the elements across all the buckets that were deleted by $E$ . To bound the value of these elements, we argue that if most of the buckets in all partitions are not saturated, then no element in a bucket $B_{i,j}$ that is deleted by $E$ can have value that is too high, because otherwise it would have been added to a bucket in some previous partition with index less than $i$ . Hence, we derive an upper bound on the value of the elements across all the buckets that were deleted by $E$ , and this suffices to show that the optimal solution of $Z$ is close to $f(\mathcal{OPT})$ , since $f(B_{\ell,{r}})$ is a good approximation to $f(\mathcal{OPT})$ .

In the third case, if all buckets in the last partition give poor estimates of $f(\mathcal{OPT})$ , then for each of these buckets, the total size of the elements in the bucket cannot be large. As a result, most elements of $\mathcal{OPT}$ must either be in a previous partition or have poor marginal gain. If most elements of $\mathcal{OPT}$ are in a previous partition, then the union of the items in the previous partitions is contained in $Z$ and thus the optimal solution of $Z$ is a good approximation to $f(\mathcal{OPT})$ . If most elements of $\mathcal{OPT}$ have poor marginal gain, then there must be some item of $\mathcal{OPT}$ with substantial value. On the other hand, since each partition contains many buckets that are not saturated, this substantial item must have been captured by some bucket in a previous partition and so again, the optimal solution of $Z$ is a good approximation to $f(\mathcal{OPT})$ .

We begin with the first case. Intuitively, if at least half of the buckets in some partition are saturated, then some saturated bucket $B_{i^{*},j}$ in this partition cannot be affected too much by the removal of elements at the end of the stream. Hence, this bucket $B_{i^{*},j}$ gives a good approximation to $\tau$ , which in turn serves as a good approximation to $f(\mathcal{OPT})$ .

Lemma D.1

Let $\tau>0$ . If there exists a partition in $S_{\tau}$ such that at least half of its buckets are saturated, then

$f(Z_{\tau})\geq\left(1-\frac{1}{e}\right)\left(1-\frac{4M}{wK}\right)\tau$ .

**Proof : ** Let $i$ be a partition such that at least half of its buckets are saturated. Let $B_{i,j}$ be a saturated bucket that minimizes $c(B_{i,j}\cap E)$ . Since every partition contains $w{\left\lceil K/2^{i}\right\rceil}$ buckets, the number of saturated buckets in partition $i$ is at least $wK/2^{i+1}$ . By a simple averaging argument and the observation that $c(E)\leq M$ ,

$c(B_{i,j}\cap E)\leq\frac{M}{wK/2^{i+1}}=\frac{2^{i+1}M}{wK}$ .

Thus,

$c(B_{i,j}\setminus E)\geq c(B_{i,j})-\frac{2^{i+1}M}{wK}$ .

Note that if $B_{i,j}$ is saturated, then the marginal gain to weight ratio of each element exceeds a threshold of $\frac{\tau}{c(B_{i,j})}$ so that

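$f(B_{i,j}\setminus E)\geq\left(c(B_{i,j})-\frac{2^{i+1}M}{wK}\right)\cdot\frac{\tau}{c(B_{i,j})}\geq\left(1-\frac{4M}{wK}\right)\tau$ ,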

where the last step follows from the observation that $c(B_{i,j})\geq 2^{i-1}$ for a saturated bucket $B_{i,j}$ . Hence, running Offline on $B_{i,j}\setminus E$ produces a $\left(1-\frac{1}{e}\right)\left(1-\frac{4M}{wK}\right)\tau$ approximation by Theorem 2.1. $\Box$

The second case of our analysis occurs when more than half of the buckets in all partitions are not saturated, but there exists some bucket in the last partition that is a good estimate of $f(\mathcal{OPT})$ . We now show that AlgSize yields a good approximation in this case.

Lemma D.2

Let $\ell={\left\lceil\log K\right\rceil}$ and $\tau>0$ . If no partition in $S_{\tau}$ has at least half of its buckets saturated, then

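$f(Z_{\tau})\geq\frac{1}{11}\left(1-\frac{1}{e}\right)\left(f\left(B_{\ell,{r}}\right)-\frac{4M}{wK}\tau\right)$ ,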

where $B_{\ell,{r}}$ is any bucket in the last partition that is not saturated.

**Proof : ** Let $t_{i}=\min\{2^{i},K\}$ . Let $B_{i}$ denote the bucket in partition $i$ with $c(B_{i})<t_{i}$ for which $c(E_{i})$ is minimized, where $E_{i}:=B_{i}\cap E$ . By setting $B=B_{\ell,{r}}$ , $R=E_{\ell}$ and $A=\bigcup_{i=0}^{\ell-1}(B_{i}\setminus E_{i})$ in Lemma A.5, we have that $f\left(\bigcup_{i=0}^{\ell}(B_{i}\setminus E_{i})\right)\geq f\left(B_{\ell,{r}}\right)-f\left(E_{\ell}\bigg{|}\bigcup_{i=0}^{\ell-1}(B_{i}\setminus E_{i})\right)$ . Since each $c(B_{i})<t_{i}$ , then by Lemma A.4, $f\left(\bigcup_{i=0}^{\ell}(B_{i}\setminus E_{i})\right)$ is at least

[TABLE]

Let $\tilde{E}_{i}$ denote the subset of $E$ that intersects buckets of partition $i$ . Since each $B_{i}$ is defined to be the bucket of partition $i$ that minimizes $c(E_{i})$ among all the buckets that are not saturated and each partition $i$ contains $w{\left\lceil K/2^{i}\right\rceil}$ buckets, of which more than half are not saturated, then $c(E_{i})$ is at most

[TABLE]

Hence,

[TABLE]

Plugging this inequality into Equation 9,

[TABLE]

The cost of the elements in $\bigcup_{i=0}^{\ell}(B_{i}\setminus E_{i})$ is at most $\sum_{i=0}^{\ell}c(B_{i})\leq 2K+2K+\sum_{i=0}^{\lceil\log{2K}\rceil-2}{2\cdot 2^{i}}\leq 6K$ . Hence, the optimal value of $f$ on $S_{\tau}\setminus E$ on a set of cost $6K$ , which we denote $f\left(\mathcal{OPT}(6K,S_{\tau}\setminus E)\right)$ , is at least $f\left(B_{\ell,{r}}\right)-\frac{4M}{wK}\tau$ . Therefore,

[TABLE]

where the first inequality holds by Theorem 2.1 and the second inequality holds by submodularity and the observation that any set with size $6K$ whose items have size at most $K$ can be partitioned into $11$ sets, each with size at most $K$ (i.e., Lemma A.6). $\Box$
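One standard way to see why 11 sets suffice is first-fit packing: any two bins it opens have total load greater than $K$, so 12 or more bins would force a total cost above $6K$. The sketch below is an illustration with integer costs and is not necessarily the argument used in Lemma A.6; the function name is hypothetical.

```python
def first_fit_partition(costs, K):
    """Partition item indices into bins of load at most K using first-fit.

    If every cost is at most K and the total cost is at most 6 * K,
    first-fit opens at most 11 bins.
    """
    bins, loads = [], []
    for idx, c in enumerate(costs):
        if c > K:
            raise ValueError("each item must have cost at most K")
        for b in range(len(bins)):
            if loads[b] + c <= K:        # place the item in the first bin with room
                bins[b].append(idx)
                loads[b] += c
                break
        else:                            # no existing bin had room: open a new one
            bins.append([idx])
            loads.append(c)
    return bins


# Eleven items of cost 51 with K = 100: total 561 <= 600, and no two share a bin.
tight_example = first_fit_partition([51] * 11, 100)
```

By submodularity the best of the resulting sets carries at least a $\frac{1}{11}$ fraction of the value of their union, which is the factor that appears in the bound.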

The following lemma is the same as Lemma A.8, with the identical proof.

Lemma D.3

Let $\ell={\left\lceil\log K\right\rceil}$ and $\tau>0$ . If no partition in $S_{\tau}$ has at least half of its buckets saturated, then

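$f(Z_{\tau})\geq\left(1-\frac{1}{e}\right)\left(f(\mathcal{OPT}(K,V\setminus E))-f\left(B_{\ell,{r}}\right)-\tau\right)$ ,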

where $B_{\ell,{r}}$ is any bucket in the last partition that is not saturated.

Theorem D.4 then follows from optimizing parameters to give an approximation guarantee for an algorithm using AlgSize, when the $m$ items have cost at most $M$ .

Theorem D.4

*For the ARMSM $(m,K)$ problem subject to a knapsack constraint, there exists an algorithm that outputs a set $Z$ so that $f(Z)$ is a constant factor approximation to $f(\mathcal{OPT})$ and stores $\tilde{O}(K+M)$ elements, if the removed items have cost at most $M$ . *

**Proof : ** Let $\eta=\frac{4M}{wK}$ and $f(\mathcal{OPT}):=f(\mathcal{OPT}(K,V\setminus E))$ . Let $t_{i}=\min\{2^{i},K\}$ . From Lemma D.2 and Lemma D.3, it follows that the worst case bound for $f(B_{\ell,{r}})$ occurs when $\frac{1}{11}\left(1-\frac{1}{e}\right)\left(f(B_{\ell,{r}})-\eta\tau\right)=\left(1-\frac{1}{e}\right)\left(f(\mathcal{OPT})-f(B_{\ell,{r}})-\tau\right)$ . Some straightforward computation shows this occurs when $f(B_{\ell,{r}})=\frac{11f(\mathcal{OPT})}{12}-\frac{11\tau}{12}+\frac{\eta\tau}{12}$ . It follows from Lemma D.1 that the optimal value of $\tau$ occurs at $\tau=\frac{f(\mathcal{OPT})}{13-11\eta}$ , which gives $f(Z)\geq\left(1-\frac{1}{e}\right)\frac{1-\eta}{13-11\eta}f(\mathcal{OPT})$ . For $w={\left\lceil\frac{4\ell M}{K}\right\rceil}$ , we have $\eta\leq\frac{1}{\ell}$ , so $f(Z)\geq\frac{1-1/\log K}{13}f(\mathcal{OPT})$ . By making guesses for $f(\mathcal{OPT})$ by increasing powers of $(1+\epsilon)$ , we can obtain a $(1+\epsilon)$ approximation of $\tau$ , giving a $\frac{1-1/\log K}{13}-\epsilon$ approximation for $f(\mathcal{OPT})$ .

Allowing each element in $S$ to be stored by AlgSize using one word of space, then the total space that AlgSize uses is at most

[TABLE]

Since $w={\left\lceil\frac{4\ell M}{K}\right\rceil}$ , then $|S|=O\left(K\log K+M\log^{2}K\right)$ . Assuming $O\left(f(\mathcal{OPT})\right)=O\left(\log K\right)$ , then the total number of guesses for $\tau$ is $O\left(\frac{1}{\epsilon}\log K\right)$ . Therefore, the total number of stored elements is $O\left(\frac{1}{\epsilon}(K\log^{2}K+M\log^{3}K)\right)$ . $\Box$
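The parameter choices in this proof can be checked with exact arithmetic. The sketch below computes $\ell$, $w$ and $\eta$ for hypothetical integer values of $K$ and $M$, and verifies that $\tau=\frac{f(\mathcal{OPT})}{13-11\eta}$ balances the bound of Lemma D.1 against the combined bound of Lemma D.2 and Lemma D.3. The common factor $(1-1/e)$ is dropped and the function names are illustrative.

```python
from fractions import Fraction


def algsize_parameters(K, M):
    """Return (ell, w, eta) with ell = ceil(log2 K), w = ceil(4 ell M / K), eta = 4M / (wK)."""
    ell = (K - 1).bit_length()           # ceil(log2 K) for integer K >= 2
    w = -(-4 * ell * M // K)             # integer ceiling of 4 * ell * M / K
    eta = Fraction(4 * M, w * K)
    return ell, w, eta


def balanced_tau(opt, eta):
    """tau at which (1 - eta) * tau equals (1/12) * (opt - (1 + eta) * tau)."""
    return opt / (13 - 11 * eta)


def approximation_factor(eta):
    """(1 - eta) / (13 - 11 eta), the guarantee before multiplying by (1 - 1/e)."""
    return (1 - eta) / (13 - 11 * eta)


# Hypothetical instance: budget K = 1000, removed cost at most M = 50.
ell, w, eta = algsize_parameters(1000, 50)
tau = balanced_tau(Fraction(1), eta)
```

For this instance $\eta=\frac{1}{\ell}$ exactly, so the bound $\eta\leq\frac{1}{\ell}$ used in the proof is met with equality.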

