Almost Optimal Semi-streaming Maximization for k-Extendible Systems

Moran Feldman; Ran Haba

arXiv:1906.04449·cs.DS·June 12, 2019


TL;DR

This paper introduces an almost optimal semi-streaming algorithm for maximizing weight under k-extendible constraints, significantly improving approximation ratios in the data stream model.

Contribution

The paper presents a semi-streaming O(k log k)-approximation algorithm for the general k-extendible maximization problem, nearly matching offline bounds.

Findings

01. Achieves near-optimal approximation ratio in semi-streaming model

02. Improves upon previous algorithms with higher approximation factors

03. Bridges gap between restricted and general cases

Abstract

In this paper we consider the problem of finding a maximum weight set subject to a $k$-extendible constraint in the data stream model. The only non-trivial algorithm known for this problem to date (to the best of our knowledge) is a semi-streaming $k^{2}(1+\varepsilon)$-approximation algorithm (Crouch and Stubbs, 2014), but semi-streaming $O(k)$-approximation algorithms are known for many restricted cases of this general problem. In this paper, we close most of this gap by presenting a semi-streaming $O(k\log k)$-approximation algorithm for the general problem, which is almost the best possible even in the offline setting (Feldman et al., 2017).

Equations

$$w_{2}(u)=k^{\lfloor\log_{k}w(u)\rfloor}=k^{\lfloor\ell^{-1}\cdot\log_{2}w(u)\rfloor}=k^{\ell^{-1}\cdot\{\lfloor\log_{2}w(u)\rfloor-\lfloor\log_{2}w(u)\rfloor\bmod\ell\}}=\frac{k^{\ell^{-1}\cdot\lfloor\log_{2}w(u)\rfloor}}{2^{i(u)\bmod\ell}}.$$

$$\frac{w(u)}{2}=k^{\log_{k}w(u)-\log_{k}2}=k^{\ell^{-1}\log_{2}w(u)-\ell^{-1}}\leq k^{\ell^{-1}\cdot\lfloor\log_{2}w(u)\rfloor}=w_{2}(u)\cdot 2^{i(u)\bmod\ell},$$

$$w_{2}(u)\cdot 2^{i(u)\bmod\ell}=k^{\ell^{-1}\cdot\lfloor\log_{2}w(u)\rfloor}\leq k^{\ell^{-1}\cdot\log_{2}w(u)}=k^{\log_{k}w(u)}=w(u).$$

$$w_{2}(u)=k^{\lfloor\log_{k}w(u)\rfloor}\leq k^{\log_{k}w(u)}=w(u)\quad\text{and}\quad w_{2}(u)=k^{\lfloor\log_{k}w(u)\rfloor}\geq k^{\log_{k}w(u)-1}=\frac{w(u)}{k}.$$

$$w(OPT)=\sum_{i=0}^{\ell-1}w(B_{i}\cap OPT).$$

$$w(OPT)\leq\ell\cdot w(OPT\cap B_{i})\leq 2\ell\cdot w_{2}(OPT\cap B_{i})\cdot 2^{i}\leq 2\ell\alpha\cdot w_{2}(C_{i})\cdot 2^{i}\leq 2\ell\alpha\cdot w(C_{i})\leq 2\ell\alpha\cdot w(T),$$

$$\rho(i_{\max}-i_{\min}+1)\leq\rho\left(\log_{k}\left(\frac{w_{\max}}{w_{\min}}\right)+1\right)=\rho\cdot O\left(\frac{\log(\nicefrac{w_{\max}}{w_{\min}})}{\log k}+1\right).$$

$$k\cdot|T_{i}\setminus C_{i}|\geq|C_{i}\setminus T_{i}|.$$

[Multi-line derivation; only these fragments survive in the source: $k\cdot|T_{i}|\geq{}$ and $k\cdot|T_{i}\setminus T_{i+1}|+k\cdot|T_{i+1}|={}$]

$$\sum_{u\in f^{-1}(t)}w(u)=\sum_{\substack{u\in f^{-1}(t)\\ w(u)=w(t)}}w(u)+\sum_{\substack{u\in f^{-1}(t)\\ w(u)<w(t)}}w(u)\leq k\cdot w(t)+(k^{2}-k)\cdot\frac{w(t)}{k}\leq 2k\cdot w(t).$$

$$w(OPT)=\sum_{u\in OPT}w(u)=\sum_{t\in T}\sum_{u\in f^{-1}(t)}w(u)\leq\sum_{t\in T}[2k\cdot w(t)]=2k\cdot w(T),$$

$$|OPT|\cdot w_{\min}\leq\rho\cdot\frac{\max_{u\in\cN}w(u)}{2\rho}=\frac{\max_{u\in\cN}w(u)}{2}\leq\frac{w(OPT)}{2}.$$

$$O(\rho)\cdot\max_{1\leq h\leq n}\{i_{\max}(h)-i_{\min}(h)+2\}=O(\rho)\cdot\max_{1\leq h\leq n}\{\log_{k}w_{\max}(h)-\log_{k}\lceil w_{\min}(h)\rceil+2\}$$

[The continuation of this bound, a further $\leq$ step, is missing in the source]

$$w(u_{i})=2^{\log_{k}w(u_{i})}\leq 2^{i_{\min}(h_{i}^{\prime})-1}\leq w_{\min}(h_{i}^{\prime})=\frac{w_{\max}(h_{i}^{\prime})}{(2k\cdot g(h_{i}^{\prime}))^{2}}\leq\frac{\max_{u\in\cN}w(u)}{(2k\cdot(i/k))^{2}}\leq\frac{w(OPT)}{4i^{2}},$$

$$w(OPT\cap F)=\sum_{i=1}^{|OPT\cap F|}w(u_{i})\leq\sum_{i=1}^{|OPT\cap F|}\frac{w(OPT)}{4i^{2}}\leq\frac{w(OPT)}{4}\cdot\left[1+\int_{1}^{\infty}i^{-2}\right]=\frac{w(OPT)}{2}.$$

$$w(T)=w(T^{\prime})\geq\frac{w(OPT^{\prime})}{2k}\geq\frac{w(OPT\setminus F)}{2k},$$

$$\frac{w(OPT)}{2}\leq w(OPT)-w(OPT\cap F)=w(OPT\setminus F)\leq 2k\cdot w(T).$$


Taxonomy

Topics: Complexity and Algorithms in Graphs · Optimization and Search Problems · Stochastic Gradient Optimization Techniques

Full text


Almost Optimal Semi-streaming Maximization for $k$-Extendible Systems

Moran Feldman Dept. of Mathematics and Computer Science, Open University of Israel. E-mail: [email protected]

Ran Haba Dept. of Mathematics and Computer Science, Open University of Israel. E-mail: [email protected]

Abstract

In this paper we consider the problem of finding a maximum weight set subject to a $k$ -extendible constraint in the data stream model. The only non-trivial algorithm known for this problem to date—to the best of our knowledge—is a semi-streaming $k^{2}(1+\varepsilon)$ -approximation algorithm (Crouch and Stubbs, 2014), but semi-streaming $O(k)$ -approximation algorithms are known for many restricted cases of this general problem. In this paper, we close most of this gap by presenting a semi-streaming $O(k\log k)$ -approximation algorithm for the general problem, which is almost the best possible even in the offline setting (Feldman et al., 2017).

Keywords: $k$ -extendible systems, streaming, combinatorial optimization, greedy algorithms

1 Introduction

Many problems in combinatorial optimization can be cast as special cases of the following general task. Given a ground set $\cN$ of weighted elements, find a maximum weight subset of $\cN$ obeying some constraint $\cC$. In general, one cannot get any reasonable approximation ratio for this general task since it captures many hard problems such as maximum independent set in graphs. However, the existing literature includes many interesting classes of constraints for which the above task becomes more tractable. In particular, in the 1970s Jenkyns [14] and Korte and Hausmann [15] suggested, independently, a class of constraints named $k$-set system constraints which represents a sweet spot between generality and tractability. On the one hand, finding a maximum weight set subject to a $k$-set system constraint captures many well known problems such as matching in hypergraphs, matroid intersection and asymmetric travelling salesperson. On the other hand, $k$-set system constraints have enough structure to allow a simple greedy algorithm to find a maximum weight set subject to such a constraint up to an approximation ratio of $k$ (see Footnote 1).

Footnote 1: $k$ is a parameter of the constraint which intuitively captures its complexity. The exact definition of $k$ is given in Section 2, but we note here that in many cases of interest $k$ is quite small. For example, matroid intersection is a $2$-set system.

The $k$ -approximation obtained by the greedy algorithm for finding a maximum weight set subject to a $k$ -set system constraint was recently shown to be the best possible [1]. Nevertheless, over the years many works improved over it either by achieving a better guarantee for more restricted classes of constraints [9, 17, 18], or by extending the guarantee to more general objectives (such as maximizing a submodular function) [7, 9, 10, 11, 16, 17, 20, 23]. Unfortunately, many of the above mentioned improvements are based on quite slow algorithms. Moreover, as modern applications require the processing of increasingly large amounts of data, even the simple greedy algorithm is often viewed these days as too slow for practical use. This state of affairs has motivated recent works aiming to study the problem of finding a maximum weight set subject to a $k$ -set system constraint in a Big Data oriented setting such as Map-Reduce and the data stream model. For the Map-Reduce setting, Ponte Barbosa et al. [6] essentially solved this problem by presenting a $(k+O(\varepsilon))$ -approximation Map-Reduce algorithm for it using $O(1/\varepsilon)$ rounds, which almost matches the optimal approximation ratio in the sequential setting. In contrast, the situation for the data stream model is currently much more involved.

The only non-trivial data stream algorithm known to date (as far as we know) for finding a maximum weight set subject to a general $k$-set system constraint is a $k^{2}(1+\varepsilon)$-approximation semi-streaming algorithm by Crouch and Stubbs [5]. As one can observe, there is a large gap between this approximation ratio and the $k$-approximation that can be achieved in the offline setting. Several works partially addressed this gap by providing $O(k)$-approximation semi-streaming algorithms for more restricted classes of constraints, the most general of which is known as $k$-matchoid constraints [3, 4, 8, 21]. However, these results cannot be considered a satisfactory solution for the gap because $k$-matchoid constraints are much less general than $k$-set system constraints (see Footnote 2).

Footnote 2: We do not formally define $k$-matchoid constraints in this paper, but it should be noted that they usually fail to capture knapsack like constraints. For example, a single knapsack constraint in which the ratio between the largest and smallest item sizes is at most $k$ is a $k$-set system constraint, but usually not a $k$-matchoid constraint.

In this paper we make a large step towards resolving the above gap. Specifically, we present an $\tilde{O}(k)$ -approximation semi-streaming algorithm for finding a maximum weight set subject to a class of constraints, known as $k$ -extendible constraints, that was introduced by [19] and captures (to the best of our knowledge) all the special cases of $k$ -set system constraints studied in the literature to date (including, in particular, $k$ -matchoid constraints). Formally, we prove the following theorem.

Theorem 1.1.

There is a polynomial time semi-streaming algorithm achieving $O(k\log k)$ -approximation for the problem of finding a maximum weight set subject to a $k$ -extendible constraint. Assuming it takes constant space to store a single element and a single weight, the space complexity of the algorithm is $O(\rho(\log k+\log\rho))$ , where $\rho$ is the maximum size of a feasible set according to the constraint.

As the class of $k$ -extendible constraints captures every other restricted class of $k$ -set system constraints from the literature, we believe Theorem 1.1 represents the final intermediate step before closing the above mentioned gap completely (i.e., either finding an $\tilde{O}(k)$ semi-streaming algorithm for $k$ -set system constraints, or proving that this cannot be done). It should also be mentioned that the approximation ratio guaranteed by Theorem 1.1 is optimal up to an $O(\log k)$ factor since it is known that one cannot achieve better than $k$ -approximation for finding a maximum weight set subject to a $k$ -extendible constraint even in the offline setting [7].

1.1 Additional Related Work

In the $k$-dimensional matching problem, one is given a weighted hypergraph in which the vertices are partitioned into $k$ subsets, and every edge contains exactly one vertex from each one of these subsets. The objective in this problem is to find a maximum weight matching in the hypergraph. Hazan et al. [12] showed that no polynomial time algorithm can achieve a better than $\Omega(k/\log k)$-approximation for $k$-dimensional matching unless $\mathtt{P}=\mathtt{NP}$. Interestingly, it turns out that $k$-dimensional matching is captured by all the standard restricted cases of the problem of finding a maximum weight set subject to a $k$-set system constraint, and thus, the inapproximability of Hazan et al. [12] extends to them as well. For most of these restricted cases this is the strongest inapproximability known, although a tight inapproximability of $k$ was proved for $k$-set system and $k$-extendible constraints by [1] and [7], respectively.

Complementing the hardness result of [12], some works presented algorithmic results for either $k$ -dimensional matching or natural generalizations of it such as $k$ -set packing [2, 13, 22].

2 Preliminaries and Notation

In this section we formally define some of the terms used in Section 1 and the notation that we use in the rest of this paper. Given a ground set $\cN$ , an independence system over this ground set is a pair $(\cN,\cI)$ in which $\cI$ is a non-empty collection of subsets of $\cN$ (formally, $\varnothing\neq\cI\subseteq 2^{\cN}$ ) which is down-closed (i.e., if $T$ is a set in $\cI$ and $S$ is a subset of $T$ , then $S$ also belongs to $\cI$ ). One easy way to get an example of an independence system is to take an arbitrary vector space $W$ , designate the set of vectors in this space as the ground set $\cN$ , and make $\cI$ the collection of all independent sets of vectors in $W$ . Since removing a vector from an independent set of vectors cannot make the set dependent, the pair $(\cN,\cI)$ obtained from $W$ in this way is indeed an independence system.

The above example for getting an independence system from a vector space was one of the original motivations for the study of independence systems, and thus, a lot of the terminology used for independence systems is borrowed from the world of vector spaces. In particular, a set is called independent in a given independence system $(\cN,\cI)$ if and only if it belongs to $\cI$ , and it is called a base of the independence system if it is an inclusion-wise maximal independent set. Using this terminology, we can now define $k$ -set systems.

Definition 2.1.

An independence system $(\cN,\cI)$ is a $k$ -set system for an integer $k\geq 1$ if for every set $S\subseteq\cN$ , all the bases of $(S,2^{S}\cap\cI)$ have the same size up to a factor of $k$ (in other words, the ratio between the sizes of the largest and smallest bases of $(S,2^{S}\cap\cI)$ is at most $k$ ).

An immediate consequence of the definition of $k$ -set systems is that any base of such a system is a maximum size independent set up to an approximation ratio of $k$ . Thus, one can get a $k$ -approximation for the problem of finding a maximum size independent set in a given $k$ -set system $(\cN,\cI)$ by outputting an arbitrary base of the $k$ -set system, which can be done using the following simple strategy, which we call the unweighted greedy algorithm. Start with the empty solution, and consider the elements of the ground set $\cN$ in an arbitrary order. When considering an element, add it to the current solution, unless this will make the solution dependent (i.e., not independent).
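The unweighted greedy strategy described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function name `unweighted_greedy`, the oracle `is_matching`, and the example graph are all assumptions introduced here, with graph matching standing in for a $2$-set system.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Set


def unweighted_greedy(
    elements: Iterable[Hashable],
    is_independent: Callable[[FrozenSet[Hashable]], bool],
) -> Set[Hashable]:
    """Scan the elements in the given order; keep each one that preserves independence."""
    solution: Set[Hashable] = set()
    for u in elements:
        if is_independent(frozenset(solution | {u})):
            solution.add(u)
    return solution


def is_matching(edges) -> bool:
    """Hypothetical independence oracle: a set of edges is independent iff it is a matching."""
    seen = set()
    for a, b in edges:
        if a in seen or b in seen:
            return False
        seen.update((a, b))
    return True


# A path with three edges; the output is always a base, but its size depends on the order.
print(unweighted_greedy([(1, 2), (2, 3), (3, 4)], is_matching))
print(unweighted_greedy([(2, 3), (1, 2), (3, 4)], is_matching))
```

On this path the second order yields a base of size 1 while the first yields a base of size 2, so the gap between the two bases is exactly the factor $k=2$ allowed by the definition.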

A $k$-set system constraint is a constraint defined by a $k$-set system, and a set $S$ obeys this constraint if and only if it is independent in that $k$-set system. Note that using this notion we can refer to the problem studied in the previous paragraph as finding a maximum cardinality set subject to a $k$-set system constraint. More generally, given a weight function $w\colon\cN\to{\bR_{\geq 0}}$ and a $k$-set system $(\cN,\cI)$ over the same ground set, it is often useful to consider the problem of finding a maximum weight set $S\subseteq\cN$ subject to the constraint corresponding to this $k$-set system (the weight of a set $S$ is defined as $\sum_{u\in S}w(u)$). Jenkyns [14] and Korte and Hausmann [15] showed that one can get a $k$-approximation for this problem using an algorithm, known simply as the greedy algorithm, which is a variant of the unweighted greedy algorithm that considers the elements of $\cN$ in a non-increasing weight order.
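As a rough sketch of the weighted greedy algorithm, the code below processes the elements heaviest first and otherwise behaves like the unweighted variant. The names `greedy` and `is_matching`, and the weighted path used as input, are assumptions made for this illustration only.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Set


def greedy(
    weights: Dict[Hashable, float],
    is_independent: Callable[[FrozenSet[Hashable]], bool],
) -> Set[Hashable]:
    """Consider elements from heaviest to lightest; keep those that preserve independence."""
    solution: Set[Hashable] = set()
    for u in sorted(weights, key=weights.get, reverse=True):
        if is_independent(frozenset(solution | {u})):
            solution.add(u)
    return solution


def is_matching(edges) -> bool:
    """Hypothetical independence oracle: edges are independent iff no two share a vertex."""
    seen = set()
    for a, b in edges:
        if a in seen or b in seen:
            return False
        seen.update((a, b))
    return True


# Weighted path: the middle edge is heaviest, so greedy takes it and blocks both neighbours.
path_weights = {(1, 2): 2, (2, 3): 3, (3, 4): 2}
chosen = greedy(path_weights, is_matching)
print(chosen, sum(path_weights[e] for e in chosen))
```

Here greedy collects weight 3 while the optimal matching has weight 4, which is within the factor $k=2$ guaranteed for matching. Because sorting requires seeing all elements first, this variant does not directly fit the data stream model.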

The definition of $k$-set systems is very general, which occasionally does not allow them to capture all the necessary structure of a given application. Thus, various stronger kinds of independence systems have been considered over the years, the most well known of which is the intersection of $k$ matroids (which is equivalent to a $k$-set system for $k=1$, and represents a strictly smaller class of independence systems for larger values of $k$). In this work we consider another kind of independence system, which was originally defined by [19]. In this definition we use the expression $S+u$ to denote the union $S\cup\{u\}$. We use the plus sign in a similar way throughout the rest of this paper.

Definition 2.2.

An independence system $(\cN,\cI)$ is a $k$-extendible system for an integer $k\geq 1$ if for any two sets $S\subseteq T\subseteq\cN$ and an element $u\not\in T$ such that $T\in\cI$ and $S+u\in\cI$, there is a subset $Y\subseteq T\setminus S$ of size at most $k$ such that $T\setminus Y+u\in\cI$.
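For very small ground sets, the definition can be checked by brute force. The sketch below is only an illustration under stated assumptions: the names `subsets`, `is_k_extendible`, and `is_matching` are introduced here, and $T$ is taken to range over independent sets, which is how the definition is applied in the proof of Lemma 4.3. The running time is exponential, so this is not meant for real instances.

```python
from itertools import combinations


def subsets(items, max_size=None):
    """Yield every subset of `items` (as a frozenset) of size at most `max_size`."""
    items = list(items)
    top = len(items) if max_size is None else min(max_size, len(items))
    for r in range(top + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_k_extendible(ground, is_independent, k):
    """Brute-force test of the k-extendibility exchange property."""
    ground = list(ground)
    for t in (t for t in subsets(ground) if is_independent(t)):
        for s in subsets(t):
            for u in ground:
                if u in t or not is_independent(s | {u}):
                    continue
                # Look for Y inside T \ S with |Y| <= k such that T \ Y + u is independent.
                if not any(is_independent((t - y) | {u}) for y in subsets(t - s, k)):
                    return False
    return True


def is_matching(edges):
    """Hypothetical oracle: a set of graph edges is independent iff it is a matching."""
    seen = set()
    for a, b in edges:
        if a in seen or b in seen:
            return False
        seen.update((a, b))
    return True


path = [(1, 2), (2, 3), (3, 4)]
print(is_k_extendible(path, is_matching, 1), is_k_extendible(path, is_matching, 2))
```

On the three-edge path, adding the middle edge to the matching formed by the two outer edges forces both of them out, so the system fails the test for $k=1$ and passes it for $k=2$.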

The class of $k$ -extendible systems is general enough to capture the intersection of $k$ matroids and every other restricted class of $k$ -set systems from the literature that we are aware of. In contrast, it is not difficult to verify that any $k$ -extendible system is a $k$ -set system. Thus, the greedy algorithm provides $k$ -approximation for the problem of finding a maximum weight set subject to a $k$ -extendible constraint—i.e., a constraint defined by a $k$ -extendible system and allowing only sets that are independent in this system.

In the data stream model version of the above problem, the elements of the ground set of a $k$ -extendible system $(\cN,\cI)$ arrive one after the other in an adversarially chosen order. An algorithm for this model views the elements of $\cN$ as they arrive, and it gets to know the weight $w(u)$ of every element $u$ upon its arrival. Additionally, as is standard in the field, we assume the algorithm has access to an independence oracle that given a set $S\subseteq\cN$ answers whether $S$ is independent. The objective of the algorithm is to output a maximum weight independent set of the $k$ -extendible system. If the algorithm is allowed enough memory to store the entire input, then the data stream model version becomes equivalent to the offline version of the problem. Thus, an algorithm for this model is interesting only if it has a low space complexity. Since any algorithm for this model must use at least the space necessary for storing its output, most works on this model look for semi-streaming algorithms, which are data stream algorithms whose space complexity is upper bounded by $O(\rho\cdot\operatorname{polylog}n)$ —where $\rho$ is the maximum size of an independent set and $n$ is the size of the ground set. In particular, we note that the space complexity guaranteed by Theorem 1.1 falls within this regime because $\rho\leq n$ by definition, and one can assume that $k\leq n$ because any independence system is $n$ -extendible.

One can observe that the unweighted greedy algorithm (unlike the greedy algorithm itself) can be implemented as a semi-streaming algorithm because it considers the elements in an arbitrary order. This observation is crucial for our result since the algorithm we develop is heavily based on using the unweighted greedy algorithm as a subroutine (a similar use of the unweighted greedy algorithm is done by the current state-of-the-art algorithm for the problem due to Crouch and Stubbs [5]).

Paper Organization:

In Section 3 we present a reduction that allows us to assume that the weights of the elements are powers of $k$, at the cost of losing a factor of $O(\log k)$ in both the approximation ratio and the space complexity of the algorithm. Using this reduction, we present a basic version of our algorithm in Section 4. This basic version presents our main new ideas, but achieves semi-streaming space complexity only under the simplifying assumption that the ratio between the maximum and minimum element weights is polynomially bounded. This simplifying assumption can be dropped using standard techniques, and we defer the details to Appendix A.

3 Reduction to $k$ -Power Weights

In this section we present a reduction that allows us to assume that the weights of all the elements in the ground set $\cN$ are powers of $k$. This reduction simplifies the algorithms we present later in this paper. However, before presenting the reduction itself, let us note that we assume from this point on that $k=2^{i}$ for some integer $i\geq 1$. This assumption is without loss of generality because if $k$ does not obey it, then we can increase its value to the smallest integer that does obey it (a $k$-extendible system remains $k^{\prime}$-extendible for every $k^{\prime}\geq k$, since the bound on the size of $Y$ in Definition 2.2 only becomes weaker). Since the new value of $k$ is larger than the old value by at most a factor of $2$, the approximation ratio guaranteed for both values of $k$ by Theorem 1.1 is asymptotically equal.
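The rounding step can be made concrete with a short helper. This is a sketch only; the function name `round_up_to_power_of_two` is an assumption introduced here and does not appear in the paper.

```python
def round_up_to_power_of_two(k: int) -> int:
    """Smallest value 2**i with i >= 1 that is at least k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # (k - 1).bit_length() is the least i with 2**i >= k; max() enforces i >= 1.
    return max(2, 1 << (k - 1).bit_length())


for k in (1, 2, 3, 4, 5, 9):
    print(k, round_up_to_power_of_two(k))
```

The rounded value never exceeds $2k$, which is why the asymptotic approximation ratio is unaffected.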

We say that an instance of the problem of finding a maximum weight set subject to a $k$ -extendible constraint is a $k$ -power instance if the weights of all the elements in it are powers of $k$ .

Reduction 3.1.

Assume that we are given a polynomial time data stream algorithm $ALG$ for the problem of finding a maximum weight set subject to a $k$ -extendible constraint. If $ALG$ provides $\alpha$ -approximation for $k$ -power instances of the problem using $S_{ALG}$ space, then there exists a polynomial time data stream algorithm for the same problem which achieves $O(\alpha\log{k})$ -approximation for arbitrary instances using $O(S_{ALG}\cdot\log k)$ space. Moreover, if the weights of all the elements fall within some range $[{w_{\min}},{w_{\max}}]$ , then it suffices for $ALG$ to provide $\alpha$ -approximation for $k$ -power instances in which all the weights fall within the range $[{w_{\min}}/k,{w_{\max}}]$ .

Before presenting the algorithm we use to prove the above reduction, we need to define some additional notation. Let $\ell\triangleq\log_{2}k$ , and note that $\ell$ is a positive integer because we assume that $k$ is at least $2$ and a power of $2$ . For every element $u\in\cN$ of weight $w(u)$ , we define an auxiliary weight $w_{2}(u)\triangleq k^{\lfloor\log_{k}w(u)\rfloor}$ . Intuitively, $w_{2}(u)$ is the highest power of $k$ which is not larger than $w(u)$ . The following observation formally states the properties of $w_{2}$ that we need. In this observation we use the notation $i(u)\triangleq\lfloor\log_{2}w(u)\rfloor$ .

Observation 3.2.

For every element $u\in\cN$ , $w_{2}(u)$ is a power of $k$ obeying $w(u)/2\leq w_{2}(u)\cdot 2^{i(u)\bmod\ell}\leq w(u)$ and $w(u)/k\leq w_{2}(u)\leq w(u)$ .

Proof.

The first part of the observation, namely that $w_{2}(u)$ is a power of $k$ , follows immediately from the definition of $w_{2}$ . Thus, we concentrate here on proving the other parts of the observation.

Note that

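$$w_{2}(u)=k^{\lfloor\log_{k}w(u)\rfloor}=k^{\lfloor\ell^{-1}\cdot\log_{2}w(u)\rfloor}=k^{\ell^{-1}\cdot\{\lfloor\log_{2}w(u)\rfloor-\lfloor\log_{2}w(u)\rfloor\bmod\ell\}}=\frac{k^{\ell^{-1}\cdot\lfloor\log_{2}w(u)\rfloor}}{2^{i(u)\bmod\ell}}.$$

Rearranging the last equality, we get

$$\frac{w(u)}{2}=k^{\log_{k}w(u)-\log_{k}2}=k^{\ell^{-1}\log_{2}w(u)-\ell^{-1}}\leq k^{\ell^{-1}\cdot\lfloor\log_{2}w(u)\rfloor}=w_{2}(u)\cdot 2^{i(u)\bmod\ell},$$

and

$$w_{2}(u)\cdot 2^{i(u)\bmod\ell}=k^{\ell^{-1}\cdot\lfloor\log_{2}w(u)\rfloor}\leq k^{\ell^{-1}\cdot\log_{2}w(u)}=k^{\log_{k}w(u)}=w(u).$$

To complete the proof of the observation, we note that it also holds that

$$w_{2}(u)=k^{\lfloor\log_{k}w(u)\rfloor}\leq k^{\log_{k}w(u)}=w(u)\quad\text{and}\quad w_{2}(u)=k^{\lfloor\log_{k}w(u)\rfloor}\geq k^{\log_{k}w(u)-1}=\frac{w(u)}{k}.$$ ∎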
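Observation 3.2 can be sanity-checked numerically. The sketch below restricts attention to positive integer weights so that all floors can be computed exactly with integer arithmetic; this restriction, along with the names `auxiliary_weight` and `observation_holds`, is an assumption of the example and not part of the paper.

```python
from typing import Tuple


def auxiliary_weight(w: int, k: int) -> Tuple[int, int]:
    """Return (w_2(u), i(u) mod ell) for an integer weight w >= 1 and k a power of two."""
    ell = k.bit_length() - 1                 # ell = log2(k)
    if k < 2 or (1 << ell) != k:
        raise ValueError("k must be a power of two and at least 2")
    if w < 1:
        raise ValueError("this sketch assumes integer weights >= 1")
    i = w.bit_length() - 1                   # i(u) = floor(log2 w(u))
    w2 = k ** (i // ell)                     # floor(log_k w) equals floor(i / ell)
    return w2, i % ell


def observation_holds(w: int, k: int) -> bool:
    """Check w/2 <= w_2 * 2^(i mod ell) <= w and w/k <= w_2 <= w without fractions."""
    w2, group = auxiliary_weight(w, k)
    scaled = w2 * 2 ** group
    return w <= 2 * scaled <= 2 * w and w <= k * w2 and w2 <= w


print(all(observation_holds(w, k) for k in (2, 4, 8, 16) for w in range(1, 5000)))
```

For integer weights the product $w_{2}(u)\cdot 2^{i(u)\bmod\ell}$ is exactly $2^{i(u)}$, the largest power of $2$ not exceeding $w(u)$, which is what makes both inequalities hold.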

We are now ready to present the algorithm that we use to prove Reduction 3.1, which appears as Algorithm 1. To intuitively understand this algorithm, it is useful to think of $i(u)$ as the “class” element $u$ belongs to. All the elements within class $i$ have weights between $2^{i}$ and $2^{i+1}$, and thus, treating them all as having the weight $2^{i}$ does not affect the approximation ratio by more than a factor of $2$. Let us call $2^{i}$ the characteristic weight of class $i$. Note now that the ratio between the characteristic weight of class $i_{1}$ and the characteristic weight of class $i_{2}$ is $2^{i_{1}-i_{2}}$, which is a power of $k$ whenever $i_{1}-i_{2}$ is an integer multiple of $\ell=\log_{2}k$. Thus, one can group the classes into $\ell$ groups such that the ratio between the characteristic weights of any pair of classes within a group is a power of $k$ (see Figure 1 for a graphical illustration of these groups). Moreover, by multiplying all the characteristic weights in the group by an appropriate scaling factor, one can make them all powers of $k$. This means that for every group there exists a transformation that converts all the weights of the elements in it to powers of $k$ and preserves the ratio between any two weights in the group up to a factor of $2$. In particular, we get that the elements of the group after the transformation form a $k$-power instance.

Adding up all the above, we have described a way to transform any instance of finding a maximum weight independent set subject to a $k$ -extendible constraint into $\ell$ new instances of this problem that are guaranteed to be $k$ -power. Algorithm 1 essentially creates these $\ell$ new instances on the fly, and feeds them to $\ell$ copies of the algorithm $ALG$ whose existence is assumed in Reduction 3.1. Given this point of view, $i(u)\bmod\ell$ should be understood as the group to which element $u$ belongs, and $w_{2}(u)$ is the transformed weight of $u$ . Observation 3.2 can now be interpreted as stating that the ratio between the weights of elements belonging to the same group (and thus, having the same $i(u)\bmod\ell$ value) is indeed changed by the transformation by at most a factor of $2$ .
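The routing described above can be sketched as follows. This is an illustrative reading of the prose, not the paper's pseudocode: the class names, the `process`/`result` interface, and the representation of stream elements as `(edge, weight)` pairs with positive integer weights are all assumptions. `PlaceholderAlg` is a weight-ignoring stand-in for $ALG$ with no approximation guarantee of its own.

```python
class PlaceholderAlg:
    """Stand-in for ALG: streaming unweighted greedy that ignores the weights (assumption)."""

    def __init__(self, is_independent):
        self._is_independent = is_independent
        self._solution = set()

    def process(self, u, transformed_weight):
        if self._is_independent(frozenset(self._solution | {u})):
            self._solution.add(u)

    def result(self):
        return set(self._solution)


class KPowerReduction:
    """Feed each element, with weight w_2(u), to the copy indexed by i(u) mod ell."""

    def __init__(self, k, make_alg, weight):
        ell = k.bit_length() - 1                      # ell = log2(k)
        if k < 2 or (1 << ell) != k:
            raise ValueError("k must be a power of two and at least 2")
        self.k, self.ell, self.weight = k, ell, weight
        self.copies = [make_alg() for _ in range(ell)]

    def process(self, u):
        i = self.weight(u).bit_length() - 1           # class i(u) = floor(log2 w(u))
        w2 = self.k ** (i // self.ell)                # transformed weight, a power of k
        self.copies[i % self.ell].process(u, w2)

    def result(self):
        # Output the candidate with the largest original weight.
        candidates = [copy.result() for copy in self.copies]
        return max(candidates, key=lambda c: sum(self.weight(u) for u in c))


def is_matching(items):
    """Hypothetical oracle on (edge, weight) pairs: independent iff the edges form a matching."""
    seen = set()
    for (a, b), _ in items:
        if a in seen or b in seen:
            return False
        seen.update((a, b))
    return True


reduction = KPowerReduction(4, lambda: PlaceholderAlg(is_matching), weight=lambda u: u[1])
for element in [((1, 2), 3), ((2, 3), 8), ((3, 4), 5), ((4, 5), 9)]:
    reduction.process(element)
print(reduction.result())
```

With $k=4$ there are $\ell=2$ groups: the element of weight 5 lands in group 0, the other three land in group 1, and the group 1 candidate wins with original weight 12. Letting each element carry its own weight keeps the wrapper from storing anything beyond what the copies store.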

In the rest of this section, we use $B_{i}$ to denote the set of elements fed to instance $ALG_{i}$ by Algorithm 1, and $T$ to denote the output of Algorithm 1. Additionally, we denote by $OPT$ an arbitrary (fixed) optimal solution for the original instance received by Algorithm 1. The following lemma proves that Algorithm 1 has the approximation ratio guaranteed by Reduction 3.1.

Lemma 3.3.

$w(OPT)\leq O(\alpha\log k)\cdot w(T)$ .

Proof.

Since Algorithm 1 feeds every arriving element into exactly one of the instances $ALG_{0},\allowbreak ALG_{1},\dotsc,ALG_{\ell-1}$ , the sets $B_{0},B_{1},\dotsc,B_{\ell-1}$ form a disjoint partition of $\cN$ . Thus,

$$w(OPT)=\sum_{i=0}^{\ell-1}w(B_{i}\cap OPT).$$

Hence, by an averaging argument, there must exist an index $i$ such that $w(OPT)\leq\ell\cdot w(OPT\cap B_{i})$ .

We now note that it follows from the pseudocode of Algorithm 1 and Observation 3.2 that the copies of $ALG$ get only weights that are powers of $k$ , and moreover, these weights belong to the range $[{w_{\min}}/k,{w_{\max}}]$ whenever the original weights received by Algorithm 1 belong to the range $[{w_{\min}},{w_{\max}}]$ . Thus, by the assumption of Reduction 3.1, $ALG_{i}$ achieves $\alpha$ -approximation for the instance it faces. Since $B_{i}\cap OPT$ is a feasible solution within this instance and $C_{i}$ is the output of $ALG_{i}$ , we get $w_{2}(OPT\cap B_{i})\leq\alpha\cdot w_{2}(C_{i})$ . Therefore,

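$$w(OPT)\leq\ell\cdot w(OPT\cap B_{i})\leq 2\ell\cdot w_{2}(OPT\cap B_{i})\cdot 2^{i}\leq 2\ell\alpha\cdot w_{2}(C_{i})\cdot 2^{i}\leq 2\ell\alpha\cdot w(C_{i})\leq 2\ell\alpha\cdot w(T),$$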

where the second and penultimate inequalities hold by Observation 3.2, and the last inequality is due to the fact that $T$ is the best solution among $C_{0},C_{1},\dotsc,C_{\ell-1}$ . ∎

The next lemma analyzes the space complexity of Algorithm 1 and completes the proof of Reduction 3.1.

Lemma 3.4.

Algorithm 1’s space complexity is $O(S_{ALG}\cdot\log{k})$ .

Proof.

Algorithm 1 runs $\log{k}$ parallel copies of $ALG$, each of which is assumed (by Reduction 3.1) to use $S_{ALG}$ space. Thus, the space required by these $\log k$ copies is $O(S_{ALG}\cdot\log k)$. In addition to this space, Algorithm 1 only requires enough space to do two things.

• Store the outputs of the copies of $ALG$. However, these outputs are originally stored by the copies themselves, and thus, storing them requires no more space than what is used by the copies.

• Calculate the sum of the weights of the elements in the solutions produced by the copies of $ALG$. Since we assume that the weight of an element can be stored in constant space, this requires again (up to constant factors) no more space than the space used by the copies of $ALG$ to store their solutions. ∎

4 Algorithm

In this section we present a data stream algorithm for $k$ -power instances of the problem of finding a maximum weight set subject to a $k$ -extendible constraint. This algorithm assumes access to positive upper bound ${w_{\max}}$ and lower bound ${w_{\min}}$ on the weights of all the elements, and has a semi-streaming space complexity when the ratio between ${w_{\max}}$ and ${w_{\min}}$ is upper bounded by a polynomial in $n$ . Proposition 4.1 states the properties that we prove for this algorithm more formally.

Proposition 4.1.

There exists a $2k$ -approximation data stream algorithm for $k$ -power instances of the problem of finding a maximum weight set subject to a $k$ -extendible constraint. This algorithm assumes access to positive upper bound ${w_{\max}}$ and lower bound ${w_{\min}}$ on the weights of all the elements, and its space complexity is $O(\rho(\log(\nicefrac{{{w_{\max}}}}{{{w_{\min}}}})/\log k+1))$ under the assumption that constant space suffices to store a single element and a single weight.

Before getting to the proof of Proposition 4.1, we note that together with Reduction 3.1 this proposition immediately implies the following corollary.

Corollary 4.2.

There exists an $O(k\log k)$ -approximation data streaming algorithm for the problem of finding a maximum weight set subject to a $k$ -extendible constraint. This algorithm assumes access to positive upper bound ${w_{\max}}$ and lower bound ${w_{\min}}$ on the weights of all the elements. The space complexity of this algorithm is $O(\rho(\log(\nicefrac{{{w_{\max}}}}{{{w_{\min}}}})+\log k))$ under the assumption that constant space suffices to store a single element and a single weight.

Note that when the ratio between ${w_{\max}}$ and ${w_{\min}}$ is polynomial in $n$ , the space complexity of the algorithm from Corollary 4.2 becomes $O(\rho\log n)$ , and thus, the algorithm is semi-streaming. In Appendix A we explain how the algorithm can be modified so that it keeps the “effective” ratio $\nicefrac{{{w_{\max}}}}{{{w_{\min}}}}$ on the order of $O(k^{2}\rho^{2})$ even when no values ${w_{\max}}$ and ${w_{\min}}$ are supplied to the algorithm and the weights of the elements come from an arbitrary range. This leads to the space complexity of $O(\rho(\log k+\log\rho))$ stated in Theorem 1.1.

The rest of this section is devoted to the proof of Proposition 4.1. As a first step towards this goal, let us recall that the unweighted greedy algorithm is an algorithm that considers the elements of the ground set $\cN$ in an arbitrary order, and adds every considered element to the solution it constructs if that does not violate independence. As mentioned above, it follows immediately from the definition of $k$ -set systems that the unweighted greedy algorithm achieves an approximation ratio of $k$ for the problem of finding a maximum cardinality independent set subject to a $k$ -set system constraint. Since $k$ -set systems generalize $k$ -extendible systems, the same is true also for $k$ -extendible constraints. The following lemma improves over this by showing a tighter guarantee for $k$ -extendible constraints.

Lemma 4.3.

Given a $k$ -extendible set system $(\cN,\cI)$ , the unweighted greedy algorithm is guaranteed to produce an independent set $B$ such that $k\cdot|B\setminus A|\geq|A\setminus B|$ for any independent set $A\in\cI$ .

Proof.

Let us denote the elements of $B\setminus A$ by $x_{1},x_{2},\dotsc,x_{m}$ in an arbitrary order. Using these elements, we recursively define a series of independent sets $A_{0},A_{1},\dotsc,A_{m}$ . The set $A_{0}$ is simply the set $A$ . For $1\leq i\leq m$ , we define $A_{i}$ using $A_{i-1}$ as follows. Since $(\cN,\cI)$ is a $k$ -extendible system and the subsets $A_{i-1}$ and $A_{i-1}\cap B+x_{i}\subseteq B$ are both independent, there must exist a subset $Y_{i}\subseteq A_{i-1}\setminus(A_{i-1}\cap B)=A_{i-1}\setminus B$ such that $|Y_{i}|\leq k$ and $A_{i-1}\setminus Y_{i}+x_{i}\in\mathcal{I}$ . Using the subset $Y_{i}$ , we now define $A_{i}=A_{i-1}\setminus Y_{i}+x_{i}$ . Note that by the definition of $Y_{i}$ , $A_{i}\in\mathcal{I}$ as promised. Furthermore, since $Y_{i}\cap B=\varnothing$ for each $1\leq i\leq m$ , we know that $(A\cup\{x_{1},x_{2},\dots,x_{m}\})\cap B\subseteq A_{m}$ , which implies $B\subseteq A_{m}$ because $\{x_{1},x_{2},\dotsc,x_{m}\}=B\setminus A$ . However, $B$ , as the output of the unweighted greedy algorithm, must be an inclusion-wise maximal independent set (i.e., a base), and thus, it must in fact be equal to the independent set $A_{m}$ containing it.

Let us now denote $Y=\bigcup_{i=1}^{m}Y_{i}$ , and consider two different ways to bound the number of elements in $Y$ . On the one hand, since every set $Y_{i}$ includes up to $k$ elements, we get $|Y|\leq km=k\cdot|B\setminus A|$ . On the other hand, the fact that $B=A_{m}$ implies that every element of $A\setminus B$ belongs to $Y_{i}$ for some value of $i$ , and therefore, $|Y|\geq|A\setminus B|$ . The lemma now follows by combining these two bounds. ∎
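The guarantee of Lemma 4.3 can be checked by brute force on a tiny instance. The sketch below is only an illustration under stated assumptions: it uses matchings in a graph (a standard example of a $2$ -extendible system) as the constraint, and the names `is_matching`, `unweighted_greedy` and `satisfies_lemma_4_3` are hypothetical helpers that do not appear in the paper.

```python
from itertools import combinations


def is_matching(edges):
    """Hypothetical independence oracle: no two edges share an endpoint."""
    used = set()
    for u, v in edges:
        if u in used or v in used:
            return False
        used.update((u, v))
    return True


def unweighted_greedy(elements, is_independent):
    """Scan the elements in the given order; keep each one that preserves independence."""
    solution = []
    for e in elements:
        if is_independent(solution + [e]):
            solution.append(e)
    return solution


def satisfies_lemma_4_3(elements, is_independent, k):
    """Check k * |B minus A| >= |A minus B| for the greedy output B and every independent A."""
    B = set(unweighted_greedy(elements, is_independent))
    for r in range(len(elements) + 1):
        for subset in combinations(elements, r):
            if is_independent(list(subset)):
                A = set(subset)
                if k * len(B - A) < len(A - B):
                    return False
    return True


# Path 0-1-2-3 whose middle edge arrives first, so greedy keeps only that edge.
path = [(1, 2), (0, 1), (2, 3)]
greedy_output = unweighted_greedy(path, is_matching)
```

On this path the greedy output is the single middle edge, and the independent set consisting of the two outer edges shows that the inequality holds with $k=2$ but would fail with $k=1$ .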

We are now ready to present the algorithm we use to prove Proposition 4.1, which is given as Algorithm 2. This algorithm has two main stages. In the first stage, the algorithm runs an independent copy of the unweighted greedy algorithm for every possible weight of elements. The copy corresponding to the weight $k^{i}$ is denoted by ${\texttt{Greedy}}_{i}$ in the pseudocode of the algorithm, and Algorithm 2 feeds to it only the input elements whose weight is at least $k^{i}$ . The output of ${\texttt{Greedy}}_{i}$ is denoted by $C_{i}$ in the algorithm. We also denote in the analysis by $E_{i}$ the set of elements fed to ${\texttt{Greedy}}_{i}$ . By definition, $C_{i}$ is obtained by running the unweighted greedy algorithm on the elements of $E_{i}$ , which is a property we use below.

In the second stage of Algorithm 2 (which is done as post-processing after the stream has ended), the algorithm constructs an output set $T$ based on the outputs of the copies of the unweighted greedy algorithm. Specifically, this is done by running the unweighted greedy algorithm on the elements of $\bigcup_{i={i_{\min}}}^{{i_{\max}}}C_{i}$ , considering the elements of the sets $C_{i}$ in decreasing order of $i$ . While doing so, the given pseudocode also keeps in $T_{i}$ the temporary solution obtained by the unweighted greedy algorithm after considering only the elements of $C_{j}$ for $j\geq i$ . This temporary solution is used by the analysis below, but need not be kept by a real implementation of Algorithm 2.
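The two stages described above can be sketched in a few lines of Python. This is a minimal sketch and not the paper's pseudocode. It assumes that weights are positive integer powers of $k$ , that the index range is passed directly as `i_min` and `i_max` , and that independence is tested through an oracle. The helper names `log_k_exact`, `algorithm2` and `is_matching` are hypothetical.

```python
def is_matching(edges):
    """Hypothetical independence oracle: no two edges share an endpoint."""
    used = set()
    for u, v in edges:
        if u in used or v in used:
            return False
        used.update((u, v))
    return True


def log_k_exact(w, k):
    """Return i >= 0 with k**i == w (weights are assumed to be integer powers of k)."""
    i = 0
    while w > 1:
        if w % k != 0:
            raise ValueError("weight is not a power of k")
        w //= k
        i += 1
    return i


def algorithm2(stream, k, i_min, i_max, is_independent):
    """stream yields (element, weight) pairs; returns the output set T as a list."""
    # Stage 1: Greedy_i only sees the elements whose weight is at least k**i.
    C = {i: [] for i in range(i_min, i_max + 1)}
    for element, w in stream:
        top = min(log_k_exact(w, k), i_max)
        for i in range(i_min, top + 1):
            if is_independent(C[i] + [element]):
                C[i].append(element)
    # Stage 2: unweighted greedy over C_{i_max}, ..., C_{i_min}, in this order.
    T = []
    for i in range(i_max, i_min - 1, -1):
        for element in C[i]:
            if element not in T and is_independent(T + [element]):
                T.append(element)
    return T


# Path 0-1-2-3: the middle edge has weight 1 and the two outer edges have weight 2.
stream = [((1, 2), 1), ((0, 1), 2), ((2, 3), 2)]
T = algorithm2(stream, k=2, i_min=0, i_max=1, is_independent=is_matching)
```

In this example the copy for weight $1$ is blocked by the light middle edge, while the copy for weight $2$ collects both outer edges. Processing the copies in decreasing order of $i$ is what lets the heavy edges enter $T$ first.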

We begin the analysis of Algorithm 2 by analyzing its space complexity.

Lemma 4.4.

Algorithm 2 can be implemented using a space complexity of $O(\rho(\log(\nicefrac{{{w_{\max}}}}{{{w_{\min}}}})/\log k\allowbreak+1))$ .

Proof.

Note that each copy of the unweighted greedy algorithm only has to store its solution, which contains up to $\rho$ elements since it is independent. Algorithm 2 uses ${i_{\max}}-{i_{\min}}+1$ such copies, and thus, the space it needs for these copies is only

$$O(\rho\cdot({i_{\max}}-{i_{\min}}+1))=O(\rho(\log(\nicefrac{{{w_{\max}}}}{{{w_{\min}}}})/\log k+1)).$$

In addition to the space used by the copies of the unweighted greedy algorithm, Algorithm 2 only needs to store the set $T$ . This set contains a subset of the elements from the outputs of the above copies, and thus, can increase the space required only by a constant factor. ∎

To complete the proof of Proposition 4.1, it remains to analyze the approximation ratio of Algorithm 2. We begin with the following lemma, which is the technical heart of our analysis. Like in Section 3, let us denote by $OPT$ an arbitrary (fixed) optimal solution to the problem we want to solve. We also assume for consistency that $T_{{i_{\max}}+1}=\varnothing$ (note that $T_{{i_{\max}}+1}$ is not defined by Algorithm 2).

Lemma 4.5.

For each integer ${i_{\min}}\leq i\leq{i_{\max}}$ , $k^{2}\cdot|T_{i+1}|+k\cdot|T_{i}\setminus T_{i+1}|\geq|OPT\cap E_{i}|$ .

Proof.

The set $T_{i}$ can be viewed as the output of the unweighted greedy algorithm running on $\bigcup_{i\leq j\leq{i_{\max}}}C_{j}$ . Since we also know that $C_{i}$ is independent, Lemma 4.3 guarantees

$$k\cdot|T_{i}\setminus C_{i}|\geq|C_{i}\setminus T_{i}|.$$

Adding $k\cdot|C_{i}\cap T_{i}|$ to both its sides, we get

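$$k\cdot|T_{i}|\geq|C_{i}\setminus T_{i}|+k\cdot|C_{i}\cap T_{i}|=|C_{i}|+(k-1)\cdot|C_{i}\cap T_{i}|\geq\frac{|OPT\cap E_{i}|}{k}+(k-1)\cdot|C_{i}\cap T_{i}|,$$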

where the last inequality holds since the unweighted greedy algorithm achieves $k$ -approximation and $OPT\cap E_{i}$ is an independent set within $E_{i}$ (recall that $E_{i}$ is the set of elements that were fed to ${\texttt{Greedy}}_{i}$ ). Using the last inequality we can now get

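$$k\cdot|T_{i+1}|+k\cdot|T_{i}\setminus T_{i+1}|=k\cdot|T_{i}|\geq\frac{|OPT\cap E_{i}|}{k}+(k-1)\cdot|C_{i}\cap T_{i}|\geq\frac{|OPT\cap E_{i}|}{k}+(k-1)\cdot|T_{i}\setminus T_{i+1}|,$$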

where the first equality holds because $T_{i+1}\subseteq T_{i}$ , and the second inequality holds because $T_{i}\setminus T_{i+1}\subseteq C_{i}\cap T_{i}$ (recall that the algorithm constructs $T_{i}$ by adding elements of $C_{i}$ to $T_{i+1}$ ). The lemma now follows by rearranging the above inequality and multiplying it by $k$ . ∎

Using the last lemma, we can prove the existence of a useful mapping from the elements of $OPT$ to the elements of $T$ .

Lemma 4.6.

There exists a mapping $f\colon OPT\to T$ such that

1. for each $t\in T$ , $|f^{-1}(t)|\leq k^{2}$ .

2. for each $t\in T$ , $|\{u\in f^{-1}(t)\mid w(u)=w(t)\}|\leq k$ .

3. for each $u\in OPT$ , $w(u)\leq w(f(u))$ .

Proof.

We construct $f$ by scanning the elements of $OPT$ and defining the mapping $f(e)$ for every element $e$ scanned. To describe the order in which we scan the elements of $OPT$ , let us define $P_{i}=OPT\cap(E_{i}\setminus E_{i+1})$ , where $E_{{i_{\max}}+1}=\varnothing$ ; in other words, $P_{i}$ contains the elements of $OPT$ whose weight is exactly $k^{i}$ . Note that $P_{i_{\min}},P_{i_{\min}+1},\dotsc,P_{i_{\max}}$ is a disjoint partition of $OPT$ , and thus, any scan of the elements of $P_{i_{\min}},P_{i_{\min}+1},\dotsc,P_{i_{\max}}$ is a scan of the elements of $OPT$ . Specifically, we scan the elements of $OPT$ by first scanning the elements of $P_{i_{\max}}$ in an arbitrary order, then scanning the elements of $P_{i_{\max}-1}$ in an arbitrary order, and so on. Consider now the situation when our scan gets to an arbitrary element $u$ of the set $P_{i}$ . One can note that prior to scanning $u$ , we scanned (and mapped) only elements of $P_{i}\cup P_{i+1}\cup\dotsb\cup P_{i_{\max}}=OPT\cap E_{i}$ , and thus, we mapped at most $|OPT\cap E_{i}|-1$ elements (the $-1$ is due to the fact that $u\in OPT\cap E_{i}$ , and $u$ was not mapped yet). Combining this with Lemma 4.5, we get that at the point in which we scan $u$ there must still be either an element $t\in T_{i+1}$ that still has fewer than $k^{2}$ elements mapped to it or an element $t\in T_{i}\setminus T_{i+1}$ that still has fewer than $k$ elements mapped to it. We choose the mapping $f(u)$ of $u$ to be an arbitrary such element $t$ .

Property 1 of the lemma is clearly satisfied by the above construction because we never map an element $u$ to an element $t$ that already has $k^{2}$ elements mapped to it. To see why Property 3 of the lemma also holds, note that every element $u\in P_{i}$ must have a weight of $k^{i}$ by the definition of $P_{i}$ . This element is mapped by $f$ to some element $t\in T_{i+1}\cup(T_{i}\setminus T_{i+1})=T_{i}\subseteq E_{i}$ , and the weight of $t$ is at least $k^{i}=w(u)$ by the definition of $E_{i}$ . It remains to prove Property 2 of the lemma. Consider an arbitrary element $t\in T$ of weight $k^{i}$ . The elements of $OPT$ whose weight is $k^{i}$ are exactly the elements of $P_{i}$ , and thus, we need to show that $|f^{-1}(t)\cap P_{i}|\leq k$ . Since all the elements of $T_{i+1}\subseteq C_{i+1}\cup C_{i+2}\cup\dotsb\cup C_{i_{\max}}\subseteq E_{i+1}$ have weights of at least $k^{i+1}$ , $t$ cannot belong to $T_{i+1}$ . Thus, an element of $P_{i}$ can be mapped to $t$ when scanned only if $t$ has less than $k$ elements already mapped to it (if $t\in T_{i}$ ) or not at all (if $t\not\in T_{i}$ ), which implies that no more than $k$ elements of $P_{i}$ can get mapped to $t$ , which is exactly what we wanted to prove. ∎
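The scan used in this proof can be written as a short procedure. The following sketch is illustrative only: the function `build_mapping` and the toy inputs `new_in_T` (listing $T_{i}\setminus T_{i+1}$ for each $i$ ) and `P` (listing $P_{i}$ for each $i$ ) are hypothetical. The toy sizes were chosen by hand so that the inequality of Lemma 4.5 holds at both levels; they are not derived from a concrete constraint.

```python
def build_mapping(k, i_min, i_max, new_in_T, P):
    """Map every element of OPT to an element of T, scanning P_{i_max}, ..., P_{i_min}.

    new_in_T[i] lists the elements of T_i that are not in T_{i+1};
    P[i] lists the elements of OPT whose weight is exactly k**i.
    """
    f = {}
    load = {}    # number of elements of OPT already mapped to each t
    older = []   # the elements of T_{i+1} while level i is processed
    for i in range(i_max, i_min - 1, -1):
        fresh = new_in_T.get(i, [])
        for t in fresh:
            load[t] = 0
        for u in P.get(i, []):
            # Prefer an element of T_{i+1} with fewer than k^2 preimages ...
            target = next((t for t in older if load[t] < k * k), None)
            if target is None:
                # ... otherwise an element of T_i minus T_{i+1} with fewer than k preimages.
                target = next((t for t in fresh if load[t] < k), None)
            if target is None:
                raise ValueError("no free slot; the input violates Lemma 4.5")
            f[u] = target
            load[target] += 1
        older = older + fresh
    return f


k = 2
new_in_T = {1: ["t1"], 0: ["t0"]}          # T_1 = {t1}, T_0 = {t1, t0}
P = {1: ["a", "b"], 0: ["c", "d", "e"]}    # OPT split by weight class
f = build_mapping(k, 0, 1, new_in_T, P)
```

Here `t1` receives four elements in total (at most $k^{2}$ ), of which only `a` and `b` share its weight (at most $k$ ), and every element is mapped to an element of at least its own weight, matching the three properties of the lemma.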

We are now ready to prove the approximation ratio of Algorithm 2 (and complete the proof of Proposition 4.1).

Lemma 4.7.

Algorithm 2 is a $2k$ -approximation algorithm for $k$ -power instances of the problem of finding a maximum weight set subject to a $k$ -extendible constraint.

Proof.

Let $f$ be the function whose existence is guaranteed by Lemma 4.6. The properties of this function imply that, for each element $t\in T$ ,

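$$\sum_{u\in f^{-1}(t)}w(u)\leq k\cdot w(t)+k^{2}\cdot\frac{w(t)}{k}=2k\cdot w(t).$$

Indeed, by Property 2 at most $k$ elements of $f^{-1}(t)$ have weight exactly $w(t)$ , and by Properties 1 and 3 the remaining elements, of which there are at most $k^{2}$ , have weight strictly smaller than $w(t)$ , which means a weight of at most $w(t)/k$ since all the weights are powers of $k$ .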

Thus,

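$$w(OPT)=\sum_{u\in OPT}w(u)=\sum_{t\in T}\sum_{u\in f^{-1}(t)}w(u)\leq 2k\cdot\sum_{t\in T}w(t)=2k\cdot w(T),$$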

which completes the proof of the lemma. ∎

5 Conclusion

In this work we have presented the first semi-streaming $\tilde{O}(k)$ -approximation algorithm for the problem of finding a maximum weight set subject to a $k$ -extendible constraint. This result is intrinsically interesting because the generality of $k$ -extendible constraints makes our algorithm applicable to many problems of interest. Additionally, we believe (as discussed in Section 1) that our result is likely to be the final intermediate step towards the goal of designing an algorithm with similar properties for general $k$ -set system constraints or proving that this cannot be done.

Given our work, the immediate open question is to settle the approximation ratio that can be obtained for $k$ -set system constraints in the data stream model. Another interesting research direction is to find out whether one can improve over the approximation ratio of our algorithm. Specifically, we leave open the question of whether there is a semi-streaming algorithm for finding a maximum weight set subject to a $k$ -extendible constraint whose approximation ratio is a clean $O(k)$ .

Appendix A Algorithm for General Weights

In this section we present a semi-streaming algorithm for $k$ -power instances of the problem of finding a maximum weight set subject to a $k$ -extendible constraint. Unlike Algorithm 2, this algorithm does not assume access to the bounds ${w_{\max}}$ and ${w_{\min}}$ , and its space complexity remains nearly linear regardless of the ratio between these bounds. A more formal statement of the properties of this algorithm is given in Proposition A.1. Note that, together with Reduction 3.1, this proposition immediately implies Theorem 1.1.

Proposition A.1.

There exists a $4k$ -approximation semi-streaming algorithm for $k$ -power instances of the problem of finding a maximum weight set subject to a $k$ -extendible constraint. The space complexity of this algorithm is $O(\rho(\log k+\log\rho)/\log k)$ under the assumption that constant space suffices to store a single element and a single weight.

Throughout this section we assume for simplicity that the $k$ -extendible system does not include any self-loops (a self-loop is an element $u\in\cN$ such that $\{u\}$ is a dependent set—i.e., $\{u\}\not\in\cI$ ). This assumption is without loss of generality since a self-loop cannot belong to any independent set, and thus, an algorithm can safely ignore self-loops if they happen to exist. One consequence of this assumption is that $\max_{u\in\cN}w(u)\leq w(OPT)$ , where $OPT$ is an arbitrary fixed optimal solution like in the previous sections. This inequality holds since $\{u\}$ is a feasible solution for every element $u\in\cN$ , and therefore, its weight cannot exceed the weight of $OPT$ .

As mentioned in Section 4, the algorithm we use to prove Proposition A.1 is a variant of Algorithm 2 that includes additional logic designed to force the ratio $\nicefrac{{{w_{\max}}}}{{{w_{\min}}}}$ to be effectively polynomial, specifically $O(k^{2}\rho^{2})$ . Given access to $\rho$ and $\max_{u\in\cN}w(u)$ , this could be done simply by setting ${w_{\max}}=\max_{u\in\cN}w(u)$ and ${w_{\min}}=\max_{u\in\cN}w(u)/(2\rho)$ and discarding any element whose weight is lower than ${w_{\min}}$ (starting from this point, ${w_{\max}}$ and ${w_{\min}}$ are no longer necessarily upper and lower bounds on the weights of all the elements; however, they remain upper and lower bounds on the weights of the non-discarded elements). This guarantees that the ratio $\nicefrac{{{w_{\max}}}}{{{w_{\min}}}}$ is small, and affects the weight of the optimal solution $OPT$ by at most a constant factor since the total weight of the elements of this solution that get discarded is upper bounded by

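$$|OPT|\cdot{w_{\min}}\leq\rho\cdot\frac{\max_{u\in\cN}w(u)}{2\rho}=\frac{\max_{u\in\cN}w(u)}{2}\leq\frac{w(OPT)}{2}.$$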

Unfortunately, our algorithm does not have access (from the beginning) to $\rho$ and $\max_{u\in\cN}w(u)$ . As an alternative, this algorithm, which is given as Algorithm 3, does two things. First, it keeps ${w_{\max}}$ equal to the maximum weight of the elements seen so far, which guarantees that eventually ${w_{\max}}$ becomes $\max_{u\in\cN}w(u)$ . Second, it runs the unweighted greedy algorithm on the input it receives. The size of the solution maintained by the unweighted greedy algorithm, which we denote by $g$ , provides an estimate for the maximum size of an independent set consisting only of elements that have already arrived. In particular, after all the elements arrive, $\rho/k\leq g\leq\rho$ because the unweighted greedy algorithm is a $k$ -approximation algorithm.

Given the above discussion and the fact that the final value of $kg$ is an upper bound on $\rho$ , it is natural to define ${w_{\min}}$ as ${w_{\max}}/(2kg)$ and discard every element whose weight is lower than ${w_{\min}}$ . Unfortunately, this does not work since ${w_{\max}}$ and $g$ change during the execution of Algorithm 3, and reach their final values only when it terminates. Thus, we need to set ${w_{\min}}$ to a more conservative (lower) value. In particular, Algorithm 3 uses ${w_{\min}}={w_{\max}}/(2gk)^{2}$ .

Like Algorithm 2, Algorithm 3 maintains an instance of the unweighted greedy algorithm for every possible weight between ${w_{\min}}$ and ${w_{\max}}$ . However, doing so is somewhat more involved for Algorithm 3 because ${w_{\min}}$ and ${w_{\max}}$ change during the algorithm’s execution, which requires the algorithm to occasionally create and remove instances of unweighted greedy. The creation of such instances involves one subtle issue that needs to be kept in mind. In Algorithm 2 every instance of unweighted greedy associated with a weight $w$ receives all elements whose weight is at least $w$ . To mimic this behavior, when Algorithm 3 creates new instances of unweighted greedy following a decrease in ${w_{\min}}$ (which can happen when $g$ increases), the newly created instances are not fresh new instances but copies of the instance of unweighted greedy that was previously associated with the lowest weight.

The rest of the details of Algorithm 3 are identical to the details of Algorithm 2. Specifically, every arriving element $u$ is fed to every instance of unweighted greedy associated with a weight of $w(u)$ or less, and at termination the outputs of all the unweighted greedy instances are combined in the same way in which this is done in Algorithm 2.
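The bookkeeping described in the last few paragraphs can be sketched as follows. This is a rough sketch under several assumptions that are not taken from the pseudocode of Algorithm 3: weights are given through their exponents (an element with exponent $e$ has weight $k^{e}$ ), the lower index is rounded as ${i_{\min}}={i_{\max}}-\lfloor\log_{k}(2gk)^{2}\rfloor$ , and within an iteration the thresholds are updated before the arriving element is fed to the instances. The names `floor_log`, `algorithm3` and `is_matching` are hypothetical.

```python
def is_matching(edges):
    """Hypothetical independence oracle: no two edges share an endpoint."""
    used = set()
    for u, v in edges:
        if u in used or v in used:
            return False
        used.update((u, v))
    return True


def floor_log(x, k):
    """Largest integer j >= 0 with k**j <= x (for integers x >= 1 and k >= 2)."""
    j = 0
    while k ** (j + 1) <= x:
        j += 1
    return j


def algorithm3(stream, k, is_independent):
    """stream yields (element, exponent) pairs, where the weight is k**exponent."""
    base = []      # unweighted greedy over the whole stream; g = len(base)
    greedy = {}    # index i -> current solution of Greedy_i
    i_min = i_max = None
    for element, e in stream:
        if is_independent(base + [element]):
            base.append(element)
        g = len(base)
        i_max = e if i_max is None else max(i_max, e)
        # Assumed rounding of w_min = w_max / (2gk)^2 to an index.
        i_min = i_max - floor_log((2 * g * k) ** 2, k)
        # Deletions precede creations.
        for i in [i for i in greedy if i < i_min]:
            del greedy[i]
        lowest = min(greedy) if greedy else None
        for i in range(i_min, i_max + 1):
            if i not in greedy:
                # Below the old range: copy the old lowest instance; above it: start empty.
                if lowest is not None and i < lowest:
                    greedy[i] = list(greedy[lowest])
                else:
                    greedy[i] = []
        # Feed the element to every instance associated with a weight of at most k**e.
        for i in range(i_min, min(e, i_max) + 1):
            if is_independent(greedy[i] + [element]):
                greedy[i].append(element)
    # Post-processing identical to the second stage of Algorithm 2.
    T = []
    if i_max is not None:
        for i in range(i_max, i_min - 1, -1):
            for element in greedy[i]:
                if element not in T and is_independent(T + [element]):
                    T.append(element)
    return T


# Path 0-1-2-3 with k = 2: the middle edge has weight 2**0, the outer edges 2**1.
T = algorithm3([((1, 2), 0), ((0, 1), 1), ((2, 3), 1)], 2, is_matching)
```

New instances above the old range start empty because no earlier element was heavy enough to reach them, whereas new instances below the old range inherit the state of the old lowest instance, as explained above.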

We now turn to the analysis of Algorithm 3, beginning with its space complexity. Let $g(h)$ , ${i_{\min}}(h)$ , ${i_{\max}}(h)$ , ${w_{\min}}(h)$ and ${w_{\max}}(h)$ denote the values of $g$ , ${i_{\min}}$ , ${i_{\max}}$ , ${w_{\min}}$ and ${w_{\max}}$ , respectively, at the end of iteration number $h$ of Algorithm 3.

Lemma A.2.

Algorithm 3 can be implemented using a space complexity of $O(\rho(\log k+\log\rho)/\log k)$ .

Proof.

Using the same argument used in the proof of Lemma 4.4, it can be shown that the space complexity of Algorithm 3 is upper bounded by $O(\rho)$ times the maximum number of unweighted greedy instances maintained by the algorithm at the same time. By making the deletions of unweighted greedy instances precede the creation of new instances within every given iteration of the main loop of Algorithm 3 (and avoiding the creation of instances that need to be immediately deleted), it can be guaranteed that the maximum number of instances of unweighted greedy maintained by Algorithm 3 at any given time is exactly $\max_{1\leq h\leq n}\{{i_{\max}}(h)-{i_{\min}}(h)+2\}$ . Thus, the algorithm’s space complexity is at most

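$$O(\rho)\cdot\max_{1\leq h\leq n}\{{i_{\max}}(h)-{i_{\min}}(h)+2\}\leq O(\rho)\cdot\max_{1\leq h\leq n}\left\{\log_{k}\frac{{w_{\max}}(h)}{{w_{\min}}(h)}+2\right\}=O(\rho)\cdot\max_{1\leq h\leq n}\{2\log_{k}(2k\cdot g(h))+2\}\leq O(\rho)\cdot(2\log_{k}(2k\rho)+2)=O(\rho(\log k+\log\rho)/\log k),$$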

where the second inequality is due to the fact that $g$ is always the size of an independent set, and thus, cannot exceed $\rho$ . ∎

Our next objective is to analyze the approximation ratio of Algorithm 3. Like in the toy analysis presented above for the case in which the algorithm has access to $\rho$ and $\max_{u\in\cN}w(u)$ , the analysis we present starts by upper bounding the total weight of the discarded elements. However, to do that we need the following technical observation, which can be proved by induction.

Observation A.3.

Algorithm 3 maintains the invariant that, at the end of every iteration of its main loop, if an element $u\in\cN$ was fed to some instance of unweighted greedy currently kept by the algorithm, then it was fed exactly to those instances associated with a weight of at most $w(u)$ (i.e., the instances ${\texttt{Greedy}}_{i}$ with $i\leq\log_{k}w(u)$ ).

We say that an element $u\in\cN$ is discarded by Algorithm 3 if $u$ was never fed to the final instance ${\texttt{Greedy}}_{{i_{\min}}(n)}$ (during the execution of Algorithm 3 there might be multiple instances of unweighted greedy named ${\texttt{Greedy}}_{i}$ for $i={i_{\min}}(n)$ —by final instance we mean the last of these instances). Let $F$ be the set of discarded elements.

Lemma A.4.

$w(OPT\cap F)\leq\frac{1}{2}\cdot w(OPT)$ .

Proof.

For every $1\leq i\leq|OPT\cap F|$ , let $u_{i}$ be the $i$ -th element of $OPT\cap F$ to arrive, and let $h_{i}$ be its location in the input stream. Given Observation A.3, the fact that $u_{i}\in F$ implies that $u_{i}$ was not fed to the final instance ${\texttt{Greedy}}_{\log_{k}w(u_{i})}$ , which can only happen if an instance named ${\texttt{Greedy}}_{\log_{k}w(u_{i})}$ either did not exist when $u_{i}$ arrived or was deleted at some point after $u_{i}$ ’s arrival. Thus, ${i_{\min}}(h^{\prime}_{i})>\log_{k}w(u_{i})$ for some $h_{i}\leq h^{\prime}_{i}\leq n$ .

The crucial observation now is that $g(h^{\prime}_{i})\geq g(h_{i})\geq i/k$ because by the time $u_{i}$ arrives there are already $i$ elements of $OPT$ that arrived, and these elements form together an independent set of size $i$ (recall that $g$ is a $k$ -approximation for the maximum size of an independent set consisting only of elements that already arrived). Thus, we get

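$$w(u_{i})=k^{\log_{k}w(u_{i})}\leq k^{{i_{\min}}(h^{\prime}_{i})-1}\leq{w_{\min}}(h^{\prime}_{i})=\frac{{w_{\max}}(h^{\prime}_{i})}{(2k\cdot g(h^{\prime}_{i}))^{2}}\leq\frac{{w_{\max}}(h^{\prime}_{i})}{4i^{2}}\leq\frac{w(OPT)}{4i^{2}},$$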

where the first inequality holds since ${i_{\min}}(h^{\prime}_{i})>\log_{k}w(u_{i})$ and both ${i_{\min}}(h^{\prime}_{i})$ and $\log_{k}w(u_{i})$ are integers. Adding up the last inequality over $1\leq i\leq|OPT\cap F|$ yields

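$$w(OPT\cap F)=\sum_{i=1}^{|OPT\cap F|}w(u_{i})\leq\sum_{i=1}^{|OPT\cap F|}\frac{w(OPT)}{4i^{2}}\leq\frac{w(OPT)}{4}\cdot\sum_{i=1}^{\infty}\frac{1}{i^{2}}=\frac{\pi^{2}}{24}\cdot w(OPT)\leq\frac{1}{2}\cdot w(OPT).$$ ∎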

The next lemma shows that Algorithm 3 has a good approximation ratio with respect to the non-discarded elements of $OPT$ .

Lemma A.5.

$w(OPT\setminus F)\leq 2k\cdot w(T)$ .

Proof.

Observe that $(\cN\setminus F,\cI\cap 2^{\cN\setminus F})$ is a $k$ -extendible system, derived from $(\cN,\cI)$ by removing all elements of $F$ . In addition, all the weights of the elements of this set system are powers of $k$ , and thus, by Proposition 4.1, Algorithm 2 achieves $2k$ -approximation for the problem of finding a maximum weight independent set of $(\cN\setminus F,\cI\cap 2^{\cN\setminus F})$ . In other words, when Algorithm 2 is fed only the elements of $\cN\setminus F$ , its output set $T^{\prime}$ obeys $w(OPT^{\prime})\leq 2k\cdot w(T^{\prime})$ , where $OPT^{\prime}$ is an arbitrary maximum weight independent set of $(\cN\setminus F,\cI\cap 2^{\cN\setminus F})$ .

We now note that one consequence of Observation A.3 is that, by the time Algorithm 3 terminates, the instances ${\texttt{Greedy}}_{{i_{\min}}(n)},{\texttt{Greedy}}_{{i_{\min}}(n)+1},\dotsc,{\texttt{Greedy}}_{{i_{\max}}(n)}$ it maintains have received exactly the input received by the corresponding instances in Algorithm 2 when the latter algorithm gets only the elements of $\cN\setminus F$ as input. Since Algorithms 2 and 3 compute their outputs based on the outputs of ${\texttt{Greedy}}_{{i_{\min}}(n)},{\texttt{Greedy}}_{{i_{\min}}(n)+1},\dotsc,{\texttt{Greedy}}_{{i_{\max}}(n)}$ in the same way, this implies that the output set $T$ of Algorithm 3 is identical to the output set $T^{\prime}$ produced by Algorithm 2 when this algorithm is given only the elements of $\cN\setminus F$ as input.

Combining the above observations, we get

$$w(T)=w(T^{\prime})\geq\frac{w(OPT^{\prime})}{2k}\geq\frac{w(OPT\setminus F)}{2k},$$

where the last inequality holds since $OPT^{\prime}$ is a maximum weight independent set in $(\cN\setminus F,\cI\cap 2^{\cN\setminus F})$ and $OPT\setminus F$ is independent in this set system. The lemma now follows by rearranging the last inequality. ∎

Corollary A.6.

$w(OPT)\leq 4k\cdot w(T)$ , and thus, the approximation ratio of Algorithm 3 is at most $4k$ .

Proof.

Combining the last two lemmata, one gets

$$w(OPT)=w(OPT\cap F)+w(OPT\setminus F)\leq\frac{1}{2}\cdot w(OPT)+2k\cdot w(T).$$

The corollary now follows by rearranging the above inequality. ∎

We conclude the section by noticing that Proposition A.1 is an immediate consequence of Lemma A.2 and Corollary A.6.

