Faster Guarantees of Evolutionary Algorithms for Maximization of   Monotone Submodular Functions

Victoria G. Crawford

arXiv:1908.01230·cs.DS·July 7, 2021

TL;DR

This paper introduces new evolutionary algorithms with improved theoretical guarantees for maximizing monotone submodular functions under cardinality constraints, achieving near-optimal ratios with fewer function queries.

Contribution

The paper proposes a novel Pareto optimization algorithm and a biased Pareto optimization variant with improved query complexity and approximation guarantees for submodular maximization.

Findings

01 Algorithms achieve a $(1-\frac{1}{e})$ approximation ratio in expectation.

02 The algorithms require fewer function evaluations compared to existing methods.

03 Empirical results support the theoretical improvements over stochastic greedy algorithms.

Abstract

In this paper, the monotone submodular maximization problem (SM) is studied. SM is to find a subset of size $κ$ from a universe of size $n$ that maximizes a monotone submodular objective function $f$. We show using a novel analysis that the Pareto optimization algorithm achieves a worst-case ratio of $(1 - ϵ)(1 - 1/e)$ in expectation for every cardinality constraint $κ < P$, where $P \leq n + 1$ is an input, in $O(n P \ln(1/ϵ))$ queries of $f$. In addition, a novel evolutionary algorithm called the biased Pareto optimization algorithm is proposed that achieves a worst-case ratio of $(1 - ϵ)(1 - 1/e - ϵ)$ in expectation for every cardinality constraint $κ < P$ in $O(n \ln(P) \ln(1/ϵ))$ queries of $f$. Further, the biased Pareto optimization algorithm can be modified in order to achieve a worst-case ratio of $(1 - ϵ)(1 - 1/e - ϵ)$ in expectation for cardinality…

Equations

$\mathbb{E}\left[\max_{X\in\mathcal{S},|X|\leq\kappa}f(X)\right]\geq(1-\epsilon)(1-1/e)\max_{|X|\leq\kappa}f(X).$

$\mathbb{E}\left[\max_{X\in\mathcal{S},|X|\leq\omega}f(X)\right]\geq\left(1-\left(1-\frac{1}{|A^{*}|}\right)^{\omega^{\prime}}\right)f(A^{*})$

$\mathbb{E}\left[\max_{X\in\mathcal{S},|X|\leq\kappa}f(X)\mid F\right]\geq\left(1-\frac{1}{e}\right)f(A^{*})$

$\frac{1}{P}\sum_{x\in A^{*}}\left(1-\frac{1}{n}\right)^{n-1}\frac{1}{n}\geq\frac{|A^{*}|}{enP}.$

$\mathbb{E}\left[\max_{X\in\mathcal{S},|X|\leq\kappa}f(X)\right]\geq(1-\epsilon)(1-1/e-\epsilon)\max_{|X|\leq\kappa}f(X).$

$\xi^{q}P<|A^{*}|\leq\xi^{q-1}P.$

$\mathbb{E}\left[\max_{X\in\mathcal{S},|X|\leq\omega}f(X)\right]\geq\left(1-\left(1-\frac{1-\epsilon}{|A^{*}|}\right)^{\omega^{\prime}}\right)f(A^{*})$

$\mathbb{E}\left[f(A)\mid F\right]\geq\left(1-\frac{1}{e}-\epsilon\right)f(A^{*})$

$\mathbb{E}\left[f(X_{i})\right]\geq\left(1-\left(1-\frac{1}{|A^{*}|}\right)^{\min\{\omega_{i},P-1\}}\right)f(A^{*}).$

$\frac{1}{P}\sum_{x\in A^{*}}\left(1-\frac{1}{n}\right)^{n-1}\frac{1}{n}.$

$\mathbb{E}\left[f(X_{t})\mid\neg E\right]=\mathbb{E}\left[f(X_{t-1})\right]\geq\left(1-\left(1-\frac{1}{|A^{*}|}\right)^{\min\{\omega_{t},P-1\}}\right)f(A^{*}),$

$\mathbb{E}\left[f(X_{t})\mid E\right]=\mathbb{E}\left[f(X_{t-1})\mid E\right]+\mathbb{E}\left[f(X_{t-1}\cup\{a^{*}\})-f(X_{t-1})\mid E\right]$

$\overset{(b)}{\geq}\mathbb{E}\left[f(X_{t-1})\mid E\right]+\frac{1}{|A^{*}|}\left(f(A^{*})-\mathbb{E}\left[f(X_{t-1})\mid E\right]\right)$

$\overset{(c)}{=}\mathbb{E}\left[f(X_{t-1})\right]+\frac{1}{|A^{*}|}\left(f(A^{*})-\mathbb{E}\left[f(X_{t-1})\right]\right)$

$\overset{(d)}{\geq}\left(1-\left(1-\frac{1}{|A^{*}|}\right)^{\omega_{t-1}+1}\right)f(A^{*})$

$\overset{(e)}{=}\left(1-\left(1-\frac{1}{|A^{*}|}\right)^{\omega_{t}}\right)f(A^{*})$

$P\left(\sum_{i=1}^{T}Y_{i}<\kappa\right)\leq\frac{1}{n}.$

$P\left(\sum_{i=1}^{T}Y_{i}<\kappa\right)\overset{(b)}{\leq}e^{-T\rho/8}\overset{(c)}{\leq}\epsilon$

$\mathbb{E}\left[\Delta f(B,x)\mid E\right]\geq\frac{1}{|X|}\left(f(X)-f(B)\right).$

$\mathbb{E}\left[\Delta f(B,x)\mid E\right]\overset{(a)}{\geq}\frac{1}{|X|}\left(f(X)-f(B)\right)$

$P\left(Y\leq(1-\eta)\mu\right)\leq e^{-\eta^{2}\mu/2}.$

$f(X_{j})\geq\max\{f(X):X\in\mathcal{S},|X|\leq\omega_{i}\}\overset{(a)}{\geq}f(X_{i})$

$\mathbb{E}\left[f(X_{i})\right]\geq\left(1-\left(1-\frac{1-\epsilon}{|A^{*}|}\right)^{\min\{\omega_{i},P-1\}}\right)f(A^{*})$

$\rho=\sum_{x\in A^{*}}\left(1-\frac{1}{n}\right)^{n-1}\frac{1}{n}=\frac{|A^{*}|}{n}\left(1-\frac{1}{n}\right)^{n-1}\geq\frac{|A^{*}|}{en}.$


Full text

Victoria G. Crawford, University of Florida

Abstract

In this paper, the monotone submodular maximization problem (SM) is studied. SM is to find a subset of size $\kappa$ from a universe of size $n$ that maximizes a monotone submodular objective function $f$ . We show using a novel analysis that the Pareto optimization algorithm achieves a worst-case ratio of $(1-\epsilon)(1-1/e)$ in expectation for every cardinality constraint $\kappa<P$ , where $P\leq n+1$ is an input, in $\mathcal{O}(nP\ln(1/\epsilon))$ queries of $f$ . In addition, a novel evolutionary algorithm called the biased Pareto optimization algorithm is proposed that achieves a worst-case ratio of $(1-\epsilon)(1-1/e-\epsilon)$ in expectation for every cardinality constraint $\kappa<P$ in $\mathcal{O}(n\ln(P)\ln(1/\epsilon))$ queries of $f$ . Further, the biased Pareto optimization algorithm can be modified in order to achieve a worst-case ratio of $(1-\epsilon)(1-1/e-\epsilon)$ in expectation for cardinality constraint $\kappa$ in $\mathcal{O}(n\ln(1/\epsilon))$ queries of $f$ . An empirical evaluation corroborates our theoretical analysis: the algorithms exceed the stochastic greedy solution value at roughly the number of queries one would expect based upon our analysis.

1 Introduction

A function $f:2^{U}\to\mathbb{R}_{\geq 0}$ defined on subsets of a ground set $U$ of size $n$ is monotone submodular if it possesses the following two properties: (i) For all $A\subseteq B\subseteq U$ , $f(A)\leq f(B)$ (monotonicity); (ii) For all $A\subseteq B\subseteq U$ and $x\notin B$ , $f(A\cup\{x\})-f(A)\geq f(B\cup\{x\})-f(B)$ (submodularity). Monotone submodular set functions are found in many applications in machine learning and data mining. Applications of SM include influence in social networks Kempe et al. (2003), data summarization Mirzasoleiman et al. (2013), dictionary selection Das and Kempe (2011), and monitor placement Soma and Yoshida (2016). As a result, there has been much recent interest in optimization problems involving monotone submodular functions. One such optimization problem is the NP-hard Submodular Maximization Problem (SM), defined as follows.

Problem 1 (Submodular Maximization Problem (SM)).

Let $f:2^{U}\to\mathbb{R}_{\geq 0}$ be a monotone submodular function defined on subsets of the ground set $U$ of size $n$ , and $f(\emptyset)=0$ . Given a budget $\kappa\in[0,n]$ , SM is to find $\text{argmax}_{|X|\leq\kappa}f(X).$

An instance of SM is referred to as SM $(f,\kappa)$ . It is assumed that the function $f$ is provided as a value oracle, which when queried with a set $X$ returns the value of $f(X)$ . Time is measured in queries of $f$ , as is the convention in submodular optimization Badanidiyuru et al. (2014).
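To make the definitions concrete, the following is a minimal sketch of a value oracle for a small coverage function, together with a brute-force check of monotonicity and submodularity and a brute-force solver for SM. The coverage instance `COVER` and all function names are illustrative assumptions, not taken from the paper, and the exhaustive checks are only feasible for tiny ground sets.

```python
from itertools import combinations

def make_coverage_oracle(sets_by_element):
    """Return a value oracle f(X) = number of items covered by the elements of X."""
    def f(X):
        covered = set()
        for x in X:
            covered |= sets_by_element[x]
        return len(covered)
    return f

def powerset(U):
    """Yield every subset of U as a frozenset."""
    items = list(U)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_monotone_submodular(f, U):
    """Brute-force check of properties (i) and (ii); exponential in |U|."""
    subsets = list(powerset(U))
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            if f(A) > f(B):  # violates monotonicity
                return False
            for x in U - B:
                if f(A | {x}) - f(A) < f(B | {x}) - f(B):  # violates submodularity
                    return False
    return True

def brute_force_sm(f, U, kappa):
    """Solve SM(f, kappa) exactly by enumerating all subsets of size at most kappa."""
    return max((X for X in powerset(U) if len(X) <= kappa), key=f)

# Hypothetical toy instance: each element of U covers a few items.
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
U = frozenset(COVER)
f = make_coverage_oracle(COVER)
```

Each call to `f` corresponds to one query in the paper's cost model, so `brute_force_sm` is useful only as a reference answer on small instances.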

To approximate SM, the standard greedy algorithm is very effective. Nemhauser and Wolsey (1978) showed that the standard greedy algorithm achieves the best ratio of $(1-1/e)$ for SM in $\mathcal{O}(n\kappa)$ queries to $f$ . In addition, faster versions of the greedy algorithm have been developed for SM Badanidiyuru et al. (2014); Mirzasoleiman et al. (2015). In particular, the stochastic greedy algorithm (SG) of Mirzasoleiman et al. (2015) achieves ratio $1-1/e-\epsilon$ in expectation in $\mathcal{O}(n\ln(1/\epsilon))$ queries to $f$ .

Alternatively, one may take the Pareto optimization approach to SM: Instead of maximizing $f$ for a cardinality constraint $\kappa$ , SM is re-formulated as a bi-objective optimization problem where the goal is to both maximize $f$ as well as minimize cardinality. Instead of a single solution, we seek a pool of solutions none of which dominate another (in this context, a solution $Y$ dominates $X$ if $f(X)\leq f(Y)$ , $|X|\geq|Y|$ , and at least one of the two inequalities is strict). Greedy algorithms can be used to develop such a pool; however, previous works Friedrich and Neumann (2014); Qian et al. (2015b) have employed bi-objective evolutionary algorithms because they iteratively improve the entire pool of solutions and can be run indefinitely. The evolutionary algorithm Pareto Optimization (PO) has previously been shown to find a $1-1/e$ approximate solution to SM $(f,\kappa)$ for all $\kappa<P$ , where $P\leq n+1$ is an input, in expected $\mathcal{O}(nP^{2})$ queries to $f$ Friedrich and Neumann (2014). Further, PO has been demonstrated to make significant empirical improvements over the standard greedy algorithms for SM Qian et al. (2015b). But as the size of data has grown exponentially in recent times, a query complexity that is cubic in $n$ (for $P=\Omega(n)$ ) makes these evolutionary algorithms a less attractive option.

1.1 Contributions

In this work, a novel analysis is provided for the algorithm PO, and it is proven that PO achieves a worst-case ratio of $(1-\epsilon)(1-1/e)$ in expectation for every instance SM $(f,\kappa)$ with $\kappa<P$ , where $P\leq n+1$ is an input, in $\mathcal{O}(nP\ln(1/\epsilon))$ queries of $f$ . This removes a factor of $P$ from the query complexity of Friedrich and Neumann (2014). This novel analysis has the potential to improve the query complexity of other problems in monotone submodular optimization beyond SM Qian et al. (2015a, 2017); Crawford (2019); Bian et al. (2020). This result is proven in Theorem 1.

Next, a novel algorithm Biased Pareto Optimization (BPO) is proposed that is similar in spirit to PO but faster for SM. It is proven that BPO achieves a worst-case ratio of $(1-\epsilon)(1-1/e-\epsilon)$ in expectation for every instance SM $(f,\kappa)$ with $\kappa<P$ in $\mathcal{O}(n\ln(P)\ln(1/\epsilon))$ queries of $f$ . This result is proven in Theorem 2. Further, a version of BPO for a specific cardinality constraint $\kappa$ , $\kappa$ -Biased Pareto Optimization ( $\kappa$ -BPO), is proven to achieve a worst-case ratio of $(1-\epsilon)(1-1/e-\epsilon)$ in expectation for instance SM $(f,\kappa)$ in $\mathcal{O}(n\ln(1/\epsilon))$ queries of $f$ . This result is proven in Theorem 3. This new algorithm $\kappa$ -BPO thus matches the optimal SG algorithm in terms of both approximation ratio and query complexity, while maintaining the ability of PO to continuously improve a pool of solutions.

The above theoretical results all extend to the more general setting of monotone $\gamma$ -weakly submodular functions Das and Kempe (2011), but with different approximation guarantees that depend on $\gamma$ . (A function $f:2^{U}\to\mathbb{R}_{\geq 0}$ is $\gamma$ -weakly submodular if for all $X\subseteq Y\subseteq U$ , and $u\notin Y$ , $\sum_{u\in Y\setminus X}\Delta f(X,u)\geq\gamma\left(f(X\cup Y)-f(X)\right)$ . If $\gamma=1$ , then $f$ is submodular.)

An empirical evaluation corroborates our theoretical analysis: the algorithms exceed the SG solution value at roughly the number of queries one would expect based upon our analysis.

1.2 Additional Related Work

Evolutionary algorithms have been studied for many combinatorial optimization problems Laumanns et al. (2002); Neumann and Wegener (2007); Friedrich et al. (2010). In particular, evolutionary algorithms have been analyzed for problems in submodular optimization including SM Friedrich and Neumann (2014); Qian et al. (2015b); Roostapour et al. (2019), submodular cover Qian et al. (2015a); Crawford (2019), SM with more general cost constraints Bian et al. (2020), and noisy versions of SM Qian et al. (2017).

Friedrich and Neumann (2014) studied a slight variant of PO where the pool $\mathcal{S}$ is initialized to contain a random set, and $P=n$ . Friedrich and Neumann proved that their variant of PO finds a $1-1/e$ approximate solution to SM $(f,\kappa)$ in expected $\mathcal{O}(n^{2}\ln(n)+n^{2}\kappa)$ queries of $f$ . It is easy to modify their analysis to see that PO finds a $1-1/e$ approximate solution to SM $(f,\kappa)$ for all $\kappa<P$ in expected $\mathcal{O}(nP^{2})$ queries of $f$ . The analysis of PO used in the proof of Theorem 1 of Section 2.1 differs substantially from the argument of Friedrich and Neumann because it analyzes the expected time until an expected approximation ratio is reached, resulting in a speedup to $\mathcal{O}(nP)$ queries of $f$ . In addition, the result of Theorem 1 holds for a deterministic number of queries, due to an application of the Chernoff bound.

Qian et al. (2015b) considered the subset selection problem, which is a special case of the monotone $\gamma$ -weakly submodular maximization problem. Qian et al. fixed $P=2\kappa$ , and showed that for the cardinality constraint $\kappa$ , PO finds a $1-e^{-\gamma}$ approximate solution in expected $\mathcal{O}(n\kappa^{2})$ queries of $f$ . Their results can be generalized beyond subset selection to the monotone $\gamma$ -weakly submodular maximization problem with cardinality constraint $\kappa$ .

The algorithm BPO, presented in Section 2.2, uses a novel, biased selection procedure to identify sets for mutation. Because of the biased selection procedure, BPO is the first evolutionary algorithm that has an approximation guarantee in nearly linear queries of $f$ close to that of the greedy algorithm for SM.

2 Algorithms and Theoretical Results

The theoretical contributions of the paper are presented in this section. In particular, a new theoretical analysis of the algorithm Pareto Optimization (PO) is presented for SM in Section 2.1, the novel algorithm Biased Pareto Optimization (BPO) is presented and analyzed for SM in Section 2.2, and the faster modification of BPO for a specific cardinality constraint, $\kappa$ -Biased Pareto Optimization ( $\kappa$ -BPO), is presented and analyzed for SM in Section 2.3. The full version of the paper includes an appendix where additional theoretical details from Section 2 are filled in.

Definitions and Notation

The following notation and definitions will be used throughout Section 2. Let $f:2^{U}\to\mathbb{R}_{\geq 0}$ , $X\subseteq U$ , and $x\in U$ .

(i) Marginal gain: $\Delta f(X,x)=f(X\cup\{x\})-f(X)$ .

(ii) The membership of $x$ is flipped in $X$ means that if $x\in X$ , then $x$ is removed from $X$ ; and if $x\notin X$ , then $x$ is added to $X$ .

(iii) If $\nu$ is a random variable, then $\mathbb{E}\left[\nu\right]$ denotes the expected value of $\nu$ . If $A$ is a random event, then $P\left(A\right)$ denotes the probability of $A$ occurring.

(iv) Let $\mathcal{S}\subseteq 2^{U}$ . Then if there exists a unique $Y\in\mathcal{S}$ such that $|Y|=i$ , define $\mathcal{S}[i]=Y$ . If no such $Y$ exists, or there are multiple elements of $\mathcal{S}$ of cardinality $i$ , $\mathcal{S}[i]$ is undefined.

(v) Let $X,Y\subseteq U$ . Then $X\preceq Y$ if $f(Y)\geq f(X)$ and $|Y|\leq|X|$ . If at least one of the two inequalities is strict, then $X\prec Y$ and $Y$ dominates $X$ . If $f(Y)=f(X)$ and $|Y|=|X|$ then it is said that $X$ is equivalent to $Y$ .
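The notation above translates directly into a few small helpers. The sketch below is illustrative only: the function names and the toy coverage objective are assumptions, and an undefined $\mathcal{S}[i]$ is represented by `None`.

```python
def marginal_gain(f, X, x):
    """Delta f(X, x) = f(X ∪ {x}) - f(X)."""
    return f(X | {x}) - f(X)

def flip(X, x):
    """Flip the membership of x in X: remove it if present, add it otherwise."""
    return X - {x} if x in X else X | {x}

def pool_lookup(pool, i):
    """S[i]: the unique set of cardinality i in the pool, or None if undefined."""
    matches = [Y for Y in pool if len(Y) == i]
    return matches[0] if len(matches) == 1 else None

def weakly_dominated(f, X, Y):
    """X ⪯ Y: Y is at least as valuable as X and no larger."""
    return f(Y) >= f(X) and len(Y) <= len(X)

def dominates(f, Y, X):
    """Y dominates X: X ⪯ Y with at least one strict inequality."""
    return weakly_dominated(f, X, Y) and (f(Y) > f(X) or len(Y) < len(X))

def equivalent(f, X, Y):
    """X and Y have the same value and the same cardinality."""
    return f(X) == f(Y) and len(X) == len(Y)

# Hypothetical toy objective used only to exercise the helpers.
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}}

def f(X):
    covered = set()
    for x in X:
        covered |= COVER[x]
    return len(covered)
```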

2.1 Pareto Optimization (PO)

In this section, it is proven that in time $\mathcal{O}(nP\ln(1/\epsilon))$ , PO produces a $(1-\epsilon)(1-1/e)$ approximate solution in expectation for every cardinality constraint $\kappa<P$ , where $P\leq n+1$ is an input. If $P=\mathcal{O}(\kappa)$ for a fixed cardinality constraint $\kappa$ , then PO produces solutions for SM with similar theoretical guarantees to that of the standard greedy algorithm in the same asymptotic time, which shows the practicality of evolutionary algorithms such as PO for SM.

2.1.1 Description of PO

In this section, PO (Alg. 1) is described. The set $\mathcal{S}\subseteq 2^{U}$ is referred to as the pool, and each iteration of the for loop is referred to as an iteration. The pool initially contains only the empty set; its maximum size is determined by input parameter $P$ . During each iteration, $i\in\{0,...,P-1\}$ is chosen uniformly randomly (Line 5 of Alg. 1), and if $B=\mathcal{S}[i]$ exists then it is selected from $\mathcal{S}$ to be mutated, otherwise PO continues to the next iteration. The subroutine Mutate takes $B$ and randomly mutates it into $B^{\prime}\subseteq U$ as follows: for each $u\in U$ , flip the membership of $u$ in $B$ with probability $1/n$ . Finally, if no set in the pool dominates or is equivalent to $B^{\prime}$ and $|B^{\prime}|<P$ , then $B^{\prime}$ is added to the pool and all sets that $B^{\prime}$ dominates are removed. There is at most one new query of $f$ on each iteration of PO, and therefore the input $T$ is equal to the query complexity. Pseudocode for the subroutine Mutate is provided in the appendix.
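The description above can be sketched in Python as follows. This is a minimal reading of the prose, not a transcription of Alg. 1: the dictionary-based pool that caches values of $f$, the `seed` argument, the early skip when the mutated set is already in the pool, and the toy coverage objective are all assumptions made for illustration.

```python
import random

def mutate(B, elements, rng):
    """Flip the membership of each element of U in B independently with probability 1/n."""
    n = len(elements)
    flipped = {u for u in elements if rng.random() < 1.0 / n}
    return frozenset(B ^ flipped)

def pareto_optimization(f, U, P, T, seed=0):
    """Run T iterations of PO; return the pool as a dict mapping set -> f value."""
    rng = random.Random(seed)
    elements = list(U)
    pool = {frozenset(): f(frozenset())}  # the pool starts with only the empty set
    for _ in range(T):
        i = rng.randrange(P)  # uniform over {0, ..., P-1}
        matches = [X for X in pool if len(X) == i]
        if len(matches) != 1:  # S[i] is undefined: go to the next iteration
            continue
        B_new = mutate(matches[0], elements, rng)
        if len(B_new) >= P or B_new in pool:
            continue
        val = f(B_new)  # the only new query of f in this iteration
        # Reject B_new if some pool member dominates it or is equivalent to it.
        if any(v >= val and len(X) <= len(B_new) for X, v in pool.items()):
            continue
        # Otherwise add B_new and drop every set that B_new dominates.
        pool = {X: v for X, v in pool.items()
                if not (val >= v and len(B_new) <= len(X))}
        pool[B_new] = val
    return pool

# Hypothetical toy coverage objective for a quick run.
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}

def f(X):
    covered = set()
    for x in X:
        covered |= COVER[x]
    return len(covered)

pool = pareto_optimization(f, ["a", "b", "c", "d"], P=3, T=2000, seed=1)
```

For any $\kappa<P$, the solution reported for SM $(f,\kappa)$ would be the highest-value pool member of cardinality at most $\kappa$.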

2.1.2 Analysis of PO for SM

In this section, the approximation result of the algorithm PO for SM is presented. Omitted proofs are given in the appendix. The statement of Theorem 1 easily generalizes to $\gamma$ -weakly submodular objectives $f$ , where $1-1/e$ is replaced with $1-1/e^{\gamma}$ .

Theorem 1.

Suppose PO is run with input $(f,P,T)$ , where $f:2^{U}\to\mathbb{R}_{\geq 0}$ is monotone submodular, $P\in\{1,...,n+1\}$ , and $T>8enP\ln(1/\epsilon)$ . Let $\mathcal{S}$ be the pool of PO at the end of iteration $T$ . Then for any $\kappa<P$ ,

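$\mathbb{E}\left[\max_{X\in\mathcal{S},|X|\leq\kappa}f(X)\right]\geq(1-\epsilon)(1-1/e)\max_{|X|\leq\kappa}f(X).$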

Overview of Proof of Theorem 1

Given SM $(f,\kappa)$ with optimal solution $A^{*}$ , the standard greedy algorithm iteratively adds to its solution the element in $U$ of highest marginal gain until $\kappa$ elements have been picked. Existing analyses of PO for SM Friedrich and Neumann (2014); Qian et al. (2015b) analyze the time it takes until essentially the standard greedy algorithm randomly occurs within PO. If instead of iteratively picking the element of highest marginal gain, a uniformly random element of $A^{*}$ (possibly already chosen) is picked into the solution until $\kappa$ elements are picked (this could be viewed as an idealized version of the stochastic greedy algorithm of Mirzasoleiman et al. (2015)) then the same approximation guarantee as the standard greedy algorithm ( $1-1/e$ ) is achieved in expectation. In the proof of Theorem 1, the expected time until this second algorithm randomly occurs within PO is analyzed, which is a factor of $P$ faster.
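The two procedures compared in this overview can be sketched side by side. The code below is illustrative: the function names and the toy coverage objective are assumptions, and the idealized procedure takes $A^{*}$ as an input, which is only possible in analysis since $A^{*}$ is unknown in practice. The helper `idealized_ratio` evaluates $1-(1-1/\kappa)^{\kappa}$, the bound of Lemma 1 with $\omega^{\prime}=|A^{*}|=\kappa$, which is at least $1-1/e$.

```python
import math
import random

def standard_greedy(f, U, kappa):
    """Repeatedly add the element of highest marginal gain until kappa are picked."""
    X = frozenset()
    for _ in range(kappa):
        remaining = [u for u in U if u not in X]
        if not remaining:
            break
        best = max(remaining, key=lambda u: f(X | {u}) - f(X))
        X = X | {best}
    return X

def idealized_random_greedy(A_star, kappa, rng):
    """Make kappa uniformly random picks from A* (repeats allowed)."""
    choices = list(A_star)
    X = frozenset()
    for _ in range(kappa):
        X = X | {rng.choice(choices)}
    return X

def idealized_ratio(kappa):
    """1 - (1 - 1/kappa)^kappa, the expected-ratio bound after kappa picks."""
    return 1.0 - (1.0 - 1.0 / kappa) ** kappa

# Hypothetical toy coverage objective.
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}

def f(X):
    covered = set()
    for x in X:
        covered |= COVER[x]
    return len(covered)

greedy_solution = standard_greedy(f, ["a", "b", "c", "d"], 2)
random_solution = idealized_random_greedy({"a", "c"}, 2, random.Random(0))
```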

Proof.

Line numbers referenced are those in Algorithm 1. Throughout the proof of Theorem 1, the probability space of all possible runs of PO with the stated inputs is considered. Let $\kappa<P$ , and let $A^{*}=\operatorname*{arg\,max}_{|X|\leq\kappa}f(X)$ . We may assume that $|A^{*}|=\kappa$ , since $f$ is monotone. In the proof of Theorem 1, the random variable $\omega$ will be used, defined inductively as follows:

(i) Before the first iteration of PO, $\omega$ is set to $0$ .

(ii) If the following two conditions are met, $\omega$ is incremented at the end of an iteration: 1) $B=\operatorname*{arg\,max}\{f(X):X\in\mathcal{S},|X|\leq\omega\}$ is selected on Line 5; and 2) Mutate on Line 8 results in the membership of a single element $a^{*}\in A^{*}$ being flipped (i.e., it is either the case that Mutate returns $B^{\prime}=B\cup\{a^{*}\}$ for $a^{*}\in A^{*}\setminus B$ or $B^{\prime}=B\setminus\{a^{*}\}$ for $a^{*}\in A^{*}\cap B$ ).

Intuitively, $\omega$ is used to track a solution within $\mathcal{S}$ that has a high $f$ value relative to its cardinality. In particular, the following lemma describes the key property of $\omega$ .

Lemma 1.

At the end of every iteration of PO

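$\mathbb{E}\left[\max_{X\in\mathcal{S},|X|\leq\omega}f(X)\right]\geq\left(1-\left(1-\frac{1}{|A^{*}|}\right)^{\omega^{\prime}}\right)f(A^{*})$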

where $\omega^{\prime}=\min\{\omega,P-1\}$ .

A further key point is that once a solution appears in PO, i.e., it is returned by Mutate, there always exists a solution at least as good within $\mathcal{S}$ .

Lemma 2.

Let $Y\subseteq U$ and $|Y|\leq a<P$ . If $Y$ is returned by Mutate during iteration $i$ of PO, then at the end of any iteration $j\geq i$ it holds that $\max\{f(X):X\in\mathcal{S},|X|\leq a\}\geq f(Y).$

Let event $F$ be that at the completion of a run of PO, $\omega\geq\kappa$ . Then it follows from Lemmas 1 and 2 that

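$\mathbb{E}\left[\max_{X\in\mathcal{S},|X|\leq\kappa}f(X)\mid F\right]\geq\left(1-\frac{1}{e}\right)f(A^{*})\quad(1)$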

Then the remainder of the proof of Theorem 1 is to deal with the probability that $\omega$ reaches $\kappa$ . To this end, the following lemma states that the run of PO may be interpreted as a Bernoulli process.

Lemma 3.

Consider a run of PO as a series of Bernoulli trials $Y_{1},...,Y_{T}$ , where each iteration is a trial and a success is defined to be when $\omega$ is incremented. Then $Y_{1},...,Y_{T}$ are independent, identically distributed Bernoulli trials where the probability of success is

$\frac{1}{P}\sum_{x\in A^{*}}\left(1-\frac{1}{n}\right)^{n-1}\frac{1}{n}\geq\frac{|A^{*}|}{enP}.$

Then Lemma 3 and the Chernoff bound can be used to prove that the probability of $\omega$ not reaching $\kappa$ after $T\geq 8enP\ln(1/\epsilon)$ iterations of PO is small. This is stated in the following lemma.

Lemma 4.

$P\left(\sum_{i=1}^{T}Y_{i}<\kappa\right)\leq\epsilon.$

Finally, Theorem 1 follows from the law of total expectation, Inequality 1 and Lemma 4. ∎
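The arithmetic behind Lemma 4 can be checked numerically. The sketch below assumes the Chernoff bound $P(Y\leq(1-\eta)\mu)\leq e^{-\eta^{2}\mu/2}$ is applied with $\eta=1/2$, which yields the factor $e^{-\mu/8}$, and it uses the lower bound $\kappa/(enP)$ of Lemma 3 as a conservative stand-in for the success probability. The function names and the choice of $\eta$ are assumptions; bounding $P(\sum_{i}Y_{i}<\kappa)$ this way also requires $\kappa\leq\mu/2$, which the example verifies for its parameters.

```python
import math

def po_iteration_budget(n, P, eps):
    """Smallest integer T with T >= 8 e n P ln(1/eps)."""
    return math.ceil(8 * math.e * n * P * math.log(1 / eps))

def expected_successes(T, n, P, kappa):
    """Lower bound on mu = T * rho, using rho >= kappa / (e n P)."""
    return T * kappa / (math.e * n * P)

def chernoff_failure_bound(T, n, P, kappa):
    """exp(-mu / 8): the Chernoff bound with eta = 1/2 (an assumed choice)."""
    return math.exp(-expected_successes(T, n, P, kappa) / 8)

# Hypothetical parameters: n = 100 elements, P = 21, kappa = 10, eps = 0.1.
n, P, kappa, eps = 100, 21, 10, 0.1
T = po_iteration_budget(n, P, eps)
mu = expected_successes(T, n, P, kappa)
failure = chernoff_failure_bound(T, n, P, kappa)
```

With this budget, $\mu\geq 8\kappa\ln(1/\epsilon)$, so the bound evaluates to at most $\epsilon^{\kappa}\leq\epsilon$.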

2.2 Biased Pareto Optimization (BPO)

Biased Pareto Optimization (BPO) is a novel evolutionary algorithm with nearly the same approximation results as PO for SM in less time. Specifically, it is proven that in time $\mathcal{O}(n\ln(P)\ln(1/\epsilon))$ , BPO finds a $(1-\epsilon)(1-1/e-\epsilon)$ -approximate solution in expectation for every cardinality constraint $\kappa<P$ , where $P\leq n+1$ is an input. Thus, BPO is faster than PO by a factor of $\Omega(P/\ln(P))$ . BPO works similarly to PO but uses a biased selection procedure instead of choosing uniformly randomly.

2.2.1 Description of BPO

In this section, BPO is presented; pseudocode can be found in Alg. 2. In overview, BPO follows a similar iterative procedure to PO: in every iteration of the for loop, a set in $\mathcal{S}$ is chosen for mutation, and only sets that are not dominated by any others are kept. The difference from PO is in the selection of the set for mutation; a certain subset of sets in $\mathcal{S}$ are selected more frequently than others, as determined by the parameters $p\in(0,1]$ , $\epsilon\in(0,1)$ , and $\xi\in(0,1)$ , and the variables $\beta_{j}$ for $j\in\{1,...,M\}$ , where $M=\lceil\ln(P)/\ln\left(1/\xi\right)\rceil$ . There is at most one new query of $f$ on each iteration of BPO, and therefore the input $T$ is equal to the query complexity. Next, the selection process is described in detail.

Selection process

During each iteration, with probability $p$ BPO chooses $j$ from $\{1,...,M\}$ uniformly randomly (Line 9) and then sets $i=|\operatorname*{arg\,max}\{f(X):X\in\mathcal{S},|X|\leq\beta_{j}\}|$ (Line 10). Otherwise $i$ is chosen uniformly randomly from $\{0,...,P-1\}$ (Line 7). If $B=\mathcal{S}[i]$ exists then it is selected from $\mathcal{S}$ to be mutated, otherwise BPO continues to the next iteration. Initially, $\beta_{j}=0$ for all $j\in\{1,...,M\}$ . $\beta_{j}$ is incremented to $\beta_{j}+1$ once $j$ has been chosen on Line 9 on $H_{j}=e\ln(1/\epsilon)/\xi^{j}$ iterations since the last increment of $\beta_{j}$ . The variable $\ell_{j}$ is used to determine when $\beta_{j}$ should be incremented: $\beta_{j}$ is incremented during an iteration if and only if $\ell_{j}$ is set to 0 on Line 13. Notice that if $p=0$ , BPO is equivalent to PO.
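The selection process can be sketched as a small bookkeeping class plus a selection function. This is a hedged reading of the prose, not a transcription of Alg. 2: the names `BiasSchedule` and `select_index` are hypothetical, the threshold $H_{j}$ is taken exactly as printed in the text and rounded up to an integer (an assumption), the pool is assumed to be a dictionary mapping sets to cached values of $f$, and the guard for $M=0$ (the case $P=1$) is an added assumption.

```python
import math
import random

class BiasSchedule:
    """Tracks beta_j and ell_j for j = 1, ..., M (stored 0-indexed)."""

    def __init__(self, P, xi, eps):
        self.M = math.ceil(math.log(P) / math.log(1 / xi))
        # H_j as printed in the text, rounded up to an integer (assumption).
        self.H = [math.ceil(math.e * math.log(1 / eps) / xi ** j)
                  for j in range(1, self.M + 1)]
        self.beta = [0] * self.M
        self.ell = [0] * self.M

    def record_choice(self, j):
        """Count one selection of j; increment beta_j when ell_j reaches H_j."""
        k = j - 1
        self.ell[k] += 1
        if self.ell[k] >= self.H[k]:
            self.ell[k] = 0
            self.beta[k] += 1

def select_index(pool, schedule, P, p, rng):
    """Return the cardinality i whose pool member S[i] is to be mutated."""
    if schedule.M > 0 and rng.random() < p:
        j = rng.randint(1, schedule.M)  # uniform over {1, ..., M}
        beta = schedule.beta[j - 1]
        # The empty set is always in the pool, so eligible is never empty.
        eligible = [X for X in pool if len(X) <= beta]
        i = len(max(eligible, key=pool.get))
        schedule.record_choice(j)
        return i
    return rng.randrange(P)  # unbiased choice, as in PO

# Hypothetical parameters for a quick check.
schedule = BiasSchedule(P=10, xi=0.5, eps=0.1)
```

The index returned by `select_index` would then be used exactly as in PO: if $\mathcal{S}[i]$ exists it is mutated, and the pool is updated by the same dominance rule.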

2.2.2 Analysis of BPO for SM

The approximation results of BPO for SM are now presented. Lemmas referenced in the proof of Theorem 2 can be found in the appendix. The statement of Theorem 2 easily generalizes to $\gamma$ -weakly submodular objectives $f$ , where $1-1/e-\epsilon$ is replaced with $1-1/e^{\gamma}-\epsilon$ .

Theorem 2.

Suppose BPO is run with input $(f,P,T,p,\epsilon,\xi)$ where $f:2^{U}\to\mathbb{R}_{\geq 0}$ is monotone submodular, $P\in\{1,...,n+1\}$ , $T\geq\max\{\alpha n\lceil\ln(P)/\ln\left(1/\xi\right)\rceil,\beta\ln(n)\lceil\ln(P)/\ln\left(1/\xi\right)\rceil\}$ , where $\alpha=2e\ln(1/\epsilon)/p$ and $\beta=8/p$ , $p\in(0,1]$ , $\epsilon\in(0,1)$ , and $\xi\in(0,1)$ . Let $\mathcal{S}$ be the pool of BPO at the end of iteration $T$ . Then for any $\kappa<P$ ,

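$\mathbb{E}\left[\max_{X\in\mathcal{S},|X|\leq\kappa}f(X)\right]\geq(1-\epsilon)(1-1/e-\epsilon)\max_{|X|\leq\kappa}f(X).$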

Overview of Proof of Theorem 2

Consider SM $(f,\kappa)$ with optimal solution $A^{*}$ . Recall that in the proof of Theorem 1 in Section 2.1, the approximation ratio for SM $(f,\kappa)$ was proven by analyzing the expected time until a variable $\omega$ reaches $\kappa$ . In order for $\omega$ to be incremented during an iteration, $|\operatorname*{arg\,max}\{f(X):X\in\mathcal{S},|X|\leq\omega\}|$ must be selected on Line 5, which occurs with probability $1/P$ . If we instead consider an alternative version of PO where the selection is biased towards choosing $|\operatorname*{arg\,max}\{f(X):X\in\mathcal{S},|X|\leq\omega\}|$ with constant probability $\alpha>1/P$ , then $\omega$ reaches $\kappa$ faster. The difficulty is that the value of $\omega$ is unknown, since it depends on Mutate flipping the membership of an $a^{*}\in A^{*}$ and nothing else. The idea behind BPO is that we can approximately track $\omega$ , and therefore bias the selection. In particular, for each SM $(f,\kappa)$ with $\kappa<P$ , there exists a $\beta_{i}$ that is approximately equal to the corresponding $\omega$ for $\kappa$ .

Proof.

Proofs of lemmas used can be found in the appendix. Line numbers referenced are those in Algorithm 2. Throughout the proof of Theorem 2, the probability space of all possible runs of BPO with the stated inputs is considered. An iteration of the for loop in BPO is simply referred to as an iteration.

Consider any $\kappa<P$ . Define $A^{*}=\operatorname*{arg\,max}_{|X|\leq\kappa}f(X)$ . Without loss of generality we may assume that $|A^{*}|=\kappa$ , since $f$ is monotone. There exists $q\in\{1,...,\lceil\ln(P)/\ln\left(1/\xi\right)\rceil\}$ such that

$\xi^{q}P<|A^{*}|\leq\xi^{q-1}P.$

Then define $\omega=\beta_{q}$ .
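To illustrate the choice of $q$, the following sketch searches $\{1,...,M\}$ for the index satisfying $\xi^{q}P<\kappa\leq\xi^{q-1}P$, using $|A^{*}|=\kappa$. The function names are hypothetical, and returning `None` when floating-point rounding or a boundary case prevents a match is an assumption of this sketch.

```python
import math

def num_thresholds(P, xi):
    """M = ceil(ln(P) / ln(1/xi))."""
    return math.ceil(math.log(P) / math.log(1 / xi))

def find_q(kappa, P, xi):
    """Return q in {1, ..., M} with xi^q * P < kappa <= xi^(q-1) * P, else None."""
    M = num_thresholds(P, xi)
    for q in range(1, M + 1):
        if xi ** q * P < kappa <= xi ** (q - 1) * P:
            return q
    return None

# Hypothetical parameters: P = 10 and xi = 1/2 give M = 4 geometric size classes.
qs = {kappa: find_q(kappa, 10, 0.5) for kappa in range(1, 10)}
```

Each $\kappa<P$ falls into one geometric size class, so only $M=\lceil\ln(P)/\ln(1/\xi)\rceil$ variables $\beta_{j}$ are needed to track every constraint approximately.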

The $\omega$ defined here serves a similar purpose to that defined in the proof of Theorem 1: to track a solution within $\mathcal{S}$ that has a high $f$ value relative to its cardinality, as described in the following lemma.

Lemma 5.

At the end of every iteration of BPO

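$\mathbb{E}\left[\max_{X\in\mathcal{S},|X|\leq\omega}f(X)\right]\geq\left(1-\left(1-\frac{1-\epsilon}{|A^{*}|}\right)^{\omega^{\prime}}\right)f(A^{*})$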

where $\omega^{\prime}=\min\{\omega,P-1\}$ .

In addition, the property of PO detailed in Lemma 2 of Theorem 1 clearly also holds for BPO.

Define the event $F$ to be that at the completion of a run of BPO $\ell_{q}$ has been incremented (Line 11 of Algorithm 2) $H_{q}\kappa$ times. If $\ell_{q}$ has been incremented $H_{q}\kappa$ times, then one may see that $\omega$ has been incremented $\kappa$ times. Once $\omega$ reaches $\kappa$ , it clearly follows from Lemmas 5 and 2 that

$\mathbb{E}\left[f(A)\mid F\right]\geq\left(1-\frac{1}{e}-\epsilon\right)f(A^{*})$

where $A=\text{argmax}_{X\in\mathcal{S},|X|\leq\kappa}f(X)$ .

We now analyze the probability that $\ell_{q}$ has been incremented $H_{q}\kappa$ times. To this end, we have the following lemma.

Lemma 6.

Consider a run of BPO as a series of Bernoulli trials $Y_{1},...,Y_{T}$ , where each iteration is a trial and a success is defined to be when $\ell_{q}$ is incremented. Then $Y_{1},...,Y_{T}$ are independent, identically distributed Bernoulli trials where the probability of success is $p/\lceil\ln(P)/\ln\left(1/\xi\right)\rceil$ .

Finally, an analogous argument to that of Theorem 1 can be used to complete the proof of Theorem 2. In particular, we bound the probability of event $F$ not occurring after $T\geq\max\{\alpha n\lceil\ln(P)/\ln\left(1/\xi\right)\rceil,\beta\ln(n)\lceil\ln(P)/\ln\left(1/\xi\right)\rceil\}$ , where $\alpha=2e\ln(1/\epsilon)/p$ and $\beta=8/p$ , iterations of BPO by $\epsilon$ using the Chernoff bound, and then apply the law of total expectation. The details of the argument can be found in the appendix. ∎

2.3 $\kappa$ -Biased Pareto Optimization ( $\kappa$ -BPO)

If a specific cardinality constraint $\kappa$ is provided, a modified version of BPO, $\kappa$ -Biased Pareto Optimization ( $\kappa$ -BPO), can produce an approximate solution in expectation even faster than BPO. In this section, the algorithm $\kappa$ -BPO is described, and it is proven that $\kappa$ -BPO finds a $(1-\epsilon)(1-1/e-\epsilon)$ -approximate solution to SM $(f,\kappa)$ in $\mathcal{O}(n\ln(1/\epsilon))$ queries of $f$ .

2.3.1 Description of $\kappa$ -BPO

Pseudocode for $\kappa$ -BPO can be found in the appendix. $\kappa$ -BPO is similar to BPO except $\kappa$ -BPO is only biased towards picking a single element of $\mathcal{S}$ , determined by the variable $\beta$ . The input parameters of $\kappa$ -BPO are the same as BPO except $\xi\in(0,1)$ is not needed.

During each iteration, with probability $p$ $\kappa$ -BPO sets $i=|\operatorname*{arg\,max}\{f(X):X\in\mathcal{S},|X|\leq\beta\}|$ . Otherwise $i$ is chosen uniformly randomly from $\{0,...,P-1\}$ . If $B=\mathcal{S}[i]$ exists then it is selected from $\mathcal{S}$ to be mutated, otherwise $\kappa$ -BPO continues to the next iteration. Initially, $\beta=0$ . $\beta$ is incremented to $\beta+1$ if on $H=en\ln(1/\epsilon)/\kappa$ iterations since the last increment of $\beta$ , $i$ was chosen to be $|\operatorname*{arg\,max}\{f(X):X\in\mathcal{S},|X|\leq\beta\}|$ . The variable $\ell$ is used to determine when $\beta$ should be incremented: $\beta$ is incremented during an iteration if and only if $\ell$ is set to 0.
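Since the pseudocode for $\kappa$ -BPO is deferred to the appendix, the following is only a sketch reconstructed from the prose above. Assumptions: the pool is a dictionary caching values of $f$, the threshold $H$ is rounded up to an integer, $\beta$ is not capped, a mutated set already in the pool is skipped without a new query, and the pool update follows the PO rule of Section 2.1.1. The function name, the `seed` argument, and the toy coverage objective are illustrative.

```python
import math
import random

def kappa_bpo(f, U, kappa, P, T, p, eps, seed=0):
    """Run T iterations of a kappa-BPO sketch; return the pool as set -> f value."""
    rng = random.Random(seed)
    elements = list(U)
    n = len(elements)
    H = math.ceil(math.e * n * math.log(1 / eps) / kappa)  # rounded up (assumption)
    beta, ell = 0, 0
    pool = {frozenset(): f(frozenset())}
    for _ in range(T):
        if rng.random() < p:
            # Biased choice: the best pool member of cardinality at most beta.
            eligible = [X for X in pool if len(X) <= beta]
            i = len(max(eligible, key=pool.get))
            ell += 1
            if ell >= H:  # H biased selections since the last increment
                ell = 0
                beta += 1
        else:
            i = rng.randrange(P)  # unbiased choice, as in PO
        matches = [X for X in pool if len(X) == i]
        if len(matches) != 1:
            continue
        flipped = {u for u in elements if rng.random() < 1.0 / n}
        B_new = frozenset(matches[0] ^ flipped)
        if len(B_new) >= P or B_new in pool:
            continue
        val = f(B_new)
        # Reject B_new if a pool member dominates it or is equivalent to it.
        if any(v >= val and len(X) <= len(B_new) for X, v in pool.items()):
            continue
        pool = {X: v for X, v in pool.items()
                if not (val >= v and len(B_new) <= len(X))}
        pool[B_new] = val
    return pool

# Hypothetical toy coverage objective for a quick run.
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}

def f(X):
    covered = set()
    for x in X:
        covered |= COVER[x]
    return len(covered)

pool = kappa_bpo(f, ["a", "b", "c", "d"], kappa=2, P=4, T=3000, p=0.5, eps=0.1)
```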

2.3.2 Analysis of $\kappa$ -BPO for SM

The approximation results of $\kappa$ -BPO for SM are now presented. The statement of Theorem 3 easily generalizes to $\gamma$ -weakly submodular objectives $f$ in the analogous manner as BPO.

Theorem 3.

Suppose $\kappa$ -BPO is run with input $(f,\kappa,P,T,p,\epsilon)$ where $f:2^{U}\to\mathbb{R}_{\geq 0}$ is monotone submodular, $\kappa\in\{1,...,n\}$ , $P\in\{\kappa+1,...,n+1\}$ , $T\geq\max\{2en\ln(1/\epsilon)/p,8\ln(n)/p\}$ , $p\in(0,1]$ , and $\epsilon\in(0,1)$ . Let $\mathcal{S}$ be the pool of $\kappa$ -BPO at the end of iteration $T$ . Then,

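$\mathbb{E}\left[\max_{X\in\mathcal{S},|X|\leq\kappa}f(X)\right]\geq(1-\epsilon)(1-1/e-\epsilon)\max_{|X|\leq\kappa}f(X).$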

Proof.

The proof of Theorem 3 is an easy modification of the proof of Theorem 2 and therefore details are left to the reader. The key point is that Lemma 6 should be replaced with the following lemma.

Lemma 7.

Consider a run of $\kappa$ -BPO as a series of Bernoulli trials $Y_{1},...,Y_{T}$ , where each iteration is a trial and a success is defined to be when $\ell$ is incremented. Then $Y_{1},...,Y_{T}$ are independent, identically distributed Bernoulli trials where the probability of success is $p$ .

∎
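As a sanity check on the choice $H=en\ln(1/\epsilon)/\kappa$ from Section 2.3.1, the following sketch computes the probability that $H$ consecutive biased mutations all fail to flip exactly one element of $A^{*}$ . It assumes that $|A^{*}|=\kappa$ and that Mutate flips each element independently with probability $1/n$ ; both are assumptions of this illustration, and the function name is hypothetical. Since $1-x\leq e^{-x}$ and $(1-1/n)^{n-1}\geq 1/e$ , the result is at most $\epsilon$ .

```python
import math


def all_biased_trials_fail(n, kappa, eps):
    """Probability that H biased mutations all miss, under the stated assumptions."""
    H = math.e * n * math.log(1 / eps) / kappa
    # Probability that one Mutate call flips exactly one element of A* and nothing else.
    hit = kappa * (1 - 1 / n) ** (n - 1) / n
    return (1 - hit) ** H


for n, kappa, eps in [(50, 5, 0.1), (200, 20, 0.01), (10, 10, 0.5)]:
    assert all_biased_trials_fail(n, kappa, eps) <= eps
```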

3 Experimental Evaluation

In this section, the algorithms PO and $\kappa$ -BPO are evaluated on instances of data summarization with submodular and non-submodular objectives $f$ . In summary, the faster runtime for PO proven in Theorem 1 is demonstrated empirically. Also, the results demonstrate that $\kappa$ -BPO quickly finds solutions better than the standard greedy algorithm, the stochastic greedy algorithm SG Mirzasoleiman et al. (2015), and PO.

The algorithms evaluated in Section 3 are:

• the standard greedy algorithm Nemhauser and Wolsey (1978)

• the stochastic greedy (SG) algorithm of Mirzasoleiman et al. (2015)

• PO: the variant of the algorithm of Friedrich and Neumann (2014) as detailed in Alg. 1 and analyzed in Section 2.1

• $\kappa$ -BPO: the version of BPO that biases towards only one set in $\mathcal{S}$ , based on the input $\kappa$ as discussed in Section 2.3

For both PO and $\kappa$ -BPO, the parameter $P=2\kappa$ is used on all instances.
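For reference, the standard greedy baseline can be sketched as below, with an explicit counter for queries of $f$ . The function name, the tie-breaking by sorted order, and the toy coverage objective are assumptions made for this illustration; the sketch makes at most $n\kappa$ queries, which is the quantity used to normalize runtime in the results.

```python
def greedy(f, universe, kappa):
    """Standard greedy: add the element maximizing f(X + u), kappa times.

    Returns the chosen set and the number of queries of f that were made.
    """
    X, queries = frozenset(), 0
    for _ in range(kappa):
        best_u, best_val = None, float("-inf")
        for u in sorted(universe - X):  # sorted only to make ties deterministic
            val = f(X | {u})
            queries += 1
            if val > best_val:
                best_u, best_val = u, val
        if best_u is None:  # universe exhausted
            break
        X = X | {best_u}
    return X, queries


# Toy monotone submodular objective (hypothetical data): set coverage.
COVER = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"}}


def coverage(X):
    return len(set().union(*(COVER[u] for u in X)))


solution, num_queries = greedy(coverage, set(COVER), 2)
```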

3.1 Application

In data summarization (DS), we have a set $U$ of data points and we wish to find a subset of $U$ of cardinality $\kappa$ that best summarizes the entire dataset $U$ . The objective $f:2^{U}\to\mathbb{R}_{\geq 0}$ takes $X\subseteq U$ to a measure of how effectively $X$ summarizes $U$ . For the ground set $U$ , we use: (i) a set of 10-dimensional vectors drawn from $\kappa$ Gaussian distributions (Gaussian), and (ii) a set of $32\times 32$ color images from the CIFAR-100 dataset Krizhevsky et al. (2009), each represented by a 3072-dimensional vector of pixels (CIFAR). For the objective $f$ , we use: (i) the monotone submodular $k$ -medoid objective Kaufman and Rousseeuw (2009) ( $f_{MED}$ ), and (ii) the monotone weakly submodular objective based on the Determinantal Point Process (DPP) Kulesza et al. (2012) ( $f_{DPP}$ ). A lower bound on the submodularity ratio has been proven for the latter objective Bian et al. (2017).
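To make the first objective concrete, here is a minimal sketch of one common formulation of the $k$ -medoid objective: the reduction in average nearest-exemplar distance relative to an auxiliary "phantom" point. The exact distance, phantom point, and normalization used in the experiments are not given in this section, so the Euclidean distance and the names `k_medoid_loss`, `f_med`, and `phantom` are assumptions of this illustration.

```python
import math


def k_medoid_loss(X, points):
    """Average distance from each data point to its nearest element of X."""
    return sum(min(math.dist(v, x) for x in X) for v in points) / len(points)


def f_med(X, points, phantom):
    """Loss reduction relative to the phantom exemplar alone (monotone in X)."""
    base = k_medoid_loss([phantom], points)
    return base - k_medoid_loss(list(X) + [phantom], points)


# Hypothetical toy data: three points on a line, phantom at the origin.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
value = f_med([(10.0, 0.0)], points, (0.0, 0.0))
```

Including the phantom point in every evaluation keeps the loss well defined for the empty set, so that $f(\emptyset)=0$ .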

3.2 Results

The experimental results are shown in Figure 1. All results are the mean of $50$ repetitions of each algorithm; shaded regions represent one standard deviation from the mean. Objective and runtime are normalized by the objective value of and number of queries made ( $n\kappa$ ) by the standard greedy algorithm. The value of the solution of the standard greedy algorithm is plotted as a dotted gray horizontal line $y=1$ . The time where $\beta$ in $\kappa$ -BPO reached $\kappa$ is plotted as a vertical magenta line.

The best solution value obtained by each algorithm is shown as the rightmost point in each plot. Both $\kappa$ -BPO and PO were eventually able to find better solutions than the standard greedy algorithm (i.e., normalized value $>1.0$ ), especially on the non-submodular objective (Figures 1(e) and 1(f)). Observe that PO typically exceeds the stochastic greedy objective value within $c\kappa n$ queries, where $c\leq 2$ . This behavior corroborates our theoretical analysis that PO achieves a good solution in expectation in $O(\kappa n)$ queries. In addition, $\kappa$ -BPO exceeds the SG value in $cn$ queries. Finally, for PO, recall that the theoretical analysis shows that the approximation ratio holds for any $\kappa<P$ .

Because PO and $\kappa$ -BPO can be terminated at any time, the running time may be compared by observing where any vertical line intersects the curves for each algorithm. The running time of the standard greedy algorithm corresponds to the line $x=1$ (not plotted). $\kappa$ -BPO reaches solution values close to that of the standard greedy algorithm significantly faster than PO, as expected by its design. The effect of varying the parameters $\varepsilon$ and $p$ on the behavior of $\kappa$ -BPO is shown in Figs. 1(c) and 1(d), respectively: smaller $\varepsilon$ leads to a higher initial increase but the initial increase is slower, while smaller $p$ slows down the rate of the initial increase.

4 Conclusions

In this work, we have re-analyzed the evolutionary algorithm PO, originally analyzed for submodular maximization by Friedrich and Neumann (2014), and showed that it achieves nearly the optimal worst-case ratio in expectation on SM for any $\kappa<P$ in $O(nP)$ queries. In contrast, Friedrich and Neumann (2014) showed that the optimal worst-case ratio is achieved in expected $O(nP^{2})$ queries. This improved rate of convergence is supported by an empirical evaluation.

Further, it has been shown that changing the selection process in PO improves the query complexity to $O(n\log(P))$ while obtaining the same approximation results. A variant of this algorithm, $\kappa$ -BPO, is shown empirically to have a much faster initial rate of convergence to a good solution than PO, without sacrificing the long-term behavior of the PO algorithm.

5 Appendix: Algorithms and Theoretical Results

In this section, additional results from Section 2 are included. In particular: pseudocode missing from the description of the algorithm PO is given in Section 5.1.1; lemmas used in Section 2.1 and their corresponding proofs are given in Section 5.1.2; lemmas used in Section 2.2 and their corresponding proofs are given in Section 5.2.1; finally, pseudocode for $\kappa$ -BPO is given in Section 5.3.

5.1 Pareto Optimization (PO)

5.1.1 Additional Pseudocode

5.1.2 Lemmas used for the Proof of Theorem 1

Lemma 1.

At the end of every iteration of PO

[TABLE]

where $\omega^{\prime}=\min\{\omega,P-1\}$ .

Proof.

At the end of any iteration $i$ of PO, define (i) $\omega_{i}$ to be the value of $\omega$ , and (ii) $X_{i}=\operatorname*{arg\,max}\{f(X):X\in\mathcal{S},|X|\leq\omega\}$ . $\omega_{0}$ and $X_{0}$ refer to the values at the start of the first iteration. In order to prove Lemma 1, it must be shown that at the end of any iteration $i\in\{0,1,...,T\}$ ,

[TABLE]

Equation 4 will be proven by induction on iteration $i$ . Equation 4 is clearly true for $i=0$ , since on all runs of PO $\omega_{0}=0$ and $X_{0}=\emptyset$ . Now suppose that Equation 4 is true for iteration $t-1\in\{0,...,T-1\}$ ; it will be shown that it is then true for iteration $t$ .

Define $E$ to be the event that during iteration $t$ of the for loop of PO, $\omega$ is incremented by 1. The following claim establishes that the expected value of $f(X_{t-1})$ does not depend on $E$ .

Claim 1.

$\mathbb{E}\left[f(X_{t-1})|E\right]=\mathbb{E}\left[f(X_{t-1})\right].$

Proof.

At the beginning of iteration $t$ the probability that $\omega$ will be incremented is

[TABLE]

This does not depend on the value of $f(X_{t-1})$ , so for all $\alpha\in\mathbb{R}_{\geq 0}$ , $P(E|f(X_{t-1})=\alpha)=P(E)$ . Then Bayes’ theorem gives that $P(f(X_{t-1})=\alpha|E)=P(f(X_{t-1})=\alpha)$ , which implies Claim 1. ∎
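The Bayes step in the proof of Claim 1 can be written out explicitly. The following is a worked version under the assumption that $P(E)>0$ ; since $U$ is finite, $f(X_{t-1})$ takes finitely many values $\alpha$ , so the expectation is a finite sum.

```latex
\begin{align*}
P\big(f(X_{t-1})=\alpha \mid E\big)
  &= \frac{P\big(E \mid f(X_{t-1})=\alpha\big)\, P\big(f(X_{t-1})=\alpha\big)}{P(E)}
   = P\big(f(X_{t-1})=\alpha\big), \\
\mathbb{E}\big[f(X_{t-1}) \mid E\big]
  &= \sum_{\alpha} \alpha\, P\big(f(X_{t-1})=\alpha \mid E\big)
   = \sum_{\alpha} \alpha\, P\big(f(X_{t-1})=\alpha\big)
   = \mathbb{E}\big[f(X_{t-1})\big].
\end{align*}
```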

Equation 4 is proven by considering the two cases $E$ and $\neg E$ separately, and then applying the law of total probability. If $E$ did not occur, then it follows from Lemma 2 that

[TABLE]

where the last inequality follows from the inductive assumption and the fact that $\omega_{t}=\omega_{t-1}$ when conditioning on $\neg E$ .

The proof will proceed by considering arbitrary but fixed runs of PO. Consider runs of PO where $E$ did occur. If $E$ occurs, then during iteration $t$ , $X_{t-1}$ is selected on Line 5 and then Mutate results in the membership of a single $a^{*}\in A^{*}$ being flipped in $X_{t-1}$ , i.e. it is either the case that Mutate returns $X_{t-1}\cup\{a^{*}\}$ for $a^{*}\in A^{*}\setminus X_{t-1}$ or $X_{t-1}\setminus\{a^{*}\}$ for $a^{*}\in A^{*}\cap X_{t-1}$ .

Claim 2.

The following holds: $\mathbb{E}\left[f(X_{t})|E\right]\geq\mathbb{E}\left[f(X_{t-1}\cup\{a^{*}\})|E\right].$

Proof.

If $E$ occurs, then Mutate returns either $X_{t-1}\cup\{a^{*}\}$ for $a^{*}\in A^{*}\setminus X_{t-1}$ or $X_{t-1}\setminus\{a^{*}\}$ for $a^{*}\in A^{*}\cap X_{t-1}$ . Let $E_{1}$ be the former event and let $E_{2}$ be the latter event.

Consider a particular run of PO where $E$ occurs. Then $|X_{t-1}\cup\{a^{*}\}|\leq\omega_{t-1}+1=\omega_{t}$ . If $E_{1}$ occurs on this run, then $X_{t-1}\cup\{a^{*}\}$ was returned by Mutate on iteration $t$ and therefore $f(X_{t})\geq f(X_{t-1}\cup\{a^{*}\})$ by Lemma 2. If $E_{2}$ occurs on this run, then $X_{t-1}\cup\{a^{*}\}=X_{t-1}$ . $X_{t-1}$ was returned by Mutate on some iteration before $t$ since $X_{t-1}\in\mathcal{S}$ at the end of iteration $t-1$ . Therefore $f(X_{t})\geq f(X_{t-1}\cup\{a^{*}\})$ by Lemma 2. Then for any run of PO where $E$ occurs, $f(X_{t})\geq f(X_{t-1}\cup\{a^{*}\})$ , and so Claim 2 follows. ∎

Next, it holds that

[TABLE]

where (a) follows from Claim 2; (b) follows from Lemma 8; (c) is by Claim 1; (d) is the inductive assumption; (e) is because by definition of $E$ , $\omega_{t}=\omega_{t-1}+1$ . Finally, the inductive step follows by Equations 5, 6 and the law of total probability. Therefore Lemma 1 is proven. ∎

Lemma 2.

Let $Y\subseteq U$ and $|Y|\leq a<P$ . If $Y$ is returned by Mutate during iteration $i$ of PO, then at the end of any iteration $j\geq i$ it holds that $\max\{f(X):X\in\mathcal{S},|X|\leq a\}\geq f(Y).$

Proof.

First, notice that once $Y$ has been returned by Mutate, there exists $X\in\mathcal{S}$ such that $Y\preceq X$ from that point in PO on. Consider the end of any iteration $j\geq i$ . Let $Y^{\prime}\in\mathcal{S}$ be such that $Y\preceq Y^{\prime}$ at the end of iteration $j$ . Then $f(Y)\leq f(Y^{\prime})\leq\max\{f(X):X\in\mathcal{S},|X|\leq a\}$ since $|Y^{\prime}|\leq|Y|\leq a$ . ∎
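The invariant behind Lemma 2 can be illustrated with a small sketch of a pool update. It assumes that $Y\preceq X$ means $f(Y)\leq f(X)$ and $|Y|\geq|X|$ (consistent with the inequalities used in the proof above), that the pool holds at most one set per cardinality below $P$ , and that a new set is rejected when some pool member dominates it. These details, and the names `update_pool` and `best_value`, are assumptions of this illustration rather than the paper's pseudocode.

```python
def update_pool(pool, Y, f, P):
    """pool maps a cardinality to (set, f-value); it is mutated in place."""
    Y = frozenset(Y)
    if len(Y) >= P:
        return
    fy = f(Y)
    # Reject Y if some pool member is no larger and at least as valuable.
    if any(s <= len(Y) and v >= fy for s, (_, v) in pool.items()):
        return
    # Otherwise drop every member that Y dominates, then add Y.
    for s in [s for s, (_, v) in pool.items() if s >= len(Y) and v <= fy]:
        del pool[s]
    pool[len(Y)] = (Y, fy)


def best_value(pool, a):
    """max{ f(X) : X in pool, |X| <= a }."""
    return max(v for s, (_, v) in pool.items() if s <= a)
```

Under these assumptions, whether `Y` is accepted or rejected, some pool member of size at most `len(Y)` has value at least `f(Y)` afterwards, and later removals can only replace it by a set that dominates it.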

Lemma 3.

Consider a run of PO as a series of Bernoulli trials $Y_{1},...,Y_{T}$ , where each iteration is a trial and a success is defined to be when $\omega$ is incremented. $Y_{1},...,Y_{T}$ are independent, identically distributed Bernoulli trials where the probability of success is

[TABLE]

Proof.

In order for $\omega$ to be incremented on an iteration $t$ (i.e. the trial results in a success), the set $X_{t-1}$ must be selected on Line 5; this occurs with probability $1/P$ . In addition, if any set is selected, Mutate results in flipping of the membership of a single $a^{*}\in A^{*}$ and nothing else, with probability $\sum_{x\in A^{*}}\left(1-\frac{1}{n}\right)^{n-1}\frac{1}{n}.$ Therefore, the probability of success is $\frac{1}{P}\sum_{x\in A^{*}}\left(1-\frac{1}{n}\right)^{n-1}\frac{1}{n}$ . This probability is independent of the iteration $t$ , and therefore it follows that $Y_{1},...,Y_{T}$ are independent and identically distributed. ∎
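The success probability in the proof of Lemma 3 can be computed directly. The sketch below assumes that Mutate flips the membership of each of the $n$ elements independently with probability $1/n$ , which is consistent with the factor $(1-\frac{1}{n})^{n-1}\frac{1}{n}$ above; the function names and the parameter `k` (standing for $|A^{*}|$ ) are hypothetical. The standard inequality $(1-1/n)^{n-1}\geq 1/e$ then gives the lower bound $|A^{*}|/(enP)$ .

```python
import math
import random


def mutate(B, universe, rng=random):
    """Flip the membership of each element independently with probability 1/n."""
    n = len(universe)
    flipped = {u for u in universe if rng.random() < 1 / n}
    return frozenset(B) ^ flipped


def success_probability(n, P, k):
    """(1/P) * sum over the k elements of A* of (1 - 1/n)^(n-1) / n."""
    return k * (1 - 1 / n) ** (n - 1) / (n * P)


rho = success_probability(n=100, P=20, k=10)
assert rho >= 10 / (math.e * 100 * 20)
```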

Lemma 4.

Consider a run of PO as a series of Bernoulli trials $Y_{1},...,Y_{T}$ , where each iteration is a trial and a success is defined to be when $\omega$ is incremented. Then

[TABLE]

Proof.

By Lemma 3, Chernoff’s bound (Lemma 9) may be applied to $Y_{1},...,Y_{T}$ . Let $\rho$ be the probability of success for each $Y_{i}$ (the value of $\rho$ is given in Lemma 3). Then $\mathbb{E}\left[\sum_{i=1}^{T}Y_{i}\right]=T\rho$ . Therefore

[TABLE]

where (a) is because $T\geq 2enP$ combined with the lower bound on $\rho$ given in Lemma 3; (b) is applying Lemma 9 with $\eta=1/2$ ; and (c) is because $T\geq 8enP\ln(1/\epsilon)$ combined with the lower bound on $\rho$ given in Lemma 3. ∎

Lemma 8.

Let $B,X\subseteq U$ , $X\neq\emptyset$ . Suppose that $B$ is input to Mutate, and consider the probability space of all possible outputs of Mutate. Let $E$ be the event that Mutate returns $B\cup\{x\}$ for some $x\in X\setminus B$ or $B\setminus\{x\}$ for some $x\in X\cap B$ . Then

[TABLE]

Proof.

It is clear from the procedure Mutate that the event $E$ is equivalent to uniformly randomly choosing $x\in X$ and then flipping its membership in $B$ . Then it is the case that

[TABLE]

where (a) follows from the monotonicity and submodularity of $f$ . ∎

Lemma 9 (Chernoff bound).

Suppose $Y_{1},\ldots,Y_{T}$ are independent random variables taking values in $\{0,1\}$ . Let $Y$ denote their sum and let $\mu=\mathbb{E}\left[Y\right]$ denote the sum’s expected value. Then for any $\eta>0$

[TABLE]
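As a numerical illustration of how a Chernoff bound is used with $\eta=1/2$ , the sketch below compares the exact lower tail of a binomial sum with the standard multiplicative lower-tail bound $P(Y\leq(1-\eta)\mu)\leq e^{-\eta^{2}\mu/2}$ for $0<\eta<1$ . The displayed statement of Lemma 9 is not reproduced here, so this particular form of the bound is an assumption of the illustration, as are the function names.

```python
import math


def chernoff_lower_tail(mu, eta):
    """Upper bound exp(-eta^2 * mu / 2) on P(Y <= (1 - eta) * mu), for 0 < eta < 1."""
    return math.exp(-eta * eta * mu / 2)


def binomial_lower_tail(T, rho, threshold):
    """Exact P(Y <= threshold) for Y ~ Binomial(T, rho)."""
    return sum(math.comb(T, j) * rho**j * (1 - rho) ** (T - j)
               for j in range(int(math.floor(threshold)) + 1))


# Hypothetical parameters: T = 200 trials, success probability 0.1, so mu = 20.
exact = binomial_lower_tail(200, 0.1, 10)
bound = chernoff_lower_tail(20, 0.5)
assert exact <= bound
```

With $\eta=1/2$ the bound is $e^{-\mu/8}$ , which is at most $1/n$ whenever $\mu\geq 8\ln(n)$ .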

5.2 Biased Pareto Optimization (BPO)

5.2.1 Lemmas used for the Proof of Theorem 2

Lemma 5.

At the end of every iteration of BPO

[TABLE]

where $\omega^{\prime}=\min\{\omega,P-1\}$ .

Proof.

For notational simplicity we define $H=H_{q}$ , and $\ell=\ell_{q}$ . $\omega_{i}$ and $X_{i}$ are defined analogously to the proof of Lemma 1: At the end of any iteration $i$ of BPO, define (i) $\omega_{i}$ to be the value of $\omega$ , and (ii) $X_{i}=\operatorname*{arg\,max}\{f(X):X\in\mathcal{S},|X|\leq\omega\}$ . $\omega_{0}$ and $X_{0}$ refer to the values at the start of the first iteration. We see in Claim 3 that the objective values of the sequence $X_{1},X_{2},\ldots$ are non-decreasing.

Claim 3.

For any run of BPO, if $i\leq j$ then $f(X_{i})\leq f(X_{j})$ .

Proof.

$\omega$ only increases throughout BPO, therefore $\omega_{i}\leq\omega_{j}$ . Then if $\mathcal{S}$ is the pool at the end of iteration $j$

[TABLE]

where (a) follows from Lemma 2. ∎

In order to prove Lemma 5 it must be shown that for any iteration $i\in\{0,1,...,T\}$ ,

[TABLE]

which will be proven using induction on $i$ . Equation 7 is clearly true for $i=0$ , since on all runs of BPO $\omega_{0}=0$ and $X_{0}=\emptyset$ . Now suppose that Equation 7 is true for every iteration $i<t\in\{1,...,T\}$ ; it will now be shown that Equation 7 is true for iteration $t$ .

Define the random variable $\sigma(t)=\min\{i\in\{0,...,T-1\}:\omega_{i}=\omega_{t-1}\}$ . In other words $\sigma(t)$ is the iteration during which $\omega$ was incremented to equal $\omega_{t-1}$ , which is the value of $\omega$ at the beginning of iteration $t$ . Then $\omega$ is not incremented on any iteration in $(\sigma(t),t)$ . We will be using the inductive assumption on iteration $\sigma(t)$ in order to prove Equation 7. Define the events $E$ and $F$ as follows:

(i)

Event $E$ is that during iteration $t$ of BPO, $\omega$ is incremented. Observe that if $E$ occurred, then during iteration $t$ of BPO $\ell$ was incremented to equal $H$ . Since $\ell$ is set to 0 during iteration $\sigma(t)$ , this implies that $\ell$ was incremented on $H$ distinct iterations $i_{1},...,i_{H}\in(\sigma(t),t]$ of BPO. In other words, on each of these iterations $j\in\{i_{1},...,i_{H}\}$ the set $X_{j-1}$ is chosen for mutation. Event $E$ is illustrated in Figure 2.

(ii)

Event $F$ is defined to be that $E$ occurred and further that during one of the iterations $r\in\{i_{1},...,i_{H}\}$ Mutate resulted in the membership of a single element being flipped and it was an element of $A^{*}$ , i.e. it is either the case that Mutate returns $X_{r-1}\cup\{a^{*}\}$ for $a^{*}\in A^{*}\setminus X_{r-1}$ or $X_{r-1}\setminus\{a^{*}\}$ for $a^{*}\in A^{*}\cap X_{r-1}$ . Event $F$ is illustrated in Figure 3.

We now prove a couple of results concerning the events $E$ and $F$ . Claim 4 states that if $E$ occurs then $F$ does as well with high probability. Claim 5 states that the expected value of $f(X_{\sigma(t)})$ is independent of event $F$ .

Claim 4.

$P(F|E)\geq 1-\epsilon$ .

Proof.

At the beginning of any iteration the probability that Mutate will result in the flipping of a single element in $A^{*}$ and no other changes is

[TABLE]

Therefore the probability that $\operatorname*{arg\,max}\{f(X):X\in\mathcal{S},|X|\leq\omega\}$ has been selected on Line 18 $H$ times and on none of those iterations did Mutate result only in the flipping of a single element in $A^{*}$ is

[TABLE]

where (a) follows from Equation 2. Claim 4 then follows. ∎

Claim 5.

$\mathbb{E}\left[f(X_{\sigma(t)})|F\right]=\mathbb{E}\left[f(X_{\sigma(t)})\right].$

Proof.

Let $\alpha\in\mathbb{R}_{\geq 0}$ . First it is shown that $P(E|f(X_{\sigma(t)})=\alpha)=P(E)$ . On any iteration of BPO, the probability that $\ell$ will be incremented is $p/\lceil\ln(P)/\ln\left(1/\xi\right)\rceil$ . Whether it increments $H$ times in the interval $(\sigma(t),t]$ does not depend on the value of $f(X_{\sigma(t)})$ , therefore $P(E|f(X_{\sigma(t)})=\alpha)=P(E)$ .

Next, it is shown that $P(F|f(X_{\sigma(t)})=\alpha)=P(F)$ . If it is assumed that $E$ occurred and so $\ell$ is incremented $H$ times in the interval $(\sigma(t),t]$ , then on each of these increments the probability that Mutate results only in the flipping of a single element in $A^{*}$ is

[TABLE]

This does not depend on the value of $f(X_{\sigma(t)})$ , therefore $P(F|E\land f(X_{\sigma(t)})=\alpha)=P(F|E)$ .

Then we use the above two facts to see that $P(F|f(X_{\sigma(t)})=\alpha)=P(F|E\land f(X_{\sigma(t)})=\alpha)P(E|f(X_{\sigma(t)})=\alpha)=P(F|E)P(E)=P(F)$ . Then Bayes’ theorem gives that $P(f(X_{\sigma(t)})=\alpha|F)=P(f(X_{\sigma(t)})=\alpha)$ , which implies the statement of Claim 5. ∎

Equation 7 will be proven by breaking up runs of BPO into those where $E$ occurs and doesn’t occur, then further breaking up runs of BPO where $E$ occurs into those where $F$ occurs and doesn’t occur, and finally applying the law of total probability. The case of runs of BPO where $E$ does not occur is an easy result of Lemma 2:

[TABLE]

We now consider runs of BPO where event $F$ occurs. If $F$ occurred, then on some iteration $r\in\{i_{1},...,i_{H}\}$ Mutate resulted in the membership of an $a^{*}\in A^{*}$ being flipped and no other changes. Now two needed claims are proven. Claim 6 states that if $F$ occurs then the expected value of $f(X_{t})$ is at least the expected value of $f(X_{r-1}\cup\{a^{*}\})$ , while Claim 7 gives a lower bound on the expected value of $f(X_{r-1}\cup\{a^{*}\})$ . Together, these two claims provide a lower bound on the expected value of $f(X_{t})$ if $F$ occurs.

Claim 6.

$\mathbb{E}\left[f(X_{t})|F\right]\geq\mathbb{E}\left[f(X_{r-1}\cup\{a^{*}\})|F\right].$

Proof.

If $F$ occurs, then Mutate returns either $X_{r-1}\cup\{a^{*}\}$ for $a^{*}\in A^{*}\setminus X_{r-1}$ or $X_{r-1}\setminus\{a^{*}\}$ for $a^{*}\in A^{*}\cap X_{r-1}$ . Let $F_{1}$ be the former event and let $F_{2}$ be the latter event.

Consider a particular run of BPO where $F$ occurs. Then $|X_{r-1}\cup\{a^{*}\}|\leq\omega_{r-1}+1=\omega_{t-1}+1=\omega_{t}$ since $\omega$ did not increment on iterations in $(\sigma(t),t)$ but did on iteration $t$ . If $F_{1}$ occurs on this run, then $X_{r-1}\cup\{a^{*}\}$ was returned by Mutate on iteration $r$ and therefore $f(X_{t})\geq f(X_{r-1}\cup\{a^{*}\})$ by Lemma 2. If $F_{2}$ occurs on this run, then $X_{r-1}\cup\{a^{*}\}=X_{r-1}$ . $X_{r-1}$ was returned by Mutate on some iteration before $t$ since $X_{r-1}\in\mathcal{S}$ at the end of iteration $r-1$ . Therefore $f(X_{t})\geq f(X_{r-1}\cup\{a^{*}\})$ by Lemma 2. Then for any run of BPO where $F$ occurs, $f(X_{t})\geq f(X_{r-1}\cup\{a^{*}\})$ , and so Claim 6 follows. ∎

Claim 7.

$\mathbb{E}\left[f(X_{r-1}\cup\{a^{*}\})|F\right]\geq\left(1-\frac{1}{|A^{*}|}\right)\mathbb{E}\left[f(X_{r-1})|F\right]+\frac{1}{|A^{*}|}f(A^{*}).$

Proof.

Consider all runs of BPO where $F$ occurs. Then it can be seen by inspecting Mutate that any element of $A^{*}$ is equally likely to be the one flipped. Therefore

[TABLE]

where (a) follows from the monotonicity and submodularity of $f$ . Claim 7 follows by re-arranging the equation. ∎
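The step that uses monotonicity and submodularity in Claim 7 is the standard one for greedy-type analyses. For a fixed set $X$ and a uniformly random $a\in A^{*}$ , it can be written out as follows; this is a worked version of the usual argument under those assumptions, not a reproduction of the omitted display.

```latex
\begin{align*}
\sum_{a \in A^{*}} \big( f(X \cup \{a\}) - f(X) \big)
  &\geq f(X \cup A^{*}) - f(X) && \text{(submodularity)} \\
  &\geq f(A^{*}) - f(X)        && \text{(monotonicity)}, \\
\frac{1}{|A^{*}|} \sum_{a \in A^{*}} f(X \cup \{a\})
  &\geq \left( 1 - \frac{1}{|A^{*}|} \right) f(X) + \frac{1}{|A^{*}|} f(A^{*}).
\end{align*}
```

The last line is the first line divided by $|A^{*}|$ and rearranged; taking $X=X_{r-1}$ and the expectation conditioned on $F$ gives the form stated in Claim 7.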

Now we can apply the above claims in order to see that

[TABLE]

where (a) is applying Claims 6 and 7; (b) is applying Claim 3; and (c) is applying Claim 5. Finally

[TABLE]

where (a) is because $\mathbb{E}\left[f(X_{t})|\neg F\right]\geq\mathbb{E}\left[f(X_{\sigma(t)})|\neg F\right]$ by Claim 3 and $\mathbb{E}\left[f(X_{\sigma(t)})|\neg F\right]=\mathbb{E}\left[f(X_{\sigma(t)})\right]$ by Claim 5; (b) is using Equation 9; (c) is using Claim 4; (d) is using the inductive assumption; and (e) is because if $E$ occurred then $\omega_{\sigma(t)}+1=\omega_{t}$ . Finally, the inductive step follows by Equations 8, 10 and the law of total probability. Therefore Lemma 5 is proven. ∎

Lemma 6.

Consider a run of BPO as a series of Bernoulli trials $Y_{1},...,Y_{T}$ , where each iteration is a trial and a success is defined to be when $\ell$ is incremented. $Y_{1},...,Y_{T}$ are independent, identically distributed Bernoulli trials where the probability of success is $p/\lceil\ln(P)/\ln\left(1/\xi\right)\rceil$ .

Proof.

In order for $\ell$ to be incremented, the element in $\mathcal{S}$ of cardinality $\beta$ must be chosen by Select-BPO; one can see by inspecting Select-BPO that this occurs with probability $p$ . ∎

End of Proof of Theorem 2

By Lemma 6, we are able to apply Chernoff’s bound (Lemma 9) to $Y_{1},...,Y_{T}$ . Since $\mathbb{E}\left[\sum_{i=1}^{T}Y_{i}\right]=Tp/\lceil\ln(P)/\ln\left(1/\xi\right)\rceil$ ,

[TABLE]

where (a) is because $T\geq 2ne\ln(1/\epsilon)\lceil\ln(P)/\ln\left(1/\xi\right)\rceil/p$ ; (b) is applying Lemma 9 with $\eta=1/2$ ; and (c) is because $T\geq 8\ln(n)\lceil\ln(P)/\ln\left(1/\xi\right)\rceil/p$ .

5.3 $\kappa$ -Biased Pareto Optimization ( $\kappa$ -BPO)

Bibliography


1. Badanidiyuru et al. (2014) Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Streaming submodular maximization: massive data summarization on the fly. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2014.
2. Bian et al. (2017) Andrew An Bian, Joachim M Buhmann, Andreas Krause, and Sebastian Tschiatschek. Guarantees for greedy maximization of non-submodular functions with applications. In International Conference on Machine Learning (ICML), 2017.
3. Bian et al. (2020) Chao Bian, Chao Feng, Chao Qian, and Yang Yu. An efficient evolutionary algorithm for subset selection with general cost constraints. In AAAI Conference on Artificial Intelligence (AAAI), 2020.
4. Crawford (2019) Victoria G. Crawford. An efficient evolutionary algorithm for minimum cost submodular cover. In International Joint Conference on Artificial Intelligence (IJCAI), 2019.
5. Das and Kempe (2011) Abhimanyu Das and David Kempe. Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection. In Proceedings of the 28th International Conference on Machine Learning (ICML), 2011.
6. Friedrich and Neumann (2014) Tobias Friedrich and Frank Neumann. Maximizing submodular functions under matroid constraints by evolutionary algorithms. In International Conference on Parallel Problem Solving from Nature (PPSN), 2014.
7. Friedrich et al. (2010) Tobias Friedrich, Jun He, Nils Hebbinghaus, Frank Neumann, and Carsten Witt. Approximating covering problems by randomized search heuristics using multi-objective models. Evolutionary Computation, 18(4):617–633, 2010.
8. Kaufman and Rousseeuw (2009) Leonard Kaufman and Peter J Rousseeuw. Finding groups in data: an introduction to cluster analysis, volume 344. John Wiley & Sons, 2009.
