Deterministic & Adaptive Non-Submodular Maximization via the Primal Curvature

J. David Smith; My T. Thai

arXiv:1702.07002·cs.DS·January 16, 2018

TL;DR

This paper introduces a new technique for analyzing the performance of greedy algorithms in maximizing non-submodular functions, extending classical guarantees to adaptive and stochastic settings.

Contribution

It presents a novel curvature-based method to bound greedy algorithm performance for non-submodular functions, including adaptive and stochastic cases.

Findings

01. Provides a curvature-based approximation ratio for non-submodular maximization.

02. Extends classical ratios to adaptive greedy algorithms.

03. Supports applications with incomplete data and uncertainty.

Equations

$$f(T\cup\{x\})-f(T)\leq f(S\cup\{x\})-f(S)$$

$$c=\max_{j\in X}\left\{1-\frac{f(X)-f(X\setminus\{j\})}{f(\{j\})-f(\emptyset)}\right\}$$

$$\alpha=\max_{S\subseteq X,\ i,j\in X}\frac{f_{i}(S\cup\{j\})}{f_{i}(S)}$$

$$\left[1-(1-A_{k}^{-1})^{k}\right]f(S^{*})\leq f(S)$$

$$S=\operatorname*{arg\,max}_{I\in\mathcal{I}}f(I)$$

$$\nabla_{f}(i,j\mid S)=\frac{f_{i}(S\cup\{j\})}{f_{i}(S)}$$

$$\Gamma(x\mid T,S)=\prod_{j=1}^{r}\nabla_{f}(x,t_{j}\mid S\cup\{t_{1},t_{2},\ldots,t_{j-1}\})$$

$$\Gamma(x\mid T,S)=\frac{f_{x}(S\cup T)}{f_{x}(S)}$$

$$\frac{f_{x}(S\cup\{t_{1}\})}{f_{x}(S)}\cdot\frac{f_{x}(S\cup\{t_{1},t_{2}\})}{f_{x}(S\cup\{t_{1}\})}\cdots\frac{f_{x}(S\cup T)}{f_{x}(S\cup\{t_{1},t_{2},\ldots,t_{r-1}\})}$$

$$f(T)-f(S)=\sum_{j=1}^{r}\Gamma(t_{j}\mid S_{j-1},S)\,f_{t_{j}}(S)$$

$$f(T)-f(S)=f(S\cup\{j_{1},j_{2},\ldots,j_{r}\})-f(S)=\sum_{t=1}^{r}f_{j_{t}}(S_{t-1})$$

$$f(T)-f(S)=\sum_{t=1}^{r}\Gamma(j_{t}\mid S_{t-1},S)\,f_{j_{t}}(S)$$

$$\left[1+\left(\frac{f(S^{+})}{f(S)}-1\right)\hat{\Gamma}(S)\right]^{-1}f(S^{*})\leq f(S)$$

$$\forall T\in\mathcal{I}:\ \sum_{j_{t}\in T\setminus S}\Gamma(j_{t}\mid S_{t-1},S)\leq\hat{\Gamma}(S)$$

$$f(S^{*}\cup S)-f(S)=\sum_{t=1}^{r}\Gamma(j_{t}\mid S_{t-1},S)\,f_{j_{t}}(S)$$

$$f(S^{*})-f(S)\leq f(S^{*}\cup S)-f(S)\leq f_{g_{k+1}}(S)\sum_{t=1}^{r}\Gamma(j_{t}\mid S_{t-1},S)$$

$$f(S^{*})-f(S)\leq f_{g_{k+1}}(S)\,\hat{\Gamma}(S)$$

$$f(S^{*})\leq f(S)+f_{g_{k+1}}(S)\,\hat{\Gamma}(S)=f(S)+\left(f(S^{+})-f(S)\right)\hat{\Gamma}(S)$$

$$\left[1-(1-\hat{\Gamma}^{-1})^{k}\right]f(S^{*})\leq f(S)$$

$$f(S^{*})-f(S_{l})\leq f_{g_{l+1}}(S_{l})\,\hat{\Gamma}(S_{l})$$

$$\hat{\Gamma}\left[1-\left(\frac{\hat{\Gamma}-1}{\hat{\Gamma}}\right)^{k}\right]f(S^{*})=\hat{\Gamma}\left[1-(1-\hat{\Gamma}^{-1})^{k}\right]f(S^{*})$$

$$\left(\hat{\Gamma}(1-\hat{\Gamma}^{-1})^{k-l}+\sum_{i=l+1}^{k}(1-\hat{\Gamma}^{-1})^{k-i}\right)f_{g_{l+1}}(S_{l})$$

$$(1-1/e)\,f(S^{*})\leq f(S)$$

$$\nabla_{f}(i,j\mid\psi)=\mathbb{E}\left[\frac{\Delta(i\mid\psi\cup s)}{\Delta(i\mid\psi)}\,\Big|\,s\in S(j)\right]$$

$$\Gamma(i\mid\psi^{\prime},\psi)=\mathbb{E}\left[\prod_{s_{j}\in Q}\nabla^{\prime}(i,s_{j}\mid\psi\cup\{s_{1},\ldots,s_{j-1}\})\,\Big|\,Q\in\psi\rightarrow\psi^{\prime}\right]$$

$$\Gamma(i\mid\psi^{\prime},\psi)=\frac{\Delta(i\mid\psi^{\prime})}{\Delta(i\mid\psi)}$$

$$\frac{\Delta(i\mid\psi\cup\{s_{1}\})}{\Delta(i\mid\psi)}\cdot\frac{\Delta(i\mid\psi\cup\{s_{1},s_{2}\})}{\Delta(i\mid\psi\cup\{s_{1}\})}\cdots\frac{\Delta(i\mid\psi^{\prime})}{\Delta(i\mid\psi^{\prime}\setminus\{s_{r}\})}$$

$$\Delta(i\mid\psi^{\prime})\leq\hat{\Gamma}(\psi)\,\Delta(g_{l+1}\mid\psi)$$

Taxonomy

Topics: Complexity and Algorithms in Graphs · Risk and Portfolio Optimization · Game Theory and Voting Systems

Full text

Deterministic & Adaptive Non-Submodular Maximization

via the Primal Curvature

J. David Smith

CISE Department, University of Florida, Gainesville, Florida 32611

[email protected]

and

My T. Thai

CISE Department, University of Florida, Gainesville, Florida 32611

[email protected]

Abstract.

While greedy algorithms have long been observed to perform well on a wide variety of problems, up to now approximation ratios have only been known for their application to problems having submodular objective functions $f$ . Since many practical problems have non-submodular $f$ , there is a critical need to devise new techniques to bound the performance of greedy algorithms in the case of non-submodularity.

Our primary contribution is the introduction of a novel technique for estimating the approximation ratio of the greedy algorithm for maximization of monotone non-decreasing functions based on the curvature of $f$ without relying on the submodularity constraint. We show that this technique reduces to the classical $(1-1/e)$ ratio for submodular functions. Furthermore, we develop an extension of this ratio to the adaptive greedy algorithm, which allows applications to non-submodular stochastic maximization problems. This notably extends support to applications modeling incomplete data with uncertainty.

1. Introduction

It is well-known that greedy approximation algorithms perform remarkably well, especially when the traditional ratio of $(1-1/e)\approx 0.63$ (Nemhauser et al., 1978) for maximization of submodular objective functions is considered. Over the four decades since the proof of this ratio, the use of greedy approximations has become widespread due to several factors. First, many interesting problems satisfy the property of submodularity, which states that the marginal gain of an element never increases. If this condition is satisfied, and the set of possible solutions can be phrased as a uniform matroid, then one of the highest general-purpose approximation ratios is available “for free” with the use of the greedy algorithm. Second, the greedy algorithm is exceptionally simple both to understand and to implement.

A concrete example of this is the Influence Maximization problem, to which the greedy algorithm was applied with great success, ultimately leading to an empirical demonstration that it performed near-optimally on real-world data (Li et al., 2017). Kempe et al. showed this problem to be submodular under a broad class of influence diffusion models known as Triggering Models (Kempe et al.). This led to a number of techniques being developed to improve the efficiency of the sampling needed to construct the problem instance (see e.g. (Borgs et al.; Tang et al.; Nguyen et al.) and references therein) while maintaining a $(1-1/e-\epsilon)$ ratio as a result of the greedy algorithm. This line of work ultimately led to a $(1-\epsilon)$-approximation by taking advantage of the dramatic advances in sampling efficiency to construct an IP that can be solved in reasonable time (Li et al., 2017). In testing this method, it was found that greedy solutions performed near-optimally, an unexpected result given the $1-1/e$ worst case.

For non-submodular problems, no general approximation ratio for greedy algorithms is known. However, due to their simplicity they frequently see use as simple baselines for comparison. On the Robust Influence Maximization problem proposed by He & Kempe, the simple greedy method was used in this manner (He and Kempe, 2016). This problem consists of a non-submodular combination of Influence Maximization sub-problems and aims to address uncertainty in the diffusion model. Yet despite the non-submodularity of the problem, the greedy algorithm performed no worse than the bi-criteria approximation (He and Kempe, 2016).

Another recent example of this phenomenon is the socialbot reconnaissance attack studied by Li et al. (Li et al., 2016). They consider a minimization problem that seeks to answer how long a bot must operate to extract a certain level of sensitive information, and find that the objective function is (adaptive) submodular only in a scenario where users disregard network topology. In this scenario, the corresponding maximization problem, Max-Crawling, has a $1-1/e$ ratio due to the work of Golovin & Krause (Golovin and Krause, 2011). However, this constraint does not align with observed user behaviors. They give a model based on the work of Boshmaf et al. (Boshmaf et al.), who observed that the number of mutual friends with the bot strongly correlates with friending acceptance rate. Although this model is no longer adaptive submodular, the greedy algorithm still exhibited excellent performance. Thus we see that while submodularity is sufficient to imply good performance, it is not necessary for the greedy algorithm to perform well.

This, in turn, leads us to ask: is there any tool to theoretically bound the performance of greedy maximization with non-submodularity? Unfortunately, this condition has seen little study. Wang et al. give a ratio for it in terms of the worst-case rate of change in marginal gain (the elemental curvature $\alpha$ ) (Wang et al., 2014). This suffices to construct bounds for non-submodular greedy maximization, though for non-trivial problem sizes they quickly approach 0. We note, however, that the $\alpha$ ratio still encodes strong assumptions about the worst case: that the global maximum rate of change can occur an arbitrary number of times.

Motivated by the unlikeliness of this scenario, our proposed bound instead works with an estimate of how much change can occur during the $k$ steps taken by the greedy algorithm.

The remainder of this paper is arranged as follows. First, we briefly cover the preliminary material needed for the proofs and define the class of problems to which they apply (Sec. 1.1). We next define the notion of curvature used, develop a proof of the ratio based on it with an extension to adaptive greedy algorithms, and show that it is equivalent to the traditional $1-1/e$ ratio for submodular objectives (Sec. 2). We conclude with a reflection on the contributions and a discussion of future work (Sec. 3).

Contributions.

•

A technique for estimating the approximation ratio of greedy maximization of non-submodular monotone non-decreasing objectives on uniform matroids.

•

An extension of this technique to adaptive greedy optimization, where future greedy steps depend on the success or failure of prior steps.

1.1. Background & Related Work

To understand both the state of the art and advancements of this work, we first briefly cover each constraint required by the classical $1-1/e$ ratio (Nemhauser et al., 1978).

1.1.1. Constraints on the $1-1/e$ Ratio

Uniform Matroids. A matroid encodes a notion of dependence between elements of a set and is denoted by $\mathcal{M}=(X,\mathcal{I})$, where $\mathcal{I}\subseteq 2^{X}$ is the set of independent subsets of the universe $X$. (For a complete treatment of matroids and the associated theory, see Oxley (Oxley, 1992).) For our purposes, it will suffice to cover the semantic meaning of $k$-uniform matroids, which is codified as follows:

(1) All subsets $S$ of a feasible solution $T$ must also be feasible solutions.

(2) Every $T\subset X,|T|=k$ is a feasible solution and is maximal in the sense that no superset $T\subset R\subset X$ is feasible.

For general matroids, there exists a $1/2$ ratio for greedy maximization of submodular functions due to Fisher et al. (Fisher et al.). This is a special case of their $1/(p+1)$ ratio for the intersection of $p$ matroids.

Submodularity. The submodularity condition states that given any subsets $S\subset T$ of a universe $X$ , the marginal gain of any $x\in X$ does not increase as the cardinality increases:

$$f(T\cup\{x\})-f(T)\leq f(S\cup\{x\})-f(S)$$

This formally encodes the idea of diminishing returns. Leskovec et al. exploited this property to show a data-dependent bound in terms of the marginal gain of the top-$k$ un-selected elements (Leskovec et al.), which was generalized to the adaptive case (Golovin and Krause, 2011).
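The submodularity condition can be checked by brute force on small ground sets. The following is a minimal sketch, not part of the original paper: the toy objectives `coverage` and `squared_size`, the data in `SETS`, and the helper names are all hypothetical, and the exhaustive enumeration is only practical for a handful of elements.

```python
from itertools import combinations

# Hypothetical toy data: each element covers a few target ids.
SETS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}}


def coverage(S):
    """Number of targets covered by S (a standard submodular example)."""
    return len(set().union(*(SETS[x] for x in S)))


def squared_size(S):
    """|S|^2: marginal gains grow with |S|, so this is not submodular."""
    return len(S) ** 2


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(f, universe, tol=1e-12):
    """Check f(T+x) - f(T) <= f(S+x) - f(S) for all S within T and x outside T."""
    universe = frozenset(universe)
    for T in subsets(universe):
        for S in subsets(T):
            for x in universe - T:
                if f(T | {x}) - f(T) > f(S | {x}) - f(S) + tol:
                    return False
    return True
```

Under these assumptions, `is_submodular(coverage, SETS)` holds while `is_submodular(squared_size, SETS)` fails, since adding an element to a larger set yields a larger gain for the second objective.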

To the best of our knowledge, the only generally applicable relaxation of this constraint is the work of Wang et al. (Wang et al., 2014), who define a ratio in terms of the elemental curvature of a function, which encodes the degree to which a function may break submodularity.

1.1.2. Alternate Problems & Algorithms

The $1-1/e$ ratio has shown surprising generality, with proofs that it holds for maximization of sequence functions (Zhang et al., 2016) (and references) and adaptive stochastic maximization of functions that are submodular in expectation (Golovin and Krause, 2011), among others. However, not all adjacent work relies on the same naïve greedy method. To obtain a bound on the relaxation of monotonicity, Buchbinder et al. (Buchbinder et al.) proposed a “double-greedy” algorithm with a $1/3$ (deterministic) or $1/2$ (randomized) ratio. For maximization on an intersection of $p\geq 2$ matroids, Lee et al. showed a $1/(p+\epsilon)$, $\epsilon>0$ ratio for a local search method (Lee et al.).

Vondrák proposed a continuous greedy algorithm with a $(1/c)(1-e^{-c})$ ratio for general matroids (Vondrák, 2010), where $c$ is the total curvature of the function. An augmentation of this method has been shown to obtain a $(1-c/e)$-approximation for single matroids (Sviridenko et al.), along with an analogue for supermodular minimization. We remark that, while it exhibits a better ratio, this comes with a corresponding increase in the complexity of the algorithm.

1.1.3. Curvature-Based Ratios

Conforti & Cornuéjols (Conforti and Cornuéjols, 1984) introduced the idea of total curvature later used by Sviridenko et al. for their $(1-c/e)$ ratio.

Definition 1 (Total Curvature).

Given a monotone non-decreasing submodular function $f$ defined on a matroid $\mathcal{M}=(X,\mathcal{I})$, the total curvature of $f$ is

$$c=\max_{j\in X}\left\{1-\frac{f(X)-f(X\setminus\{j\})}{f(\{j\})-f(\emptyset)}\right\}$$

Using this definition, they arrived at a $1/(1+c)$ approximation for general matroids, which reduces to $\frac{1}{c}(1-e^{-c})$ for maximization on uniform matroids. Recently, Wang et al. (Wang et al., 2014) extended this idea by introducing the elemental curvature $\alpha$ of a function $f$:

Definition 2 (Elemental Curvature).

The elemental curvature of a monotone non-decreasing function $f$ is defined as

$$\alpha=\max_{S\subseteq X,\ i,j\in X}\frac{f_{i}(S\cup\{j\})}{f_{i}(S)}$$

where $f_{i}(S)=f(S\cup\{i\})-f(S)$ .
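For intuition, the elemental curvature of a small set function can be computed by exhaustive search. This is a hedged sketch rather than anything from the paper: it assumes that $i$ and $j$ are distinct elements outside $S$ and it skips pairs whose base marginal gain is zero (where the ratio is undefined); both conventions, and the function names, are assumptions made here.

```python
from itertools import combinations


def marginal(f, S, i):
    """f_i(S) = f(S + i) - f(S)."""
    return f(S | {i}) - f(S)


def elemental_curvature(f, universe):
    """Brute-force max of f_i(S + j) / f_i(S) over S and distinct i, j outside S."""
    universe = frozenset(universe)
    alpha = 0.0
    for r in range(len(universe) + 1):
        for combo in combinations(sorted(universe), r):
            S = frozenset(combo)
            rest = universe - S
            for i in rest:
                base = marginal(f, S, i)
                if base <= 0:
                    continue  # assumption: ratio undefined, skip this pair
                for j in rest - {i}:
                    alpha = max(alpha, marginal(f, S | {j}, i) / base)
    return alpha
```

For the modular function $f(S)=|S|$ every ratio is 1, so the search returns 1. For $f(S)=|S|^{2}$ the marginal gain grows from 1 to 3 when one element is added to the empty set, which is the largest ratio on a four-element ground set.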

While the resulting ratio (Theorem 1.1) is not as clean as that of prior work, this ratio is well-defined for non-submodular functions.

Theorem 1.1 (Wang et al. (Wang et al., 2014)).

For a monotone non-decreasing function $f$ defined on a $k$ -uniform matroid $\mathcal{M}$ , the greedy algorithm on $\mathcal{M}$ maximizing $f$ produces a solution satisfying

$$\left[1-(1-A_{k}^{-1})^{k}\right]f(S^{*})\leq f(S)$$

where $S$ is the greedy solution, $S^{*}$ is the optimal solution, $A_{k}=\sum_{i=1}^{k-1}\alpha^{i}$ and $\alpha$ is the elemental curvature of $f$ .

Corollary 1.2 (Wang et al. (Wang et al., 2014)).

When $\alpha=1$ , the ratio given by Theorem 1.1 converges to $1-1/e$ as $k\rightarrow\infty$ .

However, the ratios produced based on the elemental curvature rapidly converge to $0$ for non-submodular functions. This behavior is shown in Figure 1. Even for $k=25$, the ratio is effectively zero and therefore uninformative. In contrast, we show that our ratio produces significant bounds for two non-submodular functions, while still converging to the $1-1/e$ ratio for submodular functions.
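The collapse of the elemental-curvature ratio can be illustrated numerically. The sketch below evaluates the bound of Theorem 1.1 using $A_{k}$ exactly as written in the text above (a sum from $i=1$ to $k-1$); the function name `wang_ratio` and the sample value $\alpha=1.5$ are hypothetical choices for illustration.

```python
def wang_ratio(alpha, k):
    """Evaluate 1 - (1 - 1/A_k)^k with A_k = sum_{i=1}^{k-1} alpha^i."""
    if k < 2:
        raise ValueError("A_k is an empty sum for k < 2")
    a_k = sum(alpha ** i for i in range(1, k))
    return 1.0 - (1.0 - 1.0 / a_k) ** k
```

With $\alpha=1$ the value approaches $1-1/e$ as $k$ grows, whereas for $\alpha=1.5$ and $k=25$ the geometric growth of $A_{k}$ drives the ratio below $10^{-3}$.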

2. A Ratio for $f$ Non-Submodular

In this section, we introduce a further extension to the notion of curvature: primal curvature. We derive a bound based on this and prove its equivalence to $1-1/e$ for submodular functions. Then, we extend the ratio to the adaptive case, which allows direct application to a number of problems modeled under incomplete knowledge. We adopt a problem definition similar to that of Wang et al. Specifically, our ratio applies to any problem that can be phrased as $k$-Uniform Matroid Maximization.

Problem 1 ( $k$ -Uniform Matroid Maximization).

Given a $k$ -uniform matroid $\mathcal{M}=(X,\mathcal{I})$ and a monotone non-decreasing function $f:2^{X}\rightarrow\mathbb{R}$ , find

$$S=\operatorname*{arg\,max}_{I\in\mathcal{I}}f(I)$$
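The greedy algorithm analyzed in the rest of this section can be sketched as follows. This is a minimal illustration, assuming that $f$ is given as a Python callable on frozensets and that ties are broken by sorted order; the names `greedy`, `coverage`, and `SETS` are hypothetical.

```python
# Hypothetical toy data: each element covers a few target ids.
SETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}


def coverage(S):
    """Number of targets covered by the elements of S."""
    return len(set().union(*(SETS[x] for x in S)))


def greedy(f, universe, k):
    """Pick k elements, each time adding the one with the largest marginal gain."""
    chosen = frozenset()
    selected = []
    remaining = set(universe)
    for _ in range(min(k, len(remaining))):
        # sorted() makes tie-breaking deterministic (an assumption of this sketch)
        best = max(sorted(remaining), key=lambda x: f(chosen | {x}) - f(chosen))
        selected.append(best)
        chosen = chosen | {best}
        remaining.remove(best)
    return selected
```

On the toy data, `greedy(coverage, SETS, 2)` first selects `"c"` (four new targets) and then `"a"` (three new targets).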

2.1. Construction of the Ratio

As noted previously, the ratio given by elemental curvature rapidly converges to zero for non-submodular functions. We observe that this is due to the definition of $\alpha$ encoding the worst-case potential, and address this limitation by introducing the primal curvature of a function. Our definition separates the notion of rate-of-change from the global perspective imposed by elemental curvature. (The term primal is adopted primarily to distinguish this definition from prior work.)

Definition 3 (Primal Curvature).

The primal curvature of a set function $f$ is defined as

$$\nabla_{f}(i,j\mid S)=\frac{f_{i}(S\cup\{j\})}{f_{i}(S)}$$

The global maximum primal curvature is equivalent to the elemental curvature of a function.

This shift from global to local perspective allows focus on the patterns present in real-world problem instances rather than limiting our attention to the worst-case scenarios.

A key observation of Wang et al.'s work is that the elemental curvature defines an upper bound on the change between $f(S)$ and $f(T)$, for some $S\subset T$, in terms of $\alpha$ and the marginal gain at $S$. The definition of primal curvature improves on this, giving an equivalence in terms of the total primal curvature $\Gamma$.

Definition 4 (Total Primal Curvature).

The total primal curvature of $x\in X$ between two sets $S\subseteq T\subset X$ with $x\not\in T$ is

$$\Gamma(x\mid T,S)=\prod_{j=1}^{r}\nabla_{f}(x,t_{j}\mid S\cup\{t_{1},t_{2},\ldots,t_{j-1}\})$$

where the $t_{j}$ ’s form an arbitrary ordering of $T\setminus S$ and $r=|T\setminus S|$ .

We note that $\Gamma$ can be interpreted as the total change in the marginal value of $x$ from $S$ to $T$. The following lemma illustrates this and provides a useful identity.

Lemma 2.1.

$$\Gamma(x\mid T,S)=\frac{f_{x}(S\cup T)}{f_{x}(S)}$$

Proof.

First, expand the product into its constituent terms:

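$$\frac{f_{x}(S\cup\{t_{1}\})}{f_{x}(S)}\cdot\frac{f_{x}(S\cup\{t_{1},t_{2}\})}{f_{x}(S\cup\{t_{1}\})}\cdots\frac{f_{x}(S\cup T)}{f_{x}(S\cup\{t_{1},t_{2},\ldots,t_{r-1}\})}$$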

After cancelling, the statement immediately follows. ∎

From this identity, we gain one further insight: the order in which elements are considered in $\Gamma$ does not matter.

Corollary 2.2.

The product $\Gamma(x\mid T,S)$ is order-independent.
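The telescoping identity and its order-independence can be checked numerically. The sketch below is illustrative only: it assumes a toy objective with strictly positive marginal gains (so that no ratio divides by zero), and the weights and function names are hypothetical.

```python
from itertools import permutations
from math import isclose

# Hypothetical weights; f below is monotone with strictly positive marginals.
WEIGHTS = {"a": 1.0, "b": 2.0, "c": 3.0, "x": 1.5}


def f(S):
    """Toy non-submodular objective: (total weight)^1.5."""
    return sum(WEIGHTS[e] for e in S) ** 1.5


def marginal(f, S, x):
    """f_x(S) = f(S + x) - f(S)."""
    return f(S | {x}) - f(S)


def primal_curvature(f, i, j, S):
    """Ratio of the marginal gain of i after adding j to its gain before."""
    return marginal(f, S | {j}, i) / marginal(f, S, i)


def total_primal_curvature(f, x, ordering, S):
    """Product of primal curvatures of x along `ordering`, starting from S."""
    S = frozenset(S)
    total = 1.0
    for t in ordering:
        total *= primal_curvature(f, x, t, S)
        S = S | {t}
    return total
```

For every permutation of `("a", "b", "c")`, `total_primal_curvature(f, "x", ordering, frozenset())` agrees (up to floating-point error) with the closed form `marginal(f, frozenset("abc"), "x") / marginal(f, frozenset(), "x")`.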

Using this, we can prove an equivalence between the change in total benefit and the sum of marginal gains taken with respect to $S$ .

Lemma 2.3.

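For a set function $f$ and a pair of sets $S\subseteq T$,

$$f(T)-f(S)=\sum_{j=1}^{r}\Gamma(t_{j}\mid S_{j-1},S)\,f_{t_{j}}(S)$$

where $t_{1},\ldots,t_{r}$ is an arbitrary ordering of $T\setminus S$, $r=|T\setminus S|$, $f_{x}(S)=f(S\cup\{x\})-f(S)$ is the marginal gain, and $S_{j-1}=S\cup\{t_{1},t_{2},\ldots,t_{j-1}\}$.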

Proof.

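Let $j_{1},\ldots,j_{r}$ be an arbitrary labeling of $T\setminus S$, and let $S_{t}=S\cup\{j_{1},\ldots,j_{t}\}$. Then we have:

$$f(T)-f(S)=f(S\cup\{j_{1},j_{2},\ldots,j_{r}\})-f(S)=\sum_{t=1}^{r}f_{j_{t}}(S_{t-1})$$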

By the identity given in Lemma 2.1, we can write

$$f(T)-f(S)=\sum_{t=1}^{r}\Gamma(j_{t}\mid S_{t-1},S)\,f_{j_{t}}(S)$$

noting that $S\cup S_{t-1}=S_{t-1}$. Thus, the statement is proven. ∎
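As a small worked instance of this decomposition (an illustration added here, with the toy objective $f(S)=|S|^{2}$ chosen as an assumption), take $S=\emptyset$ and $T=\{a,b\}$ with labeling $j_{1}=a$, $j_{2}=b$, so that $S_{0}=\emptyset$ and $S_{1}=\{a\}$:

```latex
\begin{align*}
f_{a}(\emptyset) &= 1, \qquad f_{b}(\emptyset) = 1, \qquad f_{b}(\{a\}) = 4 - 1 = 3,\\
\Gamma(a \mid S_{0}, S) &= \frac{f_{a}(\emptyset)}{f_{a}(\emptyset)} = 1, \qquad
\Gamma(b \mid S_{1}, S) = \frac{f_{b}(\{a\})}{f_{b}(\emptyset)} = 3,\\
\sum_{t=1}^{2} \Gamma(j_{t} \mid S_{t-1}, S)\, f_{j_{t}}(S) &= 1 \cdot 1 + 3 \cdot 1 = 4 = f(T) - f(S).
\end{align*}
```

The second term shows how $\Gamma$ rescales a marginal gain measured at $S$ to the gain actually realized at $S_{t-1}$.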

With this lemma, we can now construct the ratio.

Theorem 2.4.

For a monotone non-decreasing function $f:2^{X}\rightarrow\mathbb{R}$ , the greedy algorithm on a $k$ -uniform matroid $\mathcal{M}=(X,\mathcal{I})$ maximizing $f$ produces a solution satisfying

$$\left[1+\left(\frac{f(S^{+})}{f(S)}-1\right)\hat{\Gamma}(S)\right]^{-1}f(S^{*})\leq f(S)\tag{1}$$

where $S$ is the greedy solution, $S^{+}=S\cup\{g_{k+1}\}$ is the greedy solution for an identical problem if a $k+1$ -uniform supermatroid $\mathcal{M}^{+}$ of $\mathcal{M}$ is well-defined, $S^{*}$ is the optimal solution on $\mathcal{M}$ , and $\hat{\Gamma}(S)$ is an estimator satisfying:

$$\forall T\in\mathcal{I}:\ \sum_{j_{t}\in T\setminus S}\Gamma(j_{t}\mid S_{t-1},S)\leq\hat{\Gamma}(S)$$

where $S_{t-1}=S\cup\{j_{1},j_{2},\ldots,j_{t-1}\}$.

Proof.

To begin, note that $f(S^{*})\leq f(S^{*}\cup S)$ due to $f$ monotone non-decreasing. Then, by Lemma 2.3 we have:

$$f(S^{*}\cup S)-f(S)=\sum_{t=1}^{r}\Gamma(j_{t}\mid S_{t-1},S)\,f_{j_{t}}(S)\tag{2}$$

We observe that any ratio that requires knowing $S^{*}$ is of little practical value: if $S^{*}$ is known, we can simply compute $f(S)/f(S^{*})$ . Therefore, we relax our assumptions in three key ways to go from Eqn. (2), which assumes that we know $S^{*}$ exactly, to Eqn. (1), which requires no knowledge of the optimal.

First, we partly remove the assumption on knowledge of $j_{t}\in S^{*}$ by substituting $f_{j_{t}}(S)$ with $f_{g_{k+1}}(S)$ , where $g_{k+1}=\operatorname*{arg\,max}_{x}f_{x}(S)$ .

$$f(S^{*})-f(S)\leq f(S^{*}\cup S)-f(S)\leq f_{g_{k+1}}(S)\sum_{t=1}^{r}\Gamma(j_{t}\mid S_{t-1},S)$$

Next, we apply the upper bound $\hat{\Gamma}$ as defined above to both remove the remaining dependence on knowledge of $j_{t}$ and to eliminate the requirement of knowing $|S^{*}\setminus S|$ .

$$f(S^{*})-f(S)\leq f_{g_{k+1}}(S)\,\hat{\Gamma}(S)\tag{3}$$

Then, rearranging terms we get

$$f(S^{*})\leq f(S)+f_{g_{k+1}}(S)\,\hat{\Gamma}(S)=f(S)+\left(f(S^{+})-f(S)\right)\hat{\Gamma}(S)$$

where $S^{+}=S\cup\{g_{k+1}\}$ . Then, dividing through by $f(S)$ and cross-multiplying, we get:

$$\left[1+\left(\frac{f(S^{+})}{f(S)}-1\right)\hat{\Gamma}(S)\right]^{-1}f(S^{*})\leq f(S)$$

∎
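To make the quantities in Theorem 2.4 concrete, the sketch below evaluates the bound on a tiny instance. It is an illustration under stated assumptions, not the authors' procedure: the estimator $\hat{\Gamma}(S)$ is obtained by brute force over all feasible $T$ with one fixed (sorted) ordering of $T\setminus S$, the toy objective has strictly positive marginal gains, and all names and weights are hypothetical.

```python
from itertools import combinations

WEIGHTS = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0, "e": 1.0}  # hypothetical


def f(S):
    """Monotone, non-submodular toy objective with f(empty) = 0."""
    return sum(WEIGHTS[x] for x in S) ** 1.2


def marginal(S, x):
    return f(S | {x}) - f(S)


def greedy(k):
    """k-element greedy solution as a frozenset."""
    S = frozenset()
    for _ in range(k):
        S = S | {max(sorted(set(WEIGHTS) - S), key=lambda x: marginal(S, x))}
    return S


def gamma_hat(S, k):
    """Brute-force upper bound on sum_t Gamma(j_t | S_{t-1}, S) over feasible T."""
    worst = 0.0
    for r in range(k + 1):
        for T in combinations(sorted(WEIGHTS), r):
            prefix, total = S, 0.0
            for j in (x for x in T if x not in S):
                total += marginal(prefix, j) / marginal(S, j)  # Gamma via Lemma 2.1
                prefix = prefix | {j}
            worst = max(worst, total)
    return worst


def theorem_ratio(k):
    """Evaluate [1 + (f(S+)/f(S) - 1) * gamma_hat(S)]^-1 for the greedy S."""
    S = greedy(k)
    g_next = max(sorted(set(WEIGHTS) - S), key=lambda x: marginal(S, x))
    S_plus = S | {g_next}
    return 1.0 / (1.0 + (f(S_plus) / f(S) - 1.0) * gamma_hat(S, k))


k = 2
S = greedy(k)
optimum = max(f(frozenset(T)) for T in combinations(WEIGHTS, k))
ratio = theorem_ratio(k)
```

On this instance the greedy solution happens to be optimal, so the check `ratio * optimum <= f(S)` holds with room to spare; the point of the sketch is only to show which quantities ($S$, $S^{+}$, $\hat{\Gamma}(S)$) enter the ratio. The brute-force estimator is exponential in $|X|$ and would need to be replaced by a problem-specific bound in practice.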

When compared to traditional approximation ratios, this ratio has several obvious differences. First, it has dependencies on both the greedy solution and an extension of it to $k+1$ elements. This is both a strength and a fundamental limitation of Theorem 2.4: it takes into account how much the greedy solution has converged toward negligible marginal gains, but also inhibits general analysis over all potential problem instances. Further, it requires that the supermatroid $\mathcal{M}^{+}$ be well-defined, though we remark that this is generally not a problem. In practice, most problems solved with greedy algorithms are $k$-element solutions on $n$-element spaces, with $k$ typically much less than $n$.

2.2. Equivalence to the $1-1/e$ Ratio

We next show that under assumptions encoding the submodularity condition, the above is equivalent to the $1-1/e$ ratio as $k\rightarrow\infty$ .

Lemma 2.5.

Given a $\hat{\Gamma}$ satisfying $\forall G:\hat{\Gamma}\geq\hat{\Gamma}(G)$ , the greedy algorithm produces a $k$ -element solution $S$ satisfying

$$\left[1-(1-\hat{\Gamma}^{-1})^{k}\right]f(S^{*})\leq f(S)$$

Proof.

We begin with Eqn. (3):

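$$f(S^{*})-f(S_{l})\leq f_{g_{l+1}}(S_{l})\,\hat{\Gamma}(S_{l})$$

for each $l\leq k$, where $S_{l}$ denotes the $l$-element greedy solution. Substitute $\hat{\Gamma}$ for $\hat{\Gamma}(S_{l})$, multiply both sides by $(1-\hat{\Gamma}^{-1})^{k-l}$, and sum from $l=1$ to $l=k$. The left-hand side becomes:

$$\hat{\Gamma}\left[1-\left(\frac{\hat{\Gamma}-1}{\hat{\Gamma}}\right)^{k}\right]f(S^{*})=\hat{\Gamma}\left[1-(1-\hat{\Gamma}^{-1})^{k}\right]f(S^{*})$$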

To obtain the right-hand side, separate $f(S_{l})=\sum_{i=1}^{l}f_{g_{i}}(S_{i-1})$ into the marginal gain terms to produce the following in the body of the summation:

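$$\left(\hat{\Gamma}(1-\hat{\Gamma}^{-1})^{k-l}+\sum_{i=l+1}^{k}(1-\hat{\Gamma}^{-1})^{k-i}\right)f_{g_{l+1}}(S_{l})$$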

Summing this over $l$ and employing the identity of the geometric series, this reduces to $\hat{\Gamma}f(S_{k})=\hat{\Gamma}f(S)$ on the right-hand side. Thus, we obtain the relation

$$\left[1-(1-\hat{\Gamma}^{-1})^{k}\right]f(S^{*})\leq f(S)$$

∎
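The left-hand side of the summation in the proof relies on the finite geometric series. Spelled out with the shorthand $q=1-\hat{\Gamma}^{-1}$ (a symbol introduced here only for this illustration):

```latex
\begin{align*}
\sum_{l=1}^{k} q^{\,k-l}
  &= \sum_{m=0}^{k-1} q^{\,m}
   = \frac{1-q^{k}}{1-q}
   = \hat{\Gamma}\left[1-\left(1-\hat{\Gamma}^{-1}\right)^{k}\right],
\end{align*}
```

where the last step uses $1-q=\hat{\Gamma}^{-1}$. Multiplying by $f(S^{*})$ gives the expression stated for the left-hand side.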

Corollary 2.6.

For a submodular monotone non-decreasing function $f$ , the following relation holds as $k\rightarrow\infty$ :

$$(1-1/e)\,f(S^{*})\leq f(S)$$

Proof.

For a submodular function, the primal curvature of any two elements $u,v$ at any point $T$ satisfies $\nabla(u,v\mid T)\leq 1$ by the definition of submodularity, so every total primal curvature $\Gamma$ is also at most 1. Since the sum bounded by $\hat{\Gamma}(S)$ has at most $k$ terms, we obtain directly that $\hat{\Gamma}=k$ satisfies the requisite relation. Then, the limit of $(1-\hat{\Gamma}^{-1})^{k}=(1-\frac{1}{k})^{k}$ as $k\rightarrow\infty$ is $1/e$, leading directly to the statement above. ∎
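The limit used in this proof is easy to check numerically. The helper below is a hypothetical illustration that evaluates the ratio of Lemma 2.5 for a given $\hat{\Gamma}$ and $k$.

```python
import math


def lemma_ratio(gamma_hat, k):
    """Evaluate 1 - (1 - 1/gamma_hat)^k from Lemma 2.5."""
    return 1.0 - (1.0 - 1.0 / gamma_hat) ** k


# With gamma_hat = k (the submodular case), the ratio decreases toward 1 - 1/e.
submodular_ratios = [lemma_ratio(k, k) for k in (1, 2, 10, 100, 10**6)]
```

Each value in `submodular_ratios` stays at or above $1-1/e$, consistent with the corollary's statement that the classical ratio is recovered in the limit.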

Thus, we see that this ratio is a generalization of the classical $1-1/e$ approximation ratio that allows specialization of a ratio to the particular kind of problem instances being operated on. Further, the definition of total primal curvature illuminates why this ratio is capable of producing more useful bounds for non-submodular objectives than that of Wang et al.: the $\Gamma$ values encode a product of values that may converge to a limit, depending on problem instance, while the $\alpha$ bound uses $\prod_{t=0}^{i}\alpha=\alpha^{i}$, which does not converge for any $\alpha>1$ (a condition which is implied by non-submodularity).

2.3. The Adaptive Ratio

We conclude this section by extending this ratio to the adaptive case where the decision made at each greedy step takes into account the outcomes of previous decisions. Briefly: in an adaptive algorithm, at each step the algorithm has a partial realization $\psi$ consistent with the true realization $\Phi$ (Golovin and Krause, 2011). After each step, this partial realization is updated with the outcome of that step to form $\psi^{\prime}$ . The method for deciding the steps to take is termed a policy, with the greedy algorithm encoded as the greedy policy.
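The adaptive setting can be sketched with a small sensor-placement example in the spirit of the motivation from Golovin and Krause. Everything below is an assumption made for illustration: sensors work independently with known probabilities, the objective counts targets covered by working sensors, and the names `expected_gain`, `adaptive_greedy`, `cover`, `p_work`, and `truth` are hypothetical.

```python
def expected_gain(i, psi, cover, p_work):
    """Delta(i | psi): expected number of newly covered targets if i is tried."""
    # Only sensors observed to work contribute to the current coverage.
    covered = set().union(*(cover[j] for j, works in psi.items() if works))
    return p_work[i] * len(cover[i] - covered)


def adaptive_greedy(cover, p_work, truth, k):
    """Greedy policy: pick the best expected gain, then observe the outcome."""
    psi = {}  # partial realization: sensor -> observed state
    for _ in range(k):
        candidates = sorted(i for i in cover if i not in psi)
        if not candidates:
            break
        best = max(candidates, key=lambda i: expected_gain(i, psi, cover, p_work))
        psi[best] = truth[best]  # reveal the true state of the chosen sensor
    return psi


# Hypothetical instance: "a" covers the most targets but is unreliable.
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
p_work = {"a": 0.5, "b": 0.9, "c": 0.9}
truth = {"a": False, "b": True, "c": True}
psi = adaptive_greedy(cover, p_work, truth, 2)
```

Here the policy first tries `"b"` (expected gain 1.8 against 1.5 for `"a"`), observes that it works, and then tries `"a"`, whose remaining expected gain of 1.0 still exceeds the 0.9 of `"c"`. The second choice depends on the outcome of the first, which is what distinguishes a policy from a fixed set.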

This representation supports the study of algorithms that operate with incomplete information and gradual revelation of the data. The initial motivation was described in terms of placement of sensors that may fail, and this technique has seen further use in studying networks with incomplete topology (Li et al., 2016; Seeman and Singer), active learning under noise (Golovin et al.), and distributed representative subset mining (Mirzasoleiman et al.).

We generalize our ratio to this case by defining the adaptive primal curvature of a function in terms of the partial realizations.

Definition 5 (Adaptive Primal Curvature).

The primal curvature of an adaptive monotone non-decreasing function $f$ is

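$$\nabla_{f}(i,j\mid\psi)=\mathbb{E}\left[\frac{\Delta(i\mid\psi\cup s)}{\Delta(i\mid\psi)}\,\Big|\,s\in S(j)\right]$$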

where $S(j)$ is the set of possible states of $j$ and $\Delta$ is the conditional expected marginal gain (Golovin and Krause, 2011).

Definition 6 (Adaptive T.P.C.).

Let $\psi\subset\psi^{\prime}$ and $\psi\rightarrow\psi^{\prime}$ represent the set of possible state sequences leading from $\psi$ to $\psi^{\prime}$ . Then the adaptive total primal curvature is

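$$\Gamma(i\mid\psi^{\prime},\psi)=\mathbb{E}\left[\prod_{s_{j}\in Q}\nabla^{\prime}(i,s_{j}\mid\psi\cup\{s_{1},\ldots,s_{j-1}\})\,\Big|\,Q\in\psi\rightarrow\psi^{\prime}\right]$$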

This definition leads to the following theorem by similar arguments as Thm. 2.4. However, the operations within expectation require additional care.

Lemma 2.7.

$$\Gamma(i\mid\psi^{\prime},\psi)=\frac{\Delta(i\mid\psi^{\prime})}{\Delta(i\mid\psi)}$$

Proof.

Fix a sequence $Q\in\psi\rightarrow\psi^{\prime}$ of length $r$ . Then, expanding the product we obtain

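$$\frac{\Delta(i\mid\psi\cup\{s_{1}\})}{\Delta(i\mid\psi)}\cdot\frac{\Delta(i\mid\psi\cup\{s_{1},s_{2}\})}{\Delta(i\mid\psi\cup\{s_{1}\})}\cdots\frac{\Delta(i\mid\psi^{\prime})}{\Delta(i\mid\psi^{\prime}\setminus\{s_{r}\})}$$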

If we take the expectation of this w.r.t. the possible sequences $Q$ , we obtain the same ratio regardless of $Q$ , and therefore the claim holds trivially. ∎

Corollary 2.8.

Suppose that $\forall\psi^{\prime}\supset\psi,i\not\in\text{dom}(\psi^{\prime}):\Gamma(i\mid\psi^{\prime},\psi)\leq\hat{\Gamma}(\psi)$ . Then

$$\Delta(i\mid\psi^{\prime})\leq\hat{\Gamma}(\psi)\,\Delta(g_{l+1}\mid\psi)$$

where $\psi$ is the partial realization resulting from application of the $l$ -element greedy policy, $\psi\subset\psi^{\prime}$ , $i\not\in\text{dom}(\psi^{\prime})$ , and $g_{l+1}$ is the next element that would be selected by the greedy policy.

Proof.

By Lemma 2.7,

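$$\Delta(i\mid\psi^{\prime})=\Gamma(i\mid\psi^{\prime},\psi)\,\Delta(i\mid\psi)\leq\hat{\Gamma}(\psi)\,\Delta(i\mid\psi)\leq\hat{\Gamma}(\psi)\,\Delta(g_{l+1}\mid\psi)$$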

and thus the statement holds. ∎

Lemma 2.9.

[TABLE]

where $\pi_{l}$ is the $l$ -truncation of $\pi$ with $l<k$ , $\pi^{\prime}$ selects exactly $k$ elements, $\hat{\Gamma}(\pi_{l})=\max_{\psi=\pi_{l}(\mathbf{\Phi})}\hat{\Gamma}(\psi)$ is the maximum over all possible realizations resulting from applying policy $\pi_{l}$ , and $\Delta_{avg}(\pi_{l},\pi_{l+1})=f_{\text{avg}}(\pi_{l+1})-f_{\text{avg}}(\pi_{l})$ .

Proof.

By Corollary 2.8, we have

[TABLE]

where the first equality uses the definition $\hat{\Gamma}(\psi)\leq\hat{\Gamma}(\pi_{l})$ and the second uses the definition of $\Delta(\cdot)$ . ∎

Theorem 2.10.

Define $\hat{\Gamma}_{k}(\pi)=\max_{0\leq l\leq k}\hat{\Gamma}(\pi_{l})$ . Then

[TABLE]

Proof.

By Lemma 2.9, we have

[TABLE]

Multiply both sides by $(1-(k\hat{\Gamma}_{k}(\pi))^{-1})^{k-1-l}$ and sum from $l=0$ to $k-1$ . We get that the left hand side reduces to

[TABLE]

and the right hand side reduces to $k\hat{\Gamma}_{k}(\pi)f_{\text{avg}}(\pi_{k})$ by employing the identity for partial sums of a geometric series to find that each term of the outer sum has coefficient $k\hat{\Gamma}_{k}(\pi)$ . Combining these, we directly obtain the statement of the theorem. ∎

3. Conclusion & Future Work

In this paper, we presented a method for estimating the approximation ratio of greedy maximization that works transparently for both submodular and non-submodular functions, in addition to a variant supporting adaptive greedy algorithms. This ratio reduces to at worst $1-1/e$ as $k\rightarrow\infty$ for submodular functions, and is shown to provide performance bounds for non-submodular maximization.

While we have demonstrated the utility of our technique for understanding the performance of non-submodular maximization, there remains room for further development. Relaxations of the uniformity and monotonicity conditions have found widespread use for submodular functions, and we expect that relaxing them for this ratio would likewise be generally useful.

Bibliography

[2] Christian Borgs, Michael Brautbar, Jennifer Chayes, and Brendan Lucier. Maximizing Social Influence in Nearly Optimal Time. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms (2014) (SODA ’14). SIAM.
[3] Yazan Boshmaf, Ildar Muslukhov, Konstantin Beznosov, and Matei Ripeanu. The Socialbot Network: When Bots Socialize for Fame and Money. In Proceedings of the 27th Annual Computer Security Applications Conference (2011) (ACSAC ’11). ACM, 93–102.
[4] N. Buchbinder, M. Feldman, J. Naor, and R. Schwartz. A Tight Linear Time (1/2)-Approximation for Unconstrained Submodular Maximization. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science (FOCS) (2012-10).
[5] Michele Conforti and Gérard Cornuéjols. 1984. Submodular Set Functions, Matroids and the Greedy Algorithm: Tight Worst-Case Bounds and Some Generalizations of the Rado-Edmonds Theorem. Discrete Applied Mathematics 7 (1984).
[6] Marshall L. Fisher, George L. Nemhauser, and Laurence A. Wolsey. An Analysis of Approximations for Maximizing Submodular Set functions—II. In Polyhedral Combinatorics. Springer, 73–87.
[7] Daniel Golovin and Andreas Krause. 2011. Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization. Journal of Artificial Intelligence Research 42 (2011), 427–486.
[8] Daniel Golovin, Andreas Krause, and Debajyoti Ray. Near-Optimal Bayesian Active Learning with Noisy Observations. In Advances in Neural Information Processing Systems 23, J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta (Eds.). Curran Associates, Inc., 766–774.
