Beyond Submodular Maximization via One-Sided Smoothness

Mehrdad Ghadiri; Richard Santiago; Bruce Shepherd

arXiv:1904.09216·cs.DS·June 3, 2020


TL;DR

This paper extends the multilinear framework for submodular maximization to a broader class of functions, introducing a new parameter called one-sided smoothness, and provides improved approximation algorithms for diversity maximization problems under matroid constraints.

Contribution

It introduces the concept of one-sided smoothness for functions, extending the multilinear framework, and develops new approximation algorithms with better bounds for diversity maximization.

Findings

01

Achieves an $\Omega(1/\sigma)$-approximation for monotone, normalized one-sided $\sigma$-smooth functions with non-positive third-order partial derivatives.

02

Provides an $\Omega(1/\sigma^{3/2})$-approximation for $\sigma$-semi-metric diversity functions under matroid constraints.

03

Develops a polynomial-time algorithm for multilinear one-sided $\sigma$-smooth functions.

Abstract

The multilinear framework has achieved the breakthrough $1-1/e$ approximation for maximizing a monotone submodular function subject to a matroid constraint. This framework has a continuous optimization part and a rounding part. We extend both parts to a wider array of problems. In particular, we make a conceptual contribution by identifying a family of parameterized functions. As a running example we focus on solving diversity problems $\max\{f(S)=\frac{1}{2}\sum_{i,j\in S}A_{ij}:S\in\mathcal{M}\}$, where $\mathcal{M}$ is a matroid. These diversity functions have $A_{ij}\geq 0$ as a measure of dissimilarity of $i,j$, and $A$ has $0$-diagonal. The multilinear framework cannot be directly applied to the multilinear extension of such functions. We introduce a new parameter for functions $F\in{\bf C}^{2}$ which measures the approximability of the associated problem $\max\{F(x):x\in P\}$, for…

Equations

\frac{1}{2}u^{T}\nabla^{2}F(x)u\leq\sigma\cdot\frac{||u||_{1}}{||x||_{1}}u^{T}\nabla F(x),

f(S)=\frac{1}{2}\sum_{u,v\in S}A(u,v)+\sum_{v\in S}b(v)

u^{T}\nabla F(x+\epsilon u)\leq\left(\frac{||x+\epsilon u||_{1}}{||x||_{1}}\right)^{2\sigma}\left(u^{T}\nabla F(x)\right).

g^{\prime}(t)=u^{T}\nabla^{2}F(x+tu)u\leq 2\sigma\frac{||u||_{1}}{||x+tu||_{1}}u^{T}\nabla F(x+tu)=2\sigma\frac{||u||_{1}}{||x+tu||_{1}}g(t)\leq 2\sigma\frac{||u||_{1}}{||x+tu||_{1}}(g(t)+h),

\frac{g^{\prime}(t)}{g(t)+h}\leq 2\sigma\frac{||u||_{1}}{||x+tu||_{1}}.

\int_{0}^{\epsilon}\frac{g^{\prime}(t)}{g(t)+h}dt=\ln(g(t)+h)\biggl{|}_{0}^{\epsilon}=\ln(\frac{g(\epsilon)+h}{g(0)+h}),

2\sigma\int_{0}^{\epsilon}\frac{||u||_{1}}{||x+tu||_{1}}dt=2\sigma\ln(||x+tu||_{1})\biggl{|}_{0}^{\epsilon}=2\sigma\ln(\frac{||x+\epsilon u||_{1}}{||x||_{1}}),

OPT\leq F(x^{*}\lor x)=F(x)+u^{T}\nabla F(x+\epsilon u)\leq F(x)+\left(\frac{||x+\epsilon u||_{1}}{||x||_{1}}\right)^{2\sigma}u^{T}\nabla F(x),

\frac{||x+\epsilon u||_{1}}{||x||_{1}}\leq\frac{||x+u||_{1}}{||x||_{1}}=1+\frac{||u||_{1}}{||x||_{1}}\leq 1+\frac{||u||_{1}}{||x(0)||_{1}}\leq 1+\frac{1}{\alpha}=\frac{\alpha+1}{\alpha}.

v_{max}(x)\cdot\nabla F(x)\geq u^{T}\nabla F(x)\geq\frac{1}{\left(\frac{||x+\epsilon u||_{1}}{||x||_{1}}\right)^{2\sigma}}(OPT-F(x))\geq\left(\frac{\alpha}{\alpha+1}\right)^{2\sigma}(OPT-F(x)).

\frac{d}{dt}F(x(t))=\nabla F(x(t))\cdot x^{\prime}(t)=\nabla F(x(t))\cdot(1-\alpha)v_{max}(x(t))\geq\rho(1-\alpha)[OPT-F(x(t))].

\frac{d}{dt}[e^{\rho(1-\alpha)t}\cdot F(x(t))]

e^{\rho(1-\alpha)t}\cdot F(x(t))-e^{0}\cdot F(x(0))

F(x(t))

OPT\leq F(x^{*}\lor x)\leq F(x)+u^{T}\nabla F(x)+\frac{1}{2}u^{T}\nabla^{2}F(x)u\leq F(x)+\left(1+\sigma\cdot\frac{||u||_{1}}{||x||_{1}}\right)u^{T}\nabla F(x)

\sum_{i}\lambda_{i}^{2}\cdot\mathbbm{1}_{cov(B_{i})}+\sum_{i<j}\lambda_{i}\lambda_{j}\cdot\left(2\cdot\mathbbm{1}_{cov(B_{i}\cap B_{j})}+\mathbbm{1}_{\delta(B_{i}-B_{j},B_{j}-B_{i},B_{i}\cap B_{j})}\right).

\sum_{I\in\mathcal{I}^{i,j}}\mathbbm{1}_{cov(I)}\geq 2\cdot\mathbbm{1}_{cov(B_{i}\cap B_{j})}+\mathbbm{1}_{\delta(B_{i}-B_{j},B_{j}-B_{i},B_{i}\cap B_{j})}.

x^{T}Dx

\sqrt{d(i,j)}\leq\sqrt{d(i,k)}+\sqrt{d(j,k)}.

d(i,j)\leq d(i,k)+d(j,k)+2\sqrt{d(i,k)d(j,k)}.

d(i,k)+d(j,k)-2\sqrt{d(i,k)d(j,k)}=\left(\sqrt{d(i,k)}-\sqrt{d(j,k)}\right)^{2}\geq 0.

d(i,j)\leq 2(d(i,k)+d(j,k)).

||x||_{1}\nabla_{ij}^{2}F(x)\leq\sigma(\nabla_{i}F(x)+\nabla_{j}F(x)),

u^{T}\nabla^{2}F(x)u=\sum_{i=1}^{n}\sum_{j=1}^{n}u_{i}u_{j}\nabla_{ij}^{2}F(x)

\leq\frac{\sigma}{||x||_{1}}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}u_{i}u_{j}\nabla_{i}F(x)+\sum_{i=1}^{n}\sum_{j=1}^{n}u_{i}u_{j}\nabla_{j}F(x)\right)

=\frac{\sigma}{||x||_{1}}\left(\sum_{i=1}^{n}u_{i}\nabla_{i}F(x)\left(\sum_{j=1}^{n}u_{j}\right)+\sum_{i=1}^{n}u_{i}\left(\sum_{j=1}^{n}u_{j}\nabla_{j}F(x)\right)\right)

=\frac{\sigma}{||x||_{1}}\left(||u||_{1}\sum_{i=1}^{n}u_{i}\nabla_{i}F(x)+||u||_{1}\sum_{j=1}^{n}u_{j}\nabla_{j}F(x)\right)

=2\sigma\left(\frac{||u||_{1}}{||x||_{1}}\right)\left(u^{T}\nabla F(x)\right).

\sigma(\nabla_{i}F(x)+\nabla_{j}F(x))

\geq\sum_{k=1}^{n}A(i,j)x_{k}=||x||_{1}A(i,j)=||x||_{1}\nabla_{ij}^{2}F(x),

\alpha=\frac{-(2\sigma+1)+\sqrt{4\sigma^{2}+12\sigma+1}}{2}.

g^{\prime}(\alpha)=\frac{2\sigma\alpha^{2\sigma-1}(\alpha+1)^{2\sigma}-(2\sigma+1)\alpha^{2\sigma}(\alpha+1)^{2\sigma}-2\sigma(\alpha+1)^{2\sigma-1}\alpha^{2\sigma}+2\sigma(\alpha+1)^{2\sigma-1}\alpha^{2\sigma+1}}{(\alpha+1)^{4\sigma}}=0

\Rightarrow 2\sigma\alpha^{2\sigma-1}(\alpha+1)^{2\sigma-1}-2\sigma\alpha^{2\sigma}(\alpha+1)^{2\sigma-1}=\alpha^{2\sigma}(\alpha+1)^{2\sigma}


Full text

Beyond Submodular Maximization via One-Sided Smoothness

Mehrdad Ghadiri, Georgia Institute of Technology, Atlanta, GA, USA.

Richard Santiago, ETH Zurich, Switzerland.

Bruce Shepherd, University of British Columbia, Vancouver, Canada.

Abstract

The multilinear framework was developed to achieve the breakthrough $1-1/e$ approximation for maximizing a monotone submodular function subject to a matroid constraint, which includes the submodular welfare problem as a special case. This framework has a continuous optimization part (solving the multilinear extension of a submodular set function) and a rounding part (rounding a fractional solution to an integral one). We extend both parts so that the resulting generalized framework may be used on a wider array of problems. In particular, we make a conceptual contribution by identifying a family of parameterized functions and their applications. As a running example we focus on solving diversity problems $\max\{f(S)=\frac{1}{2}\sum_{i,j\in S}A_{ij}:S\in\mathcal{M}\}$, where $\mathcal{M}$ is a matroid. These diversity functions have $A_{ij}\geq 0$ as a measure of dissimilarity of $i,j$, and $A$ has $0$-diagonal. This family of problems ranges from intractable problems such as densest $k$-subgraph, to $\frac{1}{2}$-approximable metric diversity problems. The multilinear extension $F$ of such diversity functions satisfies $\nabla^{2}F(x)=A\geq 0$ and hence the original multilinear framework (which assumes non-positive Hessians) does not directly apply. Instead we introduce a new parameter for functions $F\in{\bf C}^{2}$ which measures the approximability of the associated problem $\max\{F(x):x\in P\}$, for solvable downwards-closed polytopes $P$. A function $F$ is called one-sided $\sigma$-smooth if $\frac{1}{2}u^{T}\nabla^{2}F(x)u\leq\sigma\cdot\frac{||u||_{1}}{||x||_{1}}u^{T}\nabla F(x)$ for all $u,x\geq 0$, $x\neq 0$. For $\sigma=0$ this class includes previously studied classes such as continuous DR-submodular functions, and much more. For the multilinear extension of a diversity function, we show that it is one-sided $\sigma$-smooth whenever $A_{ij}$ forms a $\sigma$-semi-metric.

We give an $\Omega(1/\sigma)$-approximation for the continuous maximization problem of monotone, normalized one-sided $\sigma$-smooth $F$ with an additional property: non-positive third-order partial derivatives. Since the multilinear extension of a diversity function has this additional property, we can apply the extended multilinear framework to this family of discrete problems. This requires new matroid rounding techniques for quadratic objectives. The result is an $\Omega(1/\sigma^{3/2})$-approximation for maximizing a $\sigma$-semi-metric diversity function subject to a matroid constraint. This improves upon the previous best bound of $\Omega(1/\sigma^{2})$ and we give evidence that it may be tight. For general one-sided smooth functions, we show the continuous process gives an $\Omega(1/3^{2\sigma})$-approximation, independent of $n$. In this setting, by discretizing, we present a concrete poly-time algorithm for multilinear functions that satisfy the one-sided $\sigma$-smoothness condition. We also describe a discretization for one-sided smooth functions with $L$-Lipschitz gradients.

1 Introduction

In a breakthrough result, an optimal $1-1/e$ approximation was given for monotone submodular maximization subject to a matroid constraint [13, 45]. This resolved a long-standing gap between the best known $1/2$ approximation [27] and the $1-1/e$ lower bound [22]. It also provides a tight approximation for the submodular welfare problem [23]. A key insight was to use a continuous relaxation based on the multilinear extension (ME) $F$ of a set function $f:2^{[n]}\rightarrow\mathbb{R}_{\geq 0}$. For $x\in[0,1]^{n}$, $F(x)$ is defined as $E[f(R(x))]$, where $R(x)$ is a random set with each $i$ being selected independently with probability $x_{i}$. In particular, for $S\subseteq[n]$, $F(\mathbbm{1}_{S})=f(S)$. Thus a valid multilinear relaxation for a discrete problem $\max\{f(S):S\in\mathcal{M}\}$ is obtained: $\max\{F(x):x\in P_{\mathcal{M}}\}$, where $P_{\mathcal{M}}=conv(\{\mathbbm{1}_{S}:S\in\mathcal{M}\})$.
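As an illustration of the definition $F(x)=E[f(R(x))]$, the following minimal sketch evaluates the multilinear extension exactly by enumerating all subsets, which is only feasible for very small $n$. The function names and the example matrix are illustrative assumptions, not taken from the paper. For a diversity function with a symmetric, zero-diagonal matrix, the enumeration agrees with the quadratic form $\frac{1}{2}x^{T}Ax$.

```python
import itertools

def diversity(A, S):
    """f(S) = 1/2 * sum over i, j in S of A[i][j] (A symmetric, zero diagonal)."""
    return 0.5 * sum(A[i][j] for i in S for j in S)

def multilinear_exact(f, x):
    """F(x) = E[f(R(x))], computed by enumerating all 2^n subsets (small n only)."""
    n = len(x)
    total = 0.0
    for mask in itertools.product([0, 1], repeat=n):
        # Probability that the random set R(x) equals the subset encoded by mask.
        prob = 1.0
        for i, bit in enumerate(mask):
            prob *= x[i] if bit else 1.0 - x[i]
        subset = [i for i, bit in enumerate(mask) if bit]
        total += prob * f(subset)
    return total

def half_quadratic(A, x):
    """The closed form 1/2 * x^T A x, for comparison with the enumeration."""
    n = len(x)
    return 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

if __name__ == "__main__":
    # Hypothetical 3-item dissimilarity matrix, chosen only for illustration.
    A = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
    x = [0.5, 0.25, 1.0]
    print(multilinear_exact(lambda S: diversity(A, S), x), half_quadratic(A, x))
```

At an integral point $x=\mathbbm{1}_{S}$ the same routine returns $f(S)$, which is the property that makes the relaxation valid.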

This framework has inspired a successful stream of research, including work on non-monotone submodular functions [12] and new ‘contention resolution’ rounding techniques for general polytopes [18]. In this work we make a conceptual contribution by identifying a family of parameterized set functions where an extension to the multilinear framework can be brought to bear. We also give several applications for this generalized framework.

Using this framework to solve a discrete problem requires two essential ingredients: first, algorithmic tools to find a good solution $x^{*}$ for the multilinear relaxation; second, a way to convert a solution $x^{*}$ into a set $S$ with $f(S)\approx F(x^{*})$. As multilinear extensions are neither concave nor convex, it is not a priori clear that the fractional problem itself would be tractable. For monotone submodular functions, however, a gradient-based technique, called continuous greedy, is shown to provide a $1-1/e$ approximation [45]. This analysis relies on the fact that MEs of submodular functions have non-positive second derivatives. Functions $F\in{\bf C}^{2}$ with this property are called continuous DR-submodular [7] (cf. [43]). The rounding step for matroids relies on a different property of multilinear extensions. Namely, if $f$ is submodular, then $F$ is convex in any direction $\bm{e}_{i}-\bm{e}_{j}$, where $\bm{e}_{i}$ denotes the characteristic vector of $\{i\}$. This allows a lossless conversion to a discrete solution, for instance, by using pipage rounding. The combination of the fractional algorithm and rounding provides the $1-1/e$ approximation.

In this paper, we develop a wider scope for the multilinear framework and give evidence of its use in other applications. One motivating example is diversity maximization [36, 35, 46] which has applications in machine learning [49, 30], document aggregation [1], web search [40], recommender systems [47, 14], and many more. One widely used model is $\max\{f(S):|S|\leq k\}$, where $f(S)=\frac{1}{2}\sum_{i,j\in S}A_{ij}$ for all $S\subseteq[n]$. We refer to $f$ as a diversity function if $A\geq 0$ is symmetric and has $0$-diagonal. We think of $A_{ij}$ as measuring dissimilarity between items $i,j$. This family of (supermodular) maximization problems ranges from challenging examples such as densest $k$-subgraph, with best known approximation $\Omega(1/n^{0.25+\epsilon})$ [4, 38], to metric diversity ($A_{ij}$ forms a metric) which is $\frac{1}{2}$-approximable [33, 11]. Since the multilinear extension of a diversity function $f$ has Hessian $A$ which is non-negative (as opposed to non-positive), the standard $(1-1/e)$-approximation from continuous greedy does not directly apply. One of our main messages is that the metric property in diversity maximization is intrinsic to the tractability of the multilinear relaxation.

To describe our extended multilinear framework we first discuss the fractional problem and later discuss rounding. We introduce a parameterized family of monotone, non-negative functions $F\in{\bf C}^{2}$ and then show that the parameter governs the approximability of the problem $\max\{F(x):x\in P\}$ , for downwards-closed polytopes $P$ . To achieve this we cannot directly rely on a crucial fact used in the analysis of the continuous greedy process: that the rate of change of $F$ at a point $x$ is at least the current deficit, defined as OPT $-F(x)$ . For MEs of submodular functions, this follows from $F$ being concave in non-negative directions, which in turn relies on $F$ ’s second derivatives being non-positive. Instead, we define a family of functions whose growth in non-negative directions is constrained by a parameter $\sigma$ . A function $F:\mathbb{R}_{\geq 0}^{n}\rightarrow\mathbb{R}$ in ${\bf C}^{2}$ is called one-sided $\sigma$ -smooth (or $\sigma$ -OSS for short) if it satisfies

$$\frac{1}{2}u^{T}\nabla^{2}F(x)u\leq\sigma\cdot\frac{||u||_{1}}{||x||_{1}}u^{T}\nabla F(x),$$

for all $u,x\geq 0$ , $x\neq 0$ .
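To make the definition concrete, the following sketch samples random non-negative $u,x$ and records the largest observed violation of the inequality for the quadratic $F(x)=\frac{1}{2}x^{T}Ax$, whose gradient is $Ax$ and whose Hessian is $A$. The matrix `A_LINE` (distances between the points 0, 1, 3 on a line) and all helper names are illustrative assumptions, not taken from the paper. Sampling can only fail to find a violation; it does not prove the property.

```python
import random

# Hypothetical example: pairwise distances of the points 0, 1, 3 on a line (a metric).
A_LINE = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]

def bilinear(A, u, v):
    """Returns u^T A v."""
    return sum(u[i] * A[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def oss_violation(A, sigma, trials=2000, seed=0):
    """Largest observed LHS - RHS of the sigma-OSS inequality for F(x) = x^T A x / 2."""
    rng = random.Random(seed)
    n = len(A)
    worst = float("-inf")
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        u = [rng.random() for _ in range(n)]
        if sum(x) == 0:
            continue  # the definition requires x != 0
        lhs = 0.5 * bilinear(A, u, u)                      # 1/2 u^T Hessian u
        rhs = sigma * (sum(u) / sum(x)) * bilinear(A, u, x)  # sigma * |u|_1/|x|_1 * u^T grad F(x)
        worst = max(worst, lhs - rhs)
    return worst

if __name__ == "__main__":
    print("sigma = 1:", oss_violation(A_LINE, 1.0))
    print("sigma = 0:", oss_violation(A_LINE, 0.0))
```

A non-positive result for $\sigma=1$ is what one would expect for a metric, while $\sigma=0$ must fail here because the Hessian $A$ is non-negative and non-zero.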

The class of $0$-OSS functions already contains interesting and familiar functions. This includes the continuous DR-submodular functions, as their Hessians are non-positive [7, 6]; the DR-submodular functions form a superset of the functions originally considered for continuous greedy [45]. The $0$-OSS functions contain much more, however, as we discuss later in Section 1.1. For all of these functions, the continuous greedy process returns a solution within $1-1/e$ of the optimum; in some cases, converting this into a concrete polytime algorithm requires additional assumptions.

For larger values of $\sigma$ , one example of $\sigma$ -smooth functions is the class of $\sigma$ -semi-metric diversity functions. Namely, the parameter $\sigma$ corresponds to the matrix $A$ being a $\sigma$ -semi-metric. This means that $A_{ik}\leq\sigma(A_{ij}+A_{jk})$ for all $i,j,k$ , see Proposition 3 in Appendix A. This captures diversity functions addressed in the literature, such as metric diversity [9] ( $\sigma=1$ ), and negative-type distances [16, 15] or Jensen-Shannon divergence which has been used to measure dissimilarity of probability measures (both have smoothness $\sigma=2$ ), see Appendix A.
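The semi-metric condition can be checked directly on a small matrix. The sketch below computes the smallest $\sigma$ for which $A_{ik}\leq\sigma(A_{ij}+A_{jk})$ holds over all triples, including degenerate ones with repeated indices, which is how the condition is stated above. The function name and the two example matrices are illustrative assumptions; squared Euclidean distances are used as a standard example of a negative-type distance.

```python
def semi_metric_sigma(A):
    """Smallest sigma with A[i][k] <= sigma * (A[i][j] + A[j][k]) for all i, j, k."""
    n = len(A)
    sigma = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if A[i][k] == 0:
                    continue  # the inequality holds trivially
                denom = A[i][j] + A[j][k]
                if denom == 0:
                    return float("inf")  # no finite sigma can satisfy this triple
                sigma = max(sigma, A[i][k] / denom)
    return sigma

# Hypothetical inputs: distances of the points 0, 1, 3 on a line (a metric),
# and squared distances of the points 0, 1, 2 on a line.
LINE_METRIC = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
SQUARED_LINE = [[0, 1, 4], [1, 0, 1], [4, 1, 0]]

if __name__ == "__main__":
    print(semi_metric_sigma(LINE_METRIC), semi_metric_sigma(SQUARED_LINE))
```

The triple loop is cubic in $n$, which is fine for a sanity check on small instances but not intended as an efficient procedure.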

Our main contribution to the fractional problem is to show that one-sided smoothness of a monotone, non-negative function governs the approximability of $\max\{F(x):x\in P\}$, for downwards-closed polytopes $P$. This is reminiscent of how Lipschitz smoothness bounds convergence rates in convex optimization; see Appendix F for a discussion of the difference between Lipschitz smoothness and OSS. If $F$ additionally has non-positive third-order partials, then we show that continuous greedy can be adapted to become an $\Omega(1/\sigma)$-approximation, and we show this is tight. This class includes the discussed MEs for diversity maximization. We can combine this with new rounding techniques to obtain unified results for maximizing diversity functions over matroids. Unlike for submodular functions, this requires taking the best of two rounding methods. One is inspired by swap rounding, previously applied to the submodular case. The other extends the approximate integer decomposition framework [17] to handle the “pairwise terms” in diversity functions.

For general $\sigma$-smooth functions, without any third-order assumption, we can obtain an $\Omega(1/3^{2\sigma})$ approximation (independent of $n$) for the continuous greedy process. We can no longer use the second-order Taylor polynomial since we do not have non-positivity of the third-order error term. Instead we work with the first-order Taylor expansion, but this requires a new upper bound on $u^{T}\nabla F(x+\epsilon u)$, the directional derivative, in a neighborhood of $x$. In the fully general setting we need a (strong) lower bound on $u^{T}\nabla F(x+\epsilon u)$ to make a concrete algorithm. However, for multilinear $\sigma$-OSS functions we give a polytime algorithm that needs no additional assumptions. We also consider discretization for one-sided smooth functions with Lipschitz gradients, and for a class of $0$-smooth functions which are not continuous DR-submodular (Section G.1).

1.1 The Zero One-Sided Smooth Class

The class of $0$-OSS functions is interesting in its own right. For monotone, non-negative members of this family our results show that the continuous greedy process yields a $1-1/e$ approximation for the fractional problem. Obtaining a polytime algorithm (discretization) is not immediate but we can establish natural conditions on cases where this can be achieved. The general $0$-OSS family forms a very broad class of functions. For instance, it contains every concave function $F\in{\bf C}^{2}$ (even though our results are only tailored for the monotone, non-negative functions in this class). This means it also contains the continuous DR-submodular functions (Hessians are non-positive). This containment is proper since there are $0$-OSS functions with positive off-diagonal entries in their Hessian. It is interesting to compare with the related family of continuous submodular functions that has been developed in the context of minimization [2]. Continuous submodular functions are defined as having Hessians with non-positive off-diagonal entries, but they may have positive diagonal entries. In contrast, $0$-OSS functions must have non-positive diagonals but may have positive off-diagonals (cf. Appendix G.1); see Figure 1.

The (general) $0$-OSS family can be defined as the functions $F\in{\bf C}^{2}$ for which $-\nabla^{2}F(x)$ is copositive for every $x$, where a matrix $A$ is copositive if $u^{T}Au\geq 0$ for every $u\geq 0$ [20]. While recognition of copositive matrices is NP-hard [39], we propose a strategic procurement problem which is modelled as maximizing a quadratic function $F(x)=\frac{1}{2}x^{T}(-A)x+b^{T}x$ where $A$ is a copositive matrix defined by the user. Note that this family of objectives is a generalization of concave quadratics. We refer to the resulting (fractional) maximization problem as diversified procurement, discussed in Appendix G.1.

2 Our Results

Our results are of three types: 1) fractional approximations, 2) rounding, and 3) hardness results. These are presented in Sections 4, 5, and 6 respectively. In all of our results, we assume that the function is monotone and non-negative.

Fractional approximation. Our main result in this part (Theorem 1) shows that a modified version of the continuous greedy process gives a $(1-\exp{(-(1-\alpha)(\frac{\alpha}{\alpha+1})^{2\sigma})})$-approximation for maximizing a non-negative, monotone, $\sigma$-OSS function subject to a downwards-closed polytope, where $\alpha$ is an arbitrary number in $[0,1)$. We remark that for $\sigma=0$, our results recover the $(1-1/e)$-approximation [45, 13] for maximizing the multilinear extension of a submodular function, by setting $\alpha=0$. For $\alpha=0.5$, our approximation is better than $\frac{0.5}{3^{2\sigma}+0.5}$. For fixed $\sigma$ this gives a constant-factor approximation independent of $n$. At present, we do not know the correct dependence on $\sigma$. However, the dependence improves to linear with an additional assumption that third-order partials are non-positive. More precisely, we obtain a $(1-\exp{(-\frac{1}{4\sigma+2})})$-approximation; see Theorem 2. As mentioned in Section 1 this gives an $\Omega(1/\sigma)$ approximation which is in fact tight within a constant factor (cf. Corollary 1 discussed below). One example of such functions is the class of multilinear extensions of semi-metric diversity functions, i.e., those whose Hessian is a $\sigma$-semi-metric (discussed further in the rounding part).
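The guarantees quoted in this paragraph are easy to evaluate numerically. The sketch below, with illustrative function names that are not from the paper, computes the Theorem 1 factor and the Theorem 2 factor, and compares the $\alpha=0.5$ case against the stated bound $\frac{0.5}{3^{2\sigma}+0.5}$. It relies on Python evaluating `0.0 ** 0.0` as `1.0`, which matches the convention needed for $\sigma=0$, $\alpha=0$.

```python
import math

def jump_start_ratio(sigma, alpha):
    """Theorem 1 factor: 1 - exp(-(1 - alpha) * (alpha / (alpha + 1))^(2 sigma))."""
    rho = (alpha / (alpha + 1.0)) ** (2.0 * sigma)  # 0.0 ** 0.0 == 1.0 in Python
    return 1.0 - math.exp(-(1.0 - alpha) * rho)

def third_order_ratio(sigma):
    """Theorem 2 factor: 1 - exp(-1 / (4 sigma + 2))."""
    return 1.0 - math.exp(-1.0 / (4.0 * sigma + 2.0))

if __name__ == "__main__":
    print("sigma=0, alpha=0:", jump_start_ratio(0.0, 0.0))
    for sigma in (0.5, 1.0, 2.0, 3.0):
        lower = 0.5 / (3.0 ** (2.0 * sigma) + 0.5)
        print(sigma, jump_start_ratio(sigma, 0.5), lower, third_order_ratio(sigma))
```

The comparison with $\frac{0.5}{3^{2\sigma}+0.5}$ follows from the elementary inequality $1-e^{-y}\geq\frac{y}{1+y}$ applied with $y=0.5\cdot 3^{-2\sigma}$.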

The ‘algorithm’ described in the previous paragraph is a continuous-time process, and it is not immediately obvious that it can be implemented as a discrete algorithm. Some readers may wish to take it on faith that this is possible and skip ahead to the rounding results. There are actually some subtleties involved which require two distinct approaches. One of our methods works for multilinear functions, while the other works for general OSS functions but needs an additional parameter that governs the growth of the first order derivatives from below. A fuller discussion is in Appendix C.

Rounding. In this part, we consider maximizing set functions of the form

$$f(S)=\frac{1}{2}\sum_{u,v\in S}A(u,v)+\sum_{v\in S}b(v),$$

where $A$ is a symmetric matrix with $0$-diagonal. If $A,b\geq 0$, then these are the previously discussed diversity functions, but more generally we refer to these as discrete quadratics (aka second-order modular [34]) as their extensions are quadratic functions $F(x)=\frac{1}{2}x^{T}Ax+b^{T}x$. Since their third derivatives are obviously $0$, they ‘qualify’ for the $\Omega(1/\sigma)$-approximation from the preceding section. Hence we have an $\Omega(1/\sigma)$-approximation for maximizing $F$ over a matroid polytope $P_{\mathcal{M}}$ when $F$ is $\sigma$-OSS (i.e., $A$ is $\sigma$-semi-metric). In order to solve the discrete problem $\max\{f(S):S\in\mathcal{M}\}$, we need to transform this fractional solution to a discrete one.

We present two different rounding procedures which combined lead to a rounding gap of $O(\min\{\frac{r}{c-2},1+\frac{\sigma}{r}\})$ , where $r$ denotes the rank of the matroid and $c$ the size of a smallest circuit. Surprisingly we show this is tight (see hardness part). Moreover, this yields an $O(\sqrt{\sigma})$ rounding gap independent of $r$ and $c$ (Theorem 3). Combining the modified continuous greedy algorithm with our rounding result, there is an $\Omega(1/\sigma^{3/2})$ -approximation for maximizing $\sigma$ -semi-metric diversity functions subject to a matroid constraint (Theorem 4). This improves the best known $\Omega(1/\sigma^{2})$ bound [48]. In addition, we note that $\Omega(1/\sigma^{3/2})$ is a pessimistic bound in general. For instance, for uniform matroids we have $c=r+1$ , which leads to an $O(1)$ rounding gap and hence an improved $\Omega(1/\sigma)$ ; as discussed below this is actually tight.

This $O(1)$ rounding gap implies that for a cardinality constraint, the approximation bound of the discrete problem is asymptotically the same as the bound for the continuous problem. Thus the continuous problem of maximizing a general multilinear quadratic function over the simplex $||x||_{1}\leq k$ , is as hard as solving the densest $k$ -subgraph problem (see Corollary 1). This is similar to the situation for continuous maximization of MEs of submodular functions. Such continuous hardness problems have received less attention, as remarked by De Klerk [19]: “approximation algorithms have been studied extensively for combinatorial optimization problems, but have not received the same attention for NP-hard continuous optimization problems.” We close this section by discussing our hardness results for the discrete problems.

Hardness. In this part, we show that the hardness of approximation is also governed by the smoothness parameter of the function. More specifically, in Theorem 7 we show that assuming the planted clique conjecture, for a constant $\sigma$ it is hard to approximate the maximum of a $\sigma$ -semi-metric diversity function subject to a cardinality constraint within a factor better than $2\sigma$ . We also show that for a super constant $\sigma$ , it is hard to find any constant factor approximation.

In Theorem 8 we give a lower bound of $\Omega(\min\{\frac{r}{c-2},\frac{\sigma}{r}\})$ for the rounding gap of a $\sigma$ -semi-metric diversity function over a matroid polytope. This shows that our rounding methods are essentially tight. In particular, each step of our algorithm for maximizing diversity functions (i.e., maximizing the continuous function and rounding) is tight. This leads us to speculate that the $\Omega(1/\sigma^{3/2})$ -approximation (Theorem 4) is asymptotically tight.

3 Related Work

We first discuss work related to solving the continuous problem in the multilinear framework. Other adaptations of the continuous greedy algorithm have been developed for applications to non-monotone submodular maximization [24, 21] and distributed maximization [3]. Another avenue aimed to generalize the class of ${\bf C}^{2}$ functions originally considered [45]. For instance, Bach [2] develops minimization algorithms for the family of continuous submodular functions [37] defined on compact product subsets of $\mathbb{R}^{n}$. A function $F\in{\bf C}^{2}$ is submodular if the off-diagonal entries of its Hessian are non-positive. This class is an extension of lattice submodular functions [44, 28] (a lattice is a poset closed under meet and join operations and hence these generalize submodular set functions). DR-submodularity is a restricted form of lattice submodular functions introduced for maximization [42, 43]. These generalize to continuous DR-submodular functions in ${\bf C}^{2}$ for which all entries of the Hessian are non-positive [7]. The continuous greedy algorithm has also been studied for maximization of these continuous functions. Discretization requires an additional bound on Lipschitz smoothness and then a $(1-1/e)$-approximation can be achieved as step sizes approach $0$. This is done for maximizing a (monotone and non-monotone) DR-submodular function over a downwards-closed polytope [7, 6]. This is introduced as an alternative to multilinear extension which is more practical to evaluate, and for which a gradient-based algorithm leads to a $1/4$ approximation over downwards-closed polytopes [31].

As for discrete problems, after the introduction of the multilinear framework, there have been many developments. One highlight is the introduction of contention resolution schemes [17] which allow one to work with more general polytopes. An online version of this approach has also been developed [25] with applications in algorithmic game theory. In a different direction, the work of [26] gives a combinatorial local search $1-1/e$ approximation algorithm for maximizing monotone submodular functions over matroids.

The diversity maximization problem $\max\{f(S):|S|\leq k\}$ has proved extremely versatile for many applications, as noted in Section 1. On the algorithmic side, a greedy $\frac{1}{2}$-approximation was devised in [33] and this was generalized to matroid constraints [10]. The latter was extended to yield an $\Omega(1/\sigma^{2})$-approximation whenever the diversity costs $A_{ij}$ form a $\sigma$-semi-metric [49]. That is, $A_{ik}\leq\sigma(A_{ij}+A_{jk})$ for all $i,j,k$. A PTAS has also been developed when the $A_{ij}$’s are negative-type distances [16, 15].

There is also work that extends set function maximization beyond submodularity. In [8] a greedy algorithm is shown to give good approximations for a family of set functions which are parameterized by curvature and submodularity ratio values. In [11], a $\frac{1}{2}$ -approximation is developed for the problem of maximizing the sum of a submodular function and a metric diversity function. A generalization of this function, called proportionally submodular functions, is considered in [10]. Another extension is to maximization of weakly submodular functions where non-negativity of the function is relaxed [32].

4 Fractional Approximation

In this section, we first discuss a key property of one-sided smooth functions, which is the main tool in our analysis. This property asserts that for a point $x$ , the directional derivative at points close to $x$ is bounded by a factor of the directional derivative at $x$ .

We then present a variant of the continuous greedy process which we use for both general $\sigma$ -OSS functions and those that have non-positive third-order partial derivatives. We analyze this algorithm for both classes of functions. The discretization of the continuous greedy process is discussed in Appendix C.

4.1 Notations

We use $\{\bm{e}_{1},\ldots,\bm{e}_{n}\}$ to denote the standard basis of $\mathbb{R}^{n}$ and $[n]:=\{1,\ldots,n\}$ to refer to the ground set of a set function. We denote the $i$ ’th coordinate of a vector $x$ with $x_{i}$ . For a set $R\subseteq[n]$ , we denote by $\mathbbm{1}_{R}$ its characteristic vector. Given a vector $x$ we denote its support by $supp(x)$ , i.e., the set of non-zero coordinates of $x$ . For a matrix $A$ , we use $A_{ij}$ and $A(i,j)$ interchangeably to refer to the $i,j$ entry of $A$ .

4.2 A Key Property of One-Sided Smoothness

The following result describes a property of one-sided smoothness that plays a key role in the analysis of the algorithm. It enables us to bound the first order Taylor’s polynomial of the function.

Lemma 1.

Let $x\in[0,1]^{n}\setminus\{\vec{0}\}$ , $u\in[0,1]^{n}$ and $\epsilon>0$ such that $x+\epsilon u\in[0,1]^{n}$ . Let $F:[0,1]^{n}\rightarrow\mathbb{R}$ be a non-negative, monotone function which is one-sided $\sigma$ -smooth on $\{y|x+\epsilon u\geq y\geq x\}$ . Then

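$$u^{T}\nabla F(x+\epsilon u)\leq\left(\frac{||x+\epsilon u||_{1}}{||x||_{1}}\right)^{2\sigma}u^{T}\nabla F(x).$$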

Proof.

Let $g(t):=u^{T}\nabla F(x+tu)$ . By the Chain Rule we have $g^{\prime}(t)=u^{T}\nabla^{2}F(x+tu)u$ .

By one-sided $\sigma$ -smoothness on $\{y|x+\epsilon u\geq y\geq x\}$ , for any $0\leq t\leq\epsilon$ ,

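$$g^{\prime}(t)=u^{T}\nabla^{2}F(x+tu)u\leq 2\sigma\frac{||u||_{1}}{||x+tu||_{1}}u^{T}\nabla F(x+tu)=2\sigma\frac{||u||_{1}}{||x+tu||_{1}}g(t)\leq 2\sigma\frac{||u||_{1}}{||x+tu||_{1}}(g(t)+h)$$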

for any $h>0$ . Therefore, using that $g(t)+h>0$ for all $t$ (since $g(t)\geq 0$ ), we have

$$\frac{g^{\prime}(t)}{g(t)+h}\leq 2\sigma\frac{||u||_{1}}{||x+tu||_{1}}.\tag{1}$$

We integrate both sides of (1) with respect to $t$ . On the left hand side we get

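$$\int_{0}^{\epsilon}\frac{g^{\prime}(t)}{g(t)+h}\,dt=\ln(g(\epsilon)+h)-\ln(g(0)+h)=\ln\Big(\frac{g(\epsilon)+h}{g(0)+h}\Big),$$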

and on the right hand side we get

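$$2\sigma\int_{0}^{\epsilon}\frac{||u||_{1}}{||x+tu||_{1}}\,dt=2\sigma\Big[\ln||x+tu||_{1}\Big]_{0}^{\epsilon}=2\sigma\ln\Big(\frac{||x+\epsilon u||_{1}}{||x||_{1}}\Big),$$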

where we use that $||u||_{1}=\sum_{i}u_{i}=\frac{d}{dt}\sum_{i}(x_{i}+tu_{i})=\frac{d}{dt}||x+tu||_{1}$ .

Therefore $\ln(\frac{g(\epsilon)+h}{g(0)+h})\leq 2\sigma\ln(\frac{||x+\epsilon u||_{1}}{||x||_{1}})$ , and hence $g(\epsilon)+h\leq\left(\frac{||x+\epsilon u||_{1}}{||x||_{1}}\right)^{2\sigma}(g(0)+h).$ Since this holds for any $h>0$ , taking the limit $h\to 0$ yields the desired result. ∎
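The inequality of Lemma 1 can be checked numerically on a simple instance. The sketch below assumes the quadratic $F(x)=\frac{1}{2}x^{T}Ax$ with $A$ a Euclidean distance matrix, so that $A$ is a $1$ -semi-metric and, by Proposition 3, $F$ is one-sided $1$ -smooth. The helper name `lemma1_holds`, the random instance, and the tolerance are illustrative assumptions, not part of the paper.

```python
import numpy as np

def lemma1_holds(A, sigma, x, u, eps, tol=1e-9):
    """Check u^T grad F(x + eps*u) <= (|x + eps*u|_1 / |x|_1)^(2*sigma) * u^T grad F(x)
    for the quadratic F(y) = 0.5 * y^T A y, whose gradient is A @ y."""
    lhs = u @ (A @ (x + eps * u))
    ratio = np.sum(x + eps * u) / np.sum(x)   # entries are non-negative, so sums are l1 norms
    rhs = ratio ** (2 * sigma) * (u @ (A @ x))
    return lhs <= rhs + tol

rng = np.random.default_rng(0)
pts = rng.random((6, 2))
# Pairwise Euclidean distances: symmetric, zero diagonal, satisfies the triangle inequality.
A = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

results = []
for _ in range(200):
    x = rng.uniform(0.05, 0.5, size=6)   # non-zero starting point
    u = rng.uniform(0.0, 0.5, size=6)    # direction with x + u still inside [0,1]^6
    results.append(lemma1_holds(A, 1.0, x, u, eps=1.0))
all_hold = all(results)
```

A random test of this kind does not prove anything, but it is a quick sanity check that the exponent $2\sigma$ and the ratio of $\ell_{1}$ norms are being applied in the intended direction.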

4.3 Continuous Greedy and One-Sided $\sigma$ -Smoothness

We now provide an adaptation of the continuous greedy algorithm, originally introduced in [45]. Algorithm 1 is for maximizing a monotone $\sigma$ -OSS function over a polytime separable downward-closed polytope. Unlike the classical continuous greedy, our algorithm starts from a non-zero point, which allows us to take advantage of Lemma 1. Because of this, we call our algorithm jump-start continuous greedy.

Theorem 1.

Let $F:[0,1]^{n}\to\mathbb{R}_{\geq 0}$ be a monotone $\sigma$ -OSS function. Let $\alpha\in[0,1)$ and $P$ be a polytime separable, downward-closed, polytope. If we run the jump-start continuous greedy process (Algorithm 1) then $x(1)\in P$ and $F(x(1))\geq[1-\exp{(-(1-\alpha)(\frac{\alpha}{\alpha+1})^{2\sigma})}]\cdot OPT$ where $OPT:=\max\{F(x):x\in P\}$ .

Proof.

The main idea of the proof is to show that moving in the $v_{max}$ direction guarantees a fractional progress of at least $(\frac{\alpha}{\alpha+1})^{2\sigma}(OPT-F(x))$ . Let $x^{*}\in P$ be such that $F(x^{*})=OPT$ . Also, let $x\in\{x(t):0\leq t\leq 1\}$ and $u=(x^{*}-x)\vee 0$ , i.e., $x^{*}\vee x=x+u$ (where $\vee$ denotes the component-wise maximum operation). We have by Taylor’s Theorem that for some $\epsilon\in[0,1]$ :

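$$OPT=F(x^{*})\leq F(x^{*}\vee x)=F(x+u)=F(x)+u^{T}\nabla F(x+\epsilon u)\leq F(x)+\Big(\frac{||x+\epsilon u||_{1}}{||x||_{1}}\Big)^{2\sigma}u^{T}\nabla F(x),$$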

where the last inequality follows from Lemma 1. By the choice of $x(0)$ we have that $||x(0)||_{1}\geq\alpha||w||_{1}$ for any $w\in P$ , and then since $u\in P$ and $x(t)$ is non-decreasing in each component (because $v_{max}$ is always non-negative) we also have

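$$||x||_{1}\geq||x(0)||_{1}\geq\alpha||u||_{1},\quad\text{and hence}\quad\frac{||x+\epsilon u||_{1}}{||x||_{1}}\leq 1+\frac{||u||_{1}}{||x||_{1}}\leq\frac{\alpha+1}{\alpha}.$$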

By the choice of $v_{max}$ and above inequalities it follows that for any $x\in\{x(t):0\leq t\leq 1\}$ ,

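$$v_{max}(x)\cdot\nabla F(x)\geq u\cdot\nabla F(x)\geq\Big(\frac{\alpha}{\alpha+1}\Big)^{2\sigma}\big(OPT-F(x)\big).$$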

Let $\rho=(\frac{\alpha}{\alpha+1})^{2\sigma}$ . Then using chain rule, we have

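$$\frac{d}{dt}F(x(t))=\nabla F(x(t))\cdot x^{\prime}(t)=(1-\alpha)\,v_{max}(x(t))\cdot\nabla F(x(t))\geq\rho(1-\alpha)\big(OPT-F(x(t))\big).$$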

We solve the above differential inequality by multiplying by $e^{\rho(1-\alpha)t}$ .

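$$\frac{d}{dt}\Big[e^{\rho(1-\alpha)t}F(x(t))\Big]=e^{\rho(1-\alpha)t}\Big[\rho(1-\alpha)F(x(t))+\frac{d}{dt}F(x(t))\Big]\geq\rho(1-\alpha)e^{\rho(1-\alpha)t}\,OPT.$$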

Integrating the LHS and RHS of the above inequality between $0$ and $t$ we get

$$e^{\rho(1-\alpha)t}F(x(t))-F(x(0))\geq\big(e^{\rho(1-\alpha)t}-1\big)OPT.$$

Hence

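$$F(x(t))\geq\big(1-e^{-\rho(1-\alpha)t}\big)OPT+e^{-\rho(1-\alpha)t}F(x(0))\geq\big(1-e^{-\rho(1-\alpha)t}\big)OPT,$$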

where the last inequality follows from the fact that $F$ is non-negative. Substituting $t=1$ and $\rho=(\frac{\alpha}{\alpha+1})^{2\sigma}$ gives the desired result. ∎

In Proposition 4 in Appendix B we provide an explicit expression for the best value of $\alpha$ (in terms of $\sigma$ ) for Algorithm 1 when we are dealing with $\sigma$ -OSS functions.

As discussed in Section 2, if the third-order partial derivatives of $F$ are non-positive, then the approximation factor of Algorithm 1 improves to $\Omega(1/\sigma)$ .

Theorem 2.

Let $F:[0,1]^{n}\to\mathbb{R}_{\geq 0}$ be a monotone $\sigma$ -OSS function with non-positive third-order partial derivatives. Let $P$ be a polytime separable, downward-closed, polytope. If we run the jump-start continuous greedy process (Algorithm 1) with $\alpha=1/2$ , then $x(1)\in P$ and $F(x(1))\geq[1-\exp{(-\frac{1}{4\sigma+2})}]\cdot OPT\geq\frac{1}{4\sigma+3}\cdot OPT$ , where $OPT:=\max\{F(x):x\in P\}$ .

The main idea for proving Theorem 2 is to use the third-order Taylor’s polynomial and use the non-positivity of third-order partials and the defining property of $\sigma$ -OSS functions. More specifically, because the third-order partials are non-positive, we have

$$F(x+u)\leq F(x)+u^{T}\nabla F(x)+\frac{1}{2}u^{T}\nabla^{2}F(x)u.$$

Then using the fact that $||x||_{1}$ is large (because we start from a non-zero point), we can conclude that $v_{max}(x)\cdot\nabla F(x)\geq\Big{(}\frac{\alpha}{\alpha+\sigma}\Big{)}\Big{(}OPT-F(x)\Big{)}$ . This inequality is then used to derive the desired result. For details of the proof of Theorem 2, see Appendix B.

Algorithm 1 is a continuous process and in general, it cannot be implemented in finite time. Therefore, we give a discretization of this process. In Appendix C, we show that starting from $x^{0}=\alpha v^{*}$ and using the update rule $x^{t+\delta}=x^{t}+\delta\cdot(1-\alpha)\cdot v_{max}(x^{t})$ with the appropriate step size $\delta$ , we can recapture similar approximation factors. We present different results for the discretization which are very similar in nature. The first one asserts that, if $F$ is the multilinear extension of some set function $f$ , then using $\delta=O(1/n^{3})$ , the output of the discrete algorithm satisfies $F(x^{1})\geq(1-\exp(-\frac{1}{2}(1-\alpha)(\frac{\alpha}{\alpha+1})^{2\sigma}))(1-o(1))OPT$ . See Theorem 10 in Appendix C.

The second result states that for a function $F$ that satisfies $u^{T}\nabla F(x+\epsilon u)\geq\beta u^{T}\nabla F(x)$ for all $u,x\in P$ and $\epsilon\in[0,1]$ , using $\delta=1/\beta n$ , the output of the discrete algorithm satisfies $F(x^{1})\geq\Big{(}1-\exp{(-\beta(1-\alpha)(\frac{\alpha}{\alpha+1})^{2\sigma})}\Big{)}OPT$ . See Theorem 11 in Appendix C. Note that, for example, the functions with a non-negative Hessian satisfy the mentioned inequality with $\beta=1$ .
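To make the discrete update rule concrete, here is a minimal sketch for the special case of a quadratic $F(x)=\frac{1}{2}x^{T}Ax+b^{T}x$ with $A,b\geq 0$ over the cardinality polytope $P=\{x\in[0,1]^{n}:\sum_{i}x_{i}\leq k\}$ . Several choices are assumptions made for illustration and are not taken from the paper: the function names, the choice $v^{*}=(k/n)\vec{1}$ as a point of maximum $\ell_{1}$ norm in $P$ , the step count, and the use of exact gradients. For this polytope and a non-negative gradient, $v_{max}(x)$ is simply the indicator of the $k$ largest gradient coordinates.

```python
import numpy as np

def F(A, b, x):
    """Quadratic objective F(x) = 0.5 * x^T A x + b^T x."""
    return 0.5 * x @ A @ x + b @ x

def jump_start_greedy(A, b, k, alpha=0.5, steps=100):
    """Discretised jump-start greedy over P = {x in [0,1]^n : sum(x) <= k}.

    Starts at x^0 = alpha * v* and applies
    x <- x + delta * (1 - alpha) * v_max(x) for 1/delta steps.
    """
    n = len(b)
    delta = 1.0 / steps
    x = alpha * np.full(n, k / n)        # assumed v* = (k/n) * ones, so |v*|_1 = k
    for _ in range(steps):
        grad = A @ x + b                 # exact gradient of the quadratic
        v = np.zeros(n)
        v[np.argsort(-grad)[:k]] = 1.0   # linear maximiser over P for a non-negative gradient
        x = x + delta * (1.0 - alpha) * v
    return x

rng = np.random.default_rng(1)
pts = rng.random((8, 2))
A = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)  # metric diversity matrix
b = np.zeros(8)
x0 = 0.5 * np.full(8, 3 / 8)
x1 = jump_start_greedy(A, b, k=3, alpha=0.5, steps=200)
```

The output is a fractional point; it is a convex combination of $v^{*}$ and the directions $v_{max}(x^{t})$ , so it stays in $P$ . The step sizes required by the theorems in Appendix C are much smaller than the one used here.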

5 Rounding

Let $\mathcal{M}=([n],\mathcal{I})$ be a matroid and $P_{\mathcal{M}}$ be its polytope. In this section we study the integrality gap for a quadratic program: $\max F(x):x\in P_{\mathcal{M}}$ . Here $F$ is a non-negative, quadratic multilinear function $F(x)=\frac{1}{2}x^{T}Ax+b^{T}x$ such that $A,b\geq 0$ and $A$ is a symmetric, zero diagonal matrix.

There are unbounded gaps for such quadratic programs even for graphic matroids if we allow parallel edges (see Theorem 8). Fortunately these large gaps arise for a simple reason, namely when the matroids have very small circuits. We are able to obtain the following integrality gap upper bound.

Theorem 3 (Quadratic Integrality Gap over Matroids).

Let $f$ be a set function whose multilinear extension $F$ is $\sigma$ -OSS . Let $\mathcal{M}$ be a matroid of rank $r$ , minimum circuit size $c$ , and matroid polytope $P_{\mathcal{M}}$ . Then there is a polytime algorithm which given $x^{*}\in P_{\mathcal{M}}$ produces an integral vector $\mathbbm{1}_{I}\in P_{\mathcal{M}}$ such that $F(x^{*})\leq O(\min\{\frac{r}{c-2},1+\frac{\sigma}{r}\})f(I)\leq O(\sqrt{\sigma})f(I)$ .

Combining the continuous greedy methods with this rounding procedure we obtain the following result which improves upon the previous best bound of $\Omega(1/\sigma^{2})$ .

Theorem 4.

The problem of maximizing a $\sigma$ -semi-metric diversity function over a matroid admits a $\Omega(1/\sigma^{3/2})$ -approximation. For uniform matroids this becomes a $\Omega(1/\sigma)$ -approximation.

Theorem 3 is obtained by two different rounding algorithms. One is based on modifying the approximate integer decomposition property [17] to work for quadratic programs; the second one adapts the swap rounding algorithm developed for submodular functions [13]. We discuss the first result here. For details regarding the second method, see Appendix D. We remark that while our rounding results are inspired by previous techniques used for submodular maximization, the analysis requires several new insights to make it work for quadratic functions, since these are not convex in the $\bm{e}_{i}-\bm{e}_{j}$ directions.

Theorem 5.

Let $F$ be a non-negative, quadratic multilinear polynomial and $\mathcal{M}$ be a matroid with rank $r$ and minimum circuit size $c\geq 3$ . If $x^{*}\in P_{\mathcal{M}}$ , then there is an independent set $I$ of $\mathcal{M}$ such that $(3+\frac{2r}{c-2})F(\mathbbm{1}_{I})\geq F(x^{*})$ .

We actually prove the following decomposition result, which implies Theorem 5. For $x^{*}\in P_{\mathcal{M}}$ , we define the coverage of a pair $u,v$ to be the quantity $x^{*}(u)x^{*}(v)$ . Let $Cov\in\mathbb{R}^{{n\choose 2}}$ be the vector with entries $Cov(u,v)=x^{*}(u)x^{*}(v)$ . As $F$ is quadratic, it is linear in these coverage values and the vector $x^{*}$ : $F(x^{*})=\sum_{u\neq v}(\frac{A(u,v)}{2})Cov(u,v)+\sum_{v}b(v)x^{*}(v)$ . For a set $X$ we say its coverage set is $cov(X)=\{\{u,v\}:u,v\in X,u\neq v\}$ . A quadratic coverage of $x^{*}$ is a collection $\mathcal{C}=\{\mathbbm{1}_{I_{i}},\mu_{i}\}$ of weighted independent sets with properties (1) for each $u\neq v$ , $\sum_{i:\{u,v\}\in cov(I_{i})}\mu_{i}\geq Cov(u,v)$ , and (2) for each $v$ , $\sum_{i:I_{i}\ni v}\mu_{i}\geq x^{*}(v)$ .

By condition (1) of quadratic coverages, we have $\sum_{i}\mu_{i}\mathbbm{1}_{cov(I_{i})}\geq Cov$ and by condition (2), $\sum_{i}\mu_{i}\mathbbm{1}_{I_{i}}\geq x^{*}$ . Since $A,b\geq 0$ , it follows that $\sum_{i}\mu_{i}F(\mathbbm{1}_{I_{i}})\geq F(x^{*})$ ; this bound depends on the fact that the entries of $A$ are non-negative. Hence if the size $\sum_{i}\mu_{i}\leq K$ , then some $I_{i}$ satisfies $F(\mathbbm{1}_{I_{i}})\geq\frac{F(x^{*})}{K}$ . This reasoning shows that to deduce Theorem 5, it suffices to find a quadratic coverage with $\sum_{i}\mu_{i}\leq(3+\frac{2r}{c-2})$ .

Theorem 6.

Let $F(x)=\frac{1}{2}x^{T}Ax+b^{T}x$ be a non-negative, quadratic multilinear polynomial and $\mathcal{M}$ be a matroid with rank $r=r([n])$ and minimum circuit size $c\geq 3$ . If $x^{*}\in P_{\mathcal{M}}$ , then it has a quadratic coverage of size at most $3+\frac{2r}{c-2}$ .

Proof.

We start with an arbitrary representation of $x^{*}$ as a convex combination of independent sets: $\sum_{i}\lambda_{i}\mathbbm{1}_{B_{i}}$ .

First note that $Cov(u,v)=(\sum_{B_{i}\ni u}\lambda_{i})(\sum_{B_{j}\ni v}\lambda_{j})=\sum_{(i,j):B_{i}\ni u,B_{j}\ni v}\lambda_{i}\lambda_{j}$ . Hence an ordered pair $(B_{i},B_{j})$ contributes $\lambda_{i}\lambda_{j}$ to $Cov(u,v)$ if $u\in B_{i},v\in B_{j}$ . This implies that if $B_{i}=B_{j}$ , then this contributes exactly $\lambda_{i}^{2}$ for every $u,v\in B_{i}$ . If $B_{i}\neq B_{j}$ , then the unordered pair $\{B_{i},B_{j}\}$ contributes to coverages as follows. It contributes $2\lambda_{i}\lambda_{j}$ for every $u,v\in B_{i}\cap B_{j}$ and $\lambda_{i}\lambda_{j}$ for each $uv\in\delta(B_{i}-B_{j},B_{j}-B_{i},B_{i}\cap B_{j})$ . Here for disjoint node sets $X_{1},X_{2},\ldots,X_{p}$ we define $\delta(X_{1},X_{2},\ldots,X_{p})$ to be the set of edges which have endpoints in distinct sets from the $X_{i}$ ’s. Hence we can express the coverage vector $Cov$ for $x^{*}$ in $\mathbb{R}^{{n\choose 2}}$ as:

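$$Cov=\sum_{i}\lambda_{i}^{2}\,\mathbbm{1}_{cov(B_{i})}+\sum_{i<j}\lambda_{i}\lambda_{j}\Big(2\cdot\mathbbm{1}_{cov(B_{i}\cap B_{j})}+\mathbbm{1}_{\delta(B_{i}-B_{j},B_{j}-B_{i},B_{i}\cap B_{j})}\Big).\tag{2}$$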
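The bookkeeping of contributions described above can be checked on a small example. The following sketch computes the coverage vector in two ways: directly as $x^{*}(u)x^{*}(v)$ , and by summing the contributions $\lambda_{i}^{2}$ , $2\lambda_{i}\lambda_{j}$ and $\lambda_{i}\lambda_{j}$ pair by pair. The function names and the example sets are hypothetical, and no matroid structure is used, since the identity only depends on the convex combination.

```python
from itertools import combinations

def coverage_direct(n, sets, lam):
    """Cov(u, v) = x(u) * x(v), where x = sum_i lam[i] * indicator(sets[i])."""
    x = [sum(l for B, l in zip(sets, lam) if v in B) for v in range(n)]
    return {(u, v): x[u] * x[v] for u, v in combinations(range(n), 2)}

def coverage_by_pairs(n, sets, lam):
    """Same vector, assembled from the per-pair contributions described in the text."""
    cov = {p: 0.0 for p in combinations(range(n), 2)}
    for (u, v) in cov:
        for i, Bi in enumerate(sets):
            if u in Bi and v in Bi:
                cov[(u, v)] += lam[i] ** 2             # pair (B_i, B_i)
            for j in range(i + 1, len(sets)):
                Bj = sets[j]
                both, union = Bi & Bj, Bi | Bj
                if u in both and v in both:
                    cov[(u, v)] += 2 * lam[i] * lam[j]  # u, v in B_i ∩ B_j
                elif (u in union and v in union
                      and not {u, v} <= Bi - Bj
                      and not {u, v} <= Bj - Bi):
                    cov[(u, v)] += lam[i] * lam[j]      # edge between distinct parts
    return cov

n = 4
sets = [{0, 1}, {1, 2}, {2, 3}]
lam = [0.5, 0.25, 0.25]
direct = coverage_direct(n, sets, lam)
by_pairs = coverage_by_pairs(n, sets, lam)
max_gap = max(abs(direct[p] - by_pairs[p]) for p in direct)
```

On this instance both computations agree up to floating-point error, for example $Cov(0,1)=0.5\cdot 0.75=0.375=\lambda_{0}^{2}+\lambda_{0}\lambda_{1}$ .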

We now define a quadratic coverage, that is, a weighted collection of independent sets satisfying conditions (1) and (2). In particular, for each $i\leq j$ we define a family of independent sets $\mathcal{I}^{i,j}$ which will take care of all coverages associated with terms $\lambda_{i}\lambda_{j}$ in (2). In the case where $i=j$ , this is easy. We just include the set $B_{i}$ with weight $\mu_{i}=\lambda_{i}^{2}$ . Now consider the case where $i<j$ which is trickier. For each set $I$ in this family, we always associate the weight $\mu_{I}=\lambda_{i}\lambda_{j}$ and so this amounts to finding a family which satisfies

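$$\sum_{I\in\mathcal{I}^{i,j}}\mathbbm{1}_{cov(I)}\geq 2\cdot\mathbbm{1}_{cov(B_{i}\cap B_{j})}+\mathbbm{1}_{\delta(B_{i}-B_{j},B_{j}-B_{i},B_{i}\cap B_{j})}.$$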

We return to this construction later but we note that condition (2) will follow easily as long as we guarantee that for each $v,i$ and $j\neq i$ , if $B_{i}\ni v$ , then the family $\mathcal{I}^{i,j}$ includes at least one set $I$ which contains $v$ . Since we have $\mu_{I}=\lambda_{i}\lambda_{j}$ for any such $I$ , we derive the desired inequality (2): $\sum_{I\ni v}\mu_{I}\geq\sum_{B_{i}\ni v}(\sum_{j}\lambda_{i}\lambda_{j})=\sum_{B_{i}\ni v}\lambda_{i}=x^{*}(v)$ .

If we can achieve this construction so that $|\mathcal{I}^{i,j}|\leq K$ for each $i,j$ , then we have a quadratic coverage whose size is $\sum_{i}\mu_{i}+\sum_{i<j}\sum_{I\in\mathcal{I}^{i,j}}\mu_{I}=\sum_{i}\lambda^{2}_{i}+\sum_{i<j}\lambda_{i}\lambda_{j}|\mathcal{I}^{i,j}|\leq\sum_{i}\lambda^{2}_{i}+\sum_{i<j}\lambda_{i}\lambda_{j}K\leq 1+K/2$ . The last inequality follows since the $\lambda_{i}$ are the coefficients of a convex combination: $\sum_{i}\lambda_{i}^{2}\leq 1$ and $2\sum_{i<j}\lambda_{i}\lambda_{j}\leq(\sum_{i}\lambda_{i})^{2}=1$ .

We now define $\mathcal{I}^{i,j}$ for a fixed pair $i,j$ and show how to find the desired independent sets $\mathcal{I}^{i,j}=\{I^{i,j}_{k}:k=1,2,\ldots,K\}$ , where $K$ is defined later. First, if $|B_{i}\cap B_{j}|\geq 1$ , then we include the sets $B_{i},B_{j}$ . This takes care of the double-coverage of pairs in $B_{i}\cap B_{j}$ as well as any pairs $u,v$ with $u\in B_{i}\cap B_{j}$ and $v\in B_{i}\Delta B_{j}$ . Let $S_{ij}=B_{i}\setminus B_{j}$ and $S_{ji}=B_{j}\setminus B_{i}$ . Note that the excess coverage from these sets $B_{i},B_{j}$ is to contribute an extra $\lambda_{i}\lambda_{j}$ to each pair in $cov(S_{ij})\cup cov(S_{ji})$ . It now remains to cover the edges in $\delta(S_{ij},S_{ji})$ .

Let $t=\lfloor(c-1)/2\rfloor$ and $m=|B_{i}\cap B_{j}|\geq 0$ . Decompose $B_{j}\setminus B_{i}$ into $\ell=\lceil(r-m)/t\rceil$ disjoint independent sets by ripping out sets of size $t$ greedily, possibly the last being smaller than $t$ . Call these $C_{1},C_{2},\ldots,C_{\ell}$ . For each $k\leq\ell$ , we extend $C_{k}$ to an independent set $R^{i,j}_{k}$ in $B_{i}\Delta B_{j}$ only adding elements from $B_{i}\setminus B_{j}$ . Hence this set will have used all elements of $B_{i}$ except a subset, call it $Z_{k}$ , of size at most $t$ . Let $C^{i,j}_{k}=Z_{k}\cup C_{k}$ and note that $|C^{i,j}_{k}|\leq 2t\leq c-1$ and hence it is also independent. We now examine the pairs covered by $C^{i,j}_{k},R^{i,j}_{k}$ . Let $u\in C_{k},v\in B_{i}\setminus B_{j}$ , then either $u,v$ is covered by $R^{i,j}_{k}$ , or $v\in Z_{k}$ in which case it is covered by $C^{i,j}_{k}$ .

Finally, we count the number of sets for a given family. There are two cases depending on whether $B_{i}\cap B_{j}=\varnothing$ or not. If the intersection is empty, then we just build $2\lceil\frac{r}{t}\rceil$ sets. Since $t\geq\frac{c-2}{2}$ , this is at most $2\cdot(1+\frac{2r}{c-2})$ . In the other case we have $m\geq 1$ , and we add the sets $B_{i},B_{j}$ up front and then we add $2\lceil\frac{r-m}{t}\rceil$ more sets. Hence the overall number of sets in this case is at most $2+2\cdot(\frac{2r}{c-2}-\frac{2}{c-2}+1)$ .

It follows that $K\leq 2\cdot(2+\frac{2r}{c-2})$ , and thus we have a quadratic coverage of size at most $1+\frac{K}{2}\leq 3+\frac{2r}{c-2}$ , as we wanted to show. ∎
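The counting at the end of the proof is plain arithmetic and can be verified exhaustively for small parameters. The sketch below encodes the two cases of the count and checks that $1+K/2$ never exceeds $3+\frac{2r}{c-2}$ . The function names and the parameter ranges are illustrative assumptions.

```python
def family_size(r, c, m):
    """Number of sets placed in the family for a pair (B_i, B_j) with |B_i ∩ B_j| = m,
    following the count in the proof: B_i and B_j themselves when m >= 1,
    plus 2 * ceil((r - m) / t) sets with t = floor((c - 1) / 2)."""
    t = (c - 1) // 2
    chunks = -(-(r - m) // t)        # integer ceiling of (r - m) / t
    upfront = 2 if m >= 1 else 0
    return upfront + 2 * chunks

def coverage_bound(r, c):
    """The claimed bound 3 + 2r / (c - 2) on the size of the quadratic coverage."""
    return 3 + 2 * r / (c - 2)

# Worst case of 1 + K/2 over all intersection sizes, for a range of ranks and circuit sizes.
worst_slack = min(
    coverage_bound(r, c) - (1 + family_size(r, c, m) / 2)
    for r in range(2, 30)
    for c in range(3, r + 2)
    for m in range(0, r + 1)
)
```

A non-negative `worst_slack` confirms, on the tested range, that the bound on $K$ translates into the stated coverage size.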

6 Hardness

It is known that it is hard to approximate the maximum of a metric diversity function subject to a cardinality constraint within a factor better than $\frac{1}{2}$ [5, 11]. We generalize this hardness result to $\sigma$ -semi-metric diversity functions. The following result shows that our approximation factor for maximizing a $\sigma$ -semi-metric diversity function subject to a uniform matroid constraint (Theorem 4) is asymptotically tight. For the proof of the following theorem, see Appendix E. Let $\theta:=n^{1/(\log\log n)^{c}}$ where $c$ is a suitably chosen universal constant independent of $n$ .

Theorem 7.

Assuming the exponential time hypothesis (ETH): (1) There is no polytime $4/\theta$ -approximation algorithm for maximizing $\sigma$ -semi-metric diversity functions subject to a cardinality constraint, and (2) for any fixed $\sigma\geq 1$ and $\epsilon>0$ , there is no polytime algorithm which approximates the maximum of a $\sigma$ -semi-metric diversity function subject to a cardinality constraint within a factor of $2\sigma-\epsilon$ .

Combining Theorem 7 and the $O(1)$ rounding for multilinear quadratics subject to a uniform matroid (Theorem 5) gives the following result, which states that the approximation bound given in Theorem 2, for functions with non-positive third-order partial derivatives, is asymptotically tight.

Corollary 1.

Let $A$ be a matrix corresponding to a $\sigma$ -semi-metric distance function. Then, assuming ETH, it is hard to approximate the continuous problem $\max x^{T}Ax:||x||_{1}\leq k$ within a factor of $o(\sigma)$ . Moreover this implies that the analysis of the jump-start continuous greedy algorithm in Theorem 2 is asymptotically tight.

This result is conditioned on hardness of densest subgraph which has been established under ETH [38] - see Appendix E. First, since the term $O(\frac{r}{c-2})$ in Theorem 3 does not depend on $\sigma$ , it yields an $O(1)$ rounding gap for cardinality constraints (since $\frac{r}{c-2}\approx 1$ ). In addition, given that the multilinear extension of the densest subgraph objective is of the form $x^{T}Ax$ , the approximability of densest subgraph is within a constant factor of its continuous relaxation.

The following result asserts that our rounding algorithm is also asymptotically tight. The proof is included in Appendix E.

Theorem 8.

Let $k,t\in\mathbb{N}$ with $1\leq t\leq k$ . There exists a $\sigma$ -semi-metric diversity function with multilinear extension $F$ , and a matroid $\mathcal{M}=([2k],\mathcal{I})$ with rank $r=k+t-1$ and minimum circuit size $c=2t$ , where the integrality gap of $F(x)$ over the matroid polytope $P_{\mathcal{M}}$ is $\Omega(\min\{\frac{r}{c-2},\frac{\sigma}{r}\})$ .

7 Conclusion

There are a number of directions which need exploring. The most immediate are (i) extending the continuous greedy algorithm to non-monotone $\sigma$ -smooth functions, and (ii) developing rounding methods (such as contention resolution) for one-sided smooth functions over more general polytopes. We believe there should be further interesting applications for the one-sided smoothness model introduced in this work.

8 Acknowledgements

This article benefitted greatly from previous anonymous reviews. We are indebted to those reviewers as well as to Chandra Chekuri, Anupam Gupta and Nick Harvey who also provided invaluable feedback. The third author gratefully acknowledges the support from an NSERC Discovery Grant 109840 without which this work would not be possible.

Appendix A Appendix: Semi-metric diversity and OSS

In this section, we establish the smoothness parameter associated with several of the discrete quadratic functions discussed. In other words, we bound the approximate triangle inequality for their associated distance functions.

Definition 1.

Let $d:[n]\times[n]\rightarrow\mathbb{R}_{\geq 0}$ be a distance function with the corresponding distance matrix $D\in\mathbb{R}^{n\times n}_{\geq 0}$ where $D_{a,b}=d(a,b)$ . We say $d$ is a negative-type distance if for any $x\in\mathbb{R}^{n}$ with $\sum_{i}x_{i}=0$ we have $x^{T}Dx\leq 0$ .

Proposition 1.

Any negative-type distance $d:[n]\times[n]\rightarrow\mathbb{R}_{\geq 0}$ is $2$ -semi-metric.

Proof.

Let $x=0.5e_{a}+0.5e_{b}-e_{c}$ . We know

$$0\geq x^{T}Dx=\frac{1}{2}d(a,b)-d(a,c)-d(b,c).$$

Therefore $d(a,b)\leq 2d(a,c)+2d(b,c)$ and $d$ is a $2$ -semi-metric. ∎

Jensen-Shannon Divergence is a function which measures dissimilarity between probability distributions. It is well-known that if $d$ is a JS measure, then $\sqrt{d}$ is a metric. Hence JS distances form a $2$ -semi-metric by the following result.

Proposition 2.

Let $d:[n]\times[n]\rightarrow\mathbb{R}_{\geq 0}$ be a distance function such that $\sqrt{d(\cdot,\cdot)}$ is a metric. Then $d(\cdot,\cdot)$ is a $2$ -semi-metric.

Proof.

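By definition, we have

$$\sqrt{d(a,b)}\leq\sqrt{d(a,c)}+\sqrt{d(c,b)}.$$

Therefore,

$$d(a,b)\leq d(a,c)+d(c,b)+2\sqrt{d(a,c)\,d(c,b)}.$$

We also know that

$$2\sqrt{d(a,c)\,d(c,b)}\leq d(a,c)+d(c,b).$$

Hence,

$$d(a,b)\leq 2\big(d(a,c)+d(c,b)\big).$$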

∎

Lemma 2.

Let $F\in{\bf C}^{2}$ , $x\in[0,1]^{n}$ and $\sigma\geq 0$ . If for any $i,j\in[n]$ we have

$$\nabla^{2}_{ij}F(x)\leq\sigma\,\frac{\nabla_{i}F(x)+\nabla_{j}F(x)}{||x||_{1}},$$

then $F$ is one-sided $\sigma$ -smooth at $x$ .

Proof.

We have

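$$\frac{1}{2}u^{T}\nabla^{2}F(x)u=\frac{1}{2}\sum_{i,j}u_{i}u_{j}\nabla^{2}_{ij}F(x)\leq\frac{\sigma}{2||x||_{1}}\sum_{i,j}u_{i}u_{j}\big(\nabla_{i}F(x)+\nabla_{j}F(x)\big)=\sigma\frac{||u||_{1}}{||x||_{1}}u^{T}\nabla F(x).$$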

∎

We have defined a symmetric matrix $A$ to be a $\sigma$ -semi-metric (see Section 1) if $A_{ik}\leq\sigma(A_{ij}+A_{jk})$ for all $i,j,k$ . Our main applications are to multilinear extensions where $A$ is non-negative and has zero diagonal. However, the following result applies in the more general setting.

Proposition 3.

Let $A\in\mathbb{R}^{n\times n}$ be a non-negative symmetric matrix. Let $b\in\mathbb{R}^{n}$ and $b\geq 0$ . Then $F(x)=\frac{1}{2}x^{T}Ax+b^{T}x$ is one-sided $\sigma$ -smooth if $A$ is a $\sigma$ -semi-metric.

Proof.

Note that $\nabla^{2}F(x)=A$ and $\nabla F(x)=Ax+b$ . Therefore,

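$$\sigma\big(\nabla_{i}F(x)+\nabla_{j}F(x)\big)=\sigma\Big(\sum_{k}(A_{ik}+A_{jk})x_{k}+b_{i}+b_{j}\Big)\geq\sigma\sum_{k}(A_{ik}+A_{jk})x_{k}\geq\sum_{k}A_{ij}x_{k}=||x||_{1}\nabla^{2}_{ij}F(x),$$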

where the first inequality follows from $b\geq 0$ and the last inequality holds because $A$ is $\sigma$ -semi-metric. Now by Lemma 2, we conclude that $F$ is one-sided $\sigma$ -smooth. ∎

Appendix B Appendix: Jump-Start Continuous Greedy

Proposition 4.

For any $\sigma>0$ the best approximation guarantee in Theorem 1 is attained at

$$\alpha=\frac{-(2\sigma+1)+\sqrt{4\sigma^{2}+12\sigma+1}}{2}.$$

Proof.

We need to find the maximizer of $g(\alpha)=(1-\alpha)(\frac{\alpha}{\alpha+1})^{2\sigma}$ where $\alpha\in[0,1)$ . Hence, we solve $g^{\prime}(\alpha)=0$ .

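$$g^{\prime}(\alpha)=\Big(\frac{\alpha}{\alpha+1}\Big)^{2\sigma}\Big[\frac{2\sigma(1-\alpha)}{\alpha(\alpha+1)}-1\Big]=0\iff\alpha^{2}+(2\sigma+1)\alpha-2\sigma=0.$$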

The only solution in $[0,1)$ is $\frac{-(2\sigma+1)+\sqrt{4\sigma^{2}+12\sigma+1}}{2}$ and this yields the proposition. ∎
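The closed form can be compared against a brute-force search over $\alpha$ . The sketch below evaluates $g(\alpha)=(1-\alpha)(\frac{\alpha}{\alpha+1})^{2\sigma}$ on a fine grid and checks that no grid point beats the value at the closed-form root. The function names, the grid resolution, and the tested values of $\sigma$ are illustrative assumptions.

```python
import math

def best_alpha(sigma):
    """Closed-form root of alpha^2 + (2*sigma + 1)*alpha - 2*sigma = 0 lying in [0, 1)."""
    return (-(2 * sigma + 1) + math.sqrt(4 * sigma ** 2 + 12 * sigma + 1)) / 2

def rate(alpha, sigma):
    """Exponent g(alpha) = (1 - alpha) * (alpha / (alpha + 1))^(2*sigma) from Theorem 1."""
    return (1 - alpha) * (alpha / (alpha + 1)) ** (2 * sigma)

checks = []
for sigma in (0.5, 1.0, 2.0, 5.0):
    a = best_alpha(sigma)
    grid_best = max(rate(i / 10000, sigma) for i in range(10000))
    # The closed-form point should be inside (0, 1) and at least as good as every grid point.
    checks.append(0 < a < 1 and grid_best <= rate(a, sigma) + 1e-12)
all_checks_pass = all(checks)
```

For $\sigma=1$ the formula gives $\alpha=(-3+\sqrt{17})/2\approx 0.56$ , so the optimal starting point is slightly more aggressive than the choice $\alpha=1/2$ used in Theorem 2.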

Theorem 2.

Let $F:[0,1]^{n}\to\mathbb{R}_{\geq 0}$ be a monotone $\sigma$ -OSS function with non-positive third order partial derivatives. Let $\alpha\in[0,1)$ and $P$ be a polytime separable, downward-closed, polytope. If we run the jump-start continuous greedy process (Algorithm 1) then $x(1)\in P$ and $F(x(1))\geq[1-\exp{(-\frac{\alpha(1-\alpha)}{\alpha+\sigma})}]\cdot OPT$ where $OPT:=\max\{F(x):x\in P\}$ . In particular, taking $\alpha=1/2$ we get $F(x(1))\geq[1-\exp{(-\frac{1}{4\sigma+2})}]\cdot OPT$ and so $F(x(1))\geq\frac{1}{4\sigma+3}\cdot OPT$ (since $e^{x}\geq x+1$ for $x<1$ ).

Proof.

For each $t\in[0,1]$ we have

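$$x(t)=x(0)+\int_{0}^{t}x^{\prime}(\tau)\,d\tau=\alpha v^{*}+(1-\alpha)\int_{0}^{t}v_{max}(x(\tau))\,d\tau.$$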

Since $P$ is convex and $v^{*}\in P$ , we have that $x(t)\in P$ as long as $y(t):=\int_{0}^{t}v_{max}(x(\tau))\,d\tau\in P$ . Given that each $v_{max}(x(\tau))\in P$ and also $\vec{0}\in P$ , it follows that $y(t)$ is a convex combination of points in $P$ , and hence belongs to $P$ .

Let $x^{*}\in P$ be such that $F(x^{*})=OPT$ . Also let $x\in\{x(t):0\leq t\leq 1\}$ and $u=(x^{*}-x)\vee 0$ , i.e., $x^{*}\vee x=x+u$ . By Taylor’s Theorem and non-positivity of the third order derivatives of $F$ we have

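$$F(x+u)\leq F(x)+u^{T}\nabla F(x)+\frac{1}{2}u^{T}\nabla^{2}F(x)u\leq F(x)+u^{T}\nabla F(x)+\sigma\frac{||u||_{1}}{||x||_{1}}u^{T}\nabla F(x)\leq F(x)+\Big(1+\frac{\sigma}{\alpha}\Big)u^{T}\nabla F(x),$$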

where the second inequality follows from smoothness, and the third from the fact that $||x(t)||\geq||x(0)||=\alpha||v^{*}||\geq\alpha||u||$ . Thus

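$$u^{T}\nabla F(x)\geq\frac{\alpha}{\alpha+\sigma}\big(F(x+u)-F(x)\big)\geq\frac{\alpha}{\alpha+\sigma}\big(OPT-F(x)\big),\tag{5}$$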

where the last inequality follows from monotonicity. We also have that

$$v_{max}(x)\cdot\nabla F(x)\geq x^{*}\cdot\nabla F(x)\geq u\cdot\nabla F(x),$$

where the first inequality follows by definition of $v_{max}$ and the fact that $x^{*}\in P$ , and the second inequality from the fact that $x^{*}\geq u$ and $\nabla F\geq 0$ . Combining this with (5) yields:

$$v_{max}(x)\cdot\nabla F(x)\geq\frac{\alpha}{\alpha+\sigma}\big(OPT-F(x)\big)\tag{6}$$

for any $x\in\{x(t):0\leq t\leq 1\}$ . Let us denote $\rho=\alpha/(\alpha+\sigma)$ . We can use the Chain Rule to get

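$$\frac{d}{dt}F(x(t))=\nabla F(x(t))\cdot x^{\prime}(t)=(1-\alpha)\,v_{max}(x(t))\cdot\nabla F(x(t))\geq\rho(1-\alpha)\big(OPT-F(x(t))\big),\tag{7}$$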

where the last inequality follows from (6).

We solve the above differential inequality by multiplying by $e^{\rho(1-\alpha)t}$ .

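$$\frac{d}{dt}\Big[e^{\rho(1-\alpha)t}F(x(t))\Big]=e^{\rho(1-\alpha)t}\Big[\rho(1-\alpha)F(x(t))+\frac{d}{dt}F(x(t))\Big]\geq\rho(1-\alpha)e^{\rho(1-\alpha)t}\,OPT,$$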

where the inequality follows from Equation (7).

Integrating the LHS and RHS of the above inequality between $0$ and $t$ we get

$$e^{\rho(1-\alpha)t}F(x(t))-F(x(0))\geq\big(e^{\rho(1-\alpha)t}-1\big)OPT.$$

Hence

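$$F(x(t))\geq\big(1-e^{-\rho(1-\alpha)t}\big)OPT+e^{-\rho(1-\alpha)t}F(x(0))\geq\big(1-e^{-\rho(1-\alpha)t}\big)OPT,$$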

where the last inequality follows from the fact that $F$ is non-negative. Substituting $t=1$ and $\rho=\alpha/(\alpha+\sigma)$ gives the desired result. ∎
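The last claim of the theorem, the passage from $1-\exp(-\frac{1}{4\sigma+2})$ to $\frac{1}{4\sigma+3}$ , can be spelled out as follows. The shorthand $y$ is introduced here only for this illustration.

```latex
% Set y = 1/(4\sigma+2) > 0. From e^{y} \ge 1 + y we get
e^{-y} \le \frac{1}{1+y}
\quad\Longrightarrow\quad
1 - e^{-y} \ge \frac{y}{1+y}
= \frac{\frac{1}{4\sigma+2}}{1+\frac{1}{4\sigma+2}}
= \frac{1}{4\sigma+3}.
```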

Appendix C Appendix: Discretization of the Continuous Greedy

We now discuss discretization of the continuous greedy process for one-sided smooth functions.

If our goal is to find a polytime approximation algorithm, we need to establish two features. The first is an approximation bound; for this we use our analysis of the continuous greedy process, Theorems 1 and 2. The second is some sort of smoothness assumption on the gradients of $F$ . We consider several conditions for the latter depending on the context; the most straightforward is for multilinear OSS functions.

To discretize the jump-start continuous greedy, we start at $x^{0}=\alpha v^{*}$ (for $\alpha\in[0,1)$ ) and use the following update rule.

$$x^{t+\delta}=x^{t}+\delta\cdot(1-\alpha)\cdot v_{max}(x^{t}),$$

where $\delta$ is the step size, and $v_{max}(x):=\operatorname*{arg\,max}_{v\in P}v^{T}\nabla F(x)$ .

We always assume $\delta>0$ is chosen with $1/\delta$ integer, which is then clearly the number of iterations. Our main concern is to bound this by a polynomial in the input size. This is because we primarily adopt the view that we have exact access to the function $F$ and its gradients. This is the case for the ME of a diversity function (and in fact any quadratic function) which is our main application. For more general functions we may not have access to the exact gradient and we should find an estimate by sampling from the function. In that case, we need a probabilistic argument similar to the original argument of Vondrak [45].

There are two ingredients we need to analyze discretizations. One is an approximation bound for the continuous process itself. The second is a bound which guarantees that gradients do not decrease too suddenly. We describe the discretization as a self-contained argument which takes these two bounds ((8) and (9)) as inputs.

Let $F:[0,1]^{n}\rightarrow\mathbb{R}$ be a $\sigma$ -OSS function and $P$ be a downward-closed polytope. Denote $OPT:=\max_{x\in P}F(x)$ . We consider generic lower bounds on the continuous greedy rate of improvement as a function of $\alpha$ and $\sigma$ . For some $\mu=\mu(\alpha,\sigma)\in(0,1]$ , we say an application satisfies a $\mu$ bound if for any $x\in P$ such that $x\geq\alpha v^{*}$ we have

$$v_{max}(x)\cdot\nabla F(x)\geq\mu\,\big(OPT-F(x)\big).\tag{8}$$

The following lemma encapsulates the two main bounds $\mu$ we use; these are outcomes of the proofs of Theorem 1 and Theorem 2.

Lemma 3.

Let $F:[0,1]^{n}\rightarrow\mathbb{R}$ be a $\sigma$ -OSS function and $P$ be a downward-closed polytope. Denote $OPT:=\max_{x\in P}F(x)$ . Then for any $x\in P$ such that $x\geq\alpha v^{*}$ we have

$$v_{max}(x)\cdot\nabla F(x)\geq\Big(\frac{\alpha}{\alpha+1}\Big)^{2\sigma}\big(OPT-F(x)\big).$$

If in addition $F$ has non-positive third derivatives, then we have

$$v_{max}(x)\cdot\nabla F(x)\geq\Big(\frac{\alpha}{\alpha+\sigma}\Big)\big(OPT-F(x)\big).$$

For $\eta\geq 0$ (possibly a function of inputs such as $n$ ), we say $F\in{\bf C}^{2}$ is $\eta$ -local at $x,u$ if

$$u^{T}\nabla F(x+\epsilon u)\geq(1-\eta\epsilon)\,u^{T}\nabla F(x)\tag{9}$$

for all $\epsilon\in[0,1]$ such that $x+\epsilon u\in P$ and where $1-\eta\epsilon\in[0,1]$ . The function is $\eta$ -local if this holds for all such choices of $x,u$ . The next result shows how one may obtain a polytime implementation of continuous greedy for functions with “bounded locality”. As discussed later, in some applications, functions may only be local for a subset of $x,u$ .

Theorem 9.

Let $F:[0,1]^{n}\rightarrow\mathbb{R}$ , $F\in{\bf C}^{2}$ , be a monotone, non-negative $\sigma$ -OSS function and $P$ a polytime separable downward-closed polytope. Assume $F$ satisfies a $\mu=\mu(\alpha,\sigma)$ bound and is $\eta$ -local. Then taking $\delta\leq\min\{\frac{1}{n\eta(1-\alpha)},\frac{1}{(1-\alpha)\mu}\}$ , discrete greedy produces $x^{1}$ satisfying:

[TABLE]

Proof.

By definition of the algorithm we have $x^{0}=\alpha v^{*}$ and $x^{t+\delta}=x^{t}+\delta\cdot(1-\alpha)\cdot v_{max}(x^{t})$ . Then by Taylor’s Theorem for some $\epsilon\in[0,1]$ we have

[TABLE]

where the first inequality follows from $\eta$ -locality and the second inequality follows by the $\mu$ bound property, and $\epsilon\in[0,1]$ . Now define $\tilde{OPT}=(1-\eta\delta(1-\alpha))OPT$ . We have

[TABLE]

because $F$ is non-negative and $(1-\eta\delta(1-\alpha))\in[0,1]$ by choice of $\delta$ . Hence we have

[TABLE]

By induction, we have

[TABLE]

Next, since $\delta\leq\frac{1}{(1-\alpha)\mu}$ , we have $(1-\delta(1-\alpha)\mu)^{1/\delta}\leq\exp(-(1-\alpha)\mu)$ . Therefore we have

[TABLE]

where the first inequality holds because of non-negativity of $F$ , and the last equality holds because $\delta\leq\frac{1}{n\eta(1-\alpha)}$ .

∎

We discuss how one may apply this theorem to functions $F$ with gradients that are $L$ -Lipschitz (with respect to $\ell_{2}$ norm). It follows that $|u^{T}(\nabla F(x+\epsilon u)-\nabla F(x))|\leq||u||_{2}\cdot||\nabla F(x+\epsilon u)-\nabla F(x)||_{2}\leq\epsilon L||u||_{2}^{2}\leq\epsilon n^{2}L$ . Define $\eta=n^{3}L$ and suppose that for some $x$ we have that $u=v_{max}(x)$ fails the condition for $\eta$ -locality. That is, $u^{T}\nabla F(x+\epsilon u)<(1-\epsilon n^{3}L)u^{T}\nabla F(x)$ and hence $|u^{T}(\nabla F(x+\epsilon u)-\nabla F(x))|>\epsilon n^{3}L(u^{T}\nabla F(x))$ . Together with the first inequality this yields: $u^{T}\nabla F(x)<\frac{1}{n}$ . Hence if $F$ satisfies (8) we have

$$\mu\,\big(OPT-F(x)\big)\leq v_{max}(x)\cdot\nabla F(x)<\frac{1}{n}.$$

It follows that $F(x)>OPT-\frac{1}{\mu n}$ . Hence if we follow the analysis in the proof of Theorem 9, either we achieve the claimed multiplicative bound, or we reach a point $x$ which is within a small additive term $\frac{1}{\mu n}$ of $OPT$ .

We now apply discretization to our main applications. Note that in some cases, it is enough to have the locality condition on a subdomain of the function. One may show that for non-negative monotone set functions, their multilinear extensions are $n^{2}$ -local on $\{x:x\leq\frac{\vec{1}(n-1)}{n}\}$ . This yields the following result.

Theorem 10.

Let $f$ be a non-negative monotone set function and $F$ be its multilinear extension such that $F$ is $\sigma$ -OSS. Let $P$ denote a polytime separable downward-closed polytope contained in $[0,1]^{n}$ . Assume that $F$ satisfies some $\mu=\mu(\alpha,\sigma)$ bound. Then the output of the discrete version of jump-start continuous greedy algorithm, with $\delta\leq\min\{\frac{1-\alpha}{n^{3}},\frac{1}{(1-\alpha)\mu}\}$ , satisfies

$$F(x^{1})\geq\Big(1-\exp\big(-\tfrac{1}{2}(1-\alpha)\mu\big)\Big)(1-o(1))\,OPT.$$

Proof.

First of all, suppose $t\leq 1-\frac{1}{n}$ . By Taylor’s remainder theorem, for some $\epsilon\in[0,1]$ , we have

[TABLE]

Now note that because $F$ is the multi-linear extension of $f$ , we have

[TABLE]

where $p_{x}(R)=\prod_{j\in R}x_{j}\prod_{j\notin R}(1-x_{j})$ (see [45]). Because $f$ is monotone, the term $f(R+i)-f(R-i)$ is non-negative for any $R$ and $i$ . Also note that because $P\subseteq[0,1]^{n}$ is downward-closed we have $v_{max}(x^{t})\geq 0$ .

Since $x^{0}_{i}\leq\alpha$ and $t\leq 1-\frac{1}{n}$ , for any $i$ we have $x^{t}_{i}\leq\alpha+(1-\alpha)(1-\frac{1}{n})=1-\frac{1-\alpha}{n}$ . Let $y^{t}=x^{t}+\epsilon\delta(1-\alpha)v_{max}(x^{t})$ . Because $v_{max}(x^{t})\in[0,1]^{n}$ and $\epsilon,\alpha\in[0,1]$ , we also have $x^{t}_{j}+\delta\geq y^{t}_{j}\geq x^{t}_{j}$ . Therefore

[TABLE]

Now note that because $x_{j}^{t}\leq 1-\frac{1-\alpha}{n}$ , we have $-\frac{\delta}{1-x_{j}^{t}}\geq-\frac{n\delta}{1-\alpha}$ . Therefore we have $\frac{(1-x_{j}^{t}-\delta)}{1-x_{j}^{t}}\geq 1-\frac{n\delta}{1-\alpha}$ . Hence, by Bernoulli’s inequality and choice of $\delta$ , we have

[TABLE]

Hence,

[TABLE]

Therefore, defining $\tilde{OPT}=(1-\frac{n^{2}\delta}{1-\alpha})OPT$ , we have

[TABLE]

where the second inequality follows from assumption. The last inequality holds because $F$ is non-negative and $(1-\frac{n^{2}\delta}{1-\alpha})\leq 1$ . Hence we have

[TABLE]

Therefore by taking $t=1-1/n$ and using induction, we have

[TABLE]

Note that for any $n\geq 2$ , $\frac{1}{\delta}(1-\frac{1}{n})+1\geq\frac{1}{2\delta}$ . Also note that $F(x^{0})\geq 0$ . Therefore

[TABLE]

Note that $(1-\frac{2\delta}{2}(1-\alpha)\mu)^{1/2\delta}\leq\exp(-\frac{1}{2}(1-\alpha)\mu)$ , as $\frac{(1-\alpha)\mu}{2}\leq\frac{1}{2\delta}$ . Therefore we have

[TABLE]

where the first inequality holds because of monotonicity, and the last equality holds because $\delta\leq\frac{1-\alpha}{n^{3}}$ . ∎

We may also show that the discrete greedy algorithm achieves $1$ -step convergence to the claimed bounds if in addition $F$ has an even stronger lower bound on its gradients. (A property effectively saying $F$ is $0$ -local.) As we see, this property is satisfied for multilinear extensions of supermodular functions.

Theorem 11.

Let $F:[0,1]^{n}\rightarrow\mathbb{R}$ be a monotone, non-negative $\sigma$ -OSS function and $P$ a polytime separable downward-closed polytope. Assume $F$ satisfies a $\mu=\mu(\alpha,\sigma)$ bound and in addition:

[TABLE]

for all $u,x\in P$ and $\epsilon\in[0,1]$ such that $x+\epsilon u\in P$ . (We assume $\beta\in(0,1]$ and the larger the value of $\beta$ the better). Then $1$ -step discrete continuous greedy computes $x^{1}$ satisfying:

[TABLE]

Proof.

By definition of the algorithm we have $x^{0}=\alpha v^{*}$ and $x^{t+\delta}=x^{t}+\delta\cdot(1-\alpha)\cdot v_{max}(x^{t})$ . Then by Taylor’s Theorem for some $\epsilon\in[0,1]$ we have

[TABLE]

where the first inequality follows from the theorem's assumption, and the second inequality follows from the $\mu$ bound property. For the $1$ -step version we have $\delta=1$ and so

[TABLE]

Thus

[TABLE]

where the last step uses the exponential inequality $1-x\leq e^{-x}$ . ∎

Remark 1.

Note that for $\beta=1$ the above approximation factor is equal to $[1-\exp{(-(1-\alpha)(\frac{\alpha}{\alpha+1})^{2\sigma})}]$ , which matches the approximation obtained via the continuous greedy process, i.e., Theorem 1.

Lemma 4.

A $\sigma$ -OSS function $F\in{\bf C}^{2}$ satisfies (11) with $\beta=1$ if $\nabla^{2}F(x)$ is a copositive matrix for any $x$ . In particular, the multilinear extension of a supermodular function has $\beta=1$ .

Proof.

Using the fundamental theorem of calculus, we have $\nabla F(x+p)=\nabla F(x)+\int_{0}^{1}\nabla^{2}F(x+tp)p\,dt$ . Now the first part of the lemma follows by setting $p=\epsilon u$ and taking the inner product with $u$ , since $u^{T}\nabla^{2}F(x+t\epsilon u)u\geq 0$ for any $u\geq 0$ . For the second part, let $F$ be the multilinear extension of a supermodular set function $f$ . Then $-F$ is the multilinear extension of the submodular set function $-f$ . Vondrák [45] shows that the Hessian of $-F$ is always non-positive with $0$ diagonal. Thus $\nabla^{2}F(x)\geq 0$ entrywise, and hence it is copositive. ∎
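Written out, the first part of the argument is the following one-line computation. It is a sketch that only substitutes $p=\epsilon u$ into the identity stated in the proof, assuming $u\geq 0$ and $\epsilon\in[0,1]$ .

```latex
\[
u^{T}\nabla F(x+\epsilon u)
\;=\;u^{T}\nabla F(x)+\epsilon\int_{0}^{1}u^{T}\nabla^{2}F(x+t\epsilon u)\,u\,dt
\;\geq\;u^{T}\nabla F(x),
\]
% The integrand is non-negative for every t because the Hessian is copositive
% and u >= 0, so the integral term can only increase the directional derivative.
```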

The version of discrete greedy for multilinear extensions of supermodular functions may appear too good in that it only requires one step. It has two intensive computational ingredients, however. First is to solve an LP to find a starting iterate $x^{0}=\alpha v^{*}$ . The second is to compute the gradient $\nabla F(x^{0})$ , which already requires $O(n^{2})$ work.

Appendix D Appendix: Swap Rounding for multilinear quadratics

In this section, we analyze a modified version of the swap rounding algorithm (Algorithm 2) and we show that it finds an integral solution which is an $O(1+\frac{\sigma}{r})$ -approximation of the initial fractional solution.

First we define the following notation. Let $A(S)=\frac{1}{2}\sum_{i,j\in S}A(i,j)$ , $A(S,S^{\prime})=\sum_{i\in S}\sum_{j\in S^{\prime}}A(i,j)$ , and $b(S)=\sum_{i\in S}b(i)$ . With a slight abuse of notation, we write $A(i,S)$ for $A(\{i\},S)$ . The following result provides a decomposition of the multilinear extension of a quadratic function based on the convex decomposition of a point into bases of the matroid.

Lemma 5.

Let $f(S)=\sum_{i\in S}b(i)+\frac{1}{2}\sum_{i,j\in S}A(i,j)$ where $b:[n]\rightarrow\mathbb{R}_{\geq 0}$ and $A:[n]\times[n]\rightarrow\mathbb{R}_{\geq 0}$ is a symmetric matrix with $A(i,i)=0$ for all $i\in[n]$ . Then the multilinear extension of $f$ is $F(x)=\frac{1}{2}x^{T}Ax+x^{T}b$ . Moreover, if $x=\sum_{k=1}^{p}\lambda_{k}\mathbbm{1}_{I_{k}}$ for some scalars $\lambda_{k}$ ’s and subsets $I_{k}\subseteq[n]$ , then

[TABLE]

Proof.

For the first part of the lemma note that

[TABLE]

To see the second part, observe that

[TABLE]

and

[TABLE]

∎
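The first part of Lemma 5 can be checked numerically on a small instance. The sketch below is illustrative only: the matrix `A`, vector `b`, sets and weights are hypothetical, and `expanded_over_sets` is one way to expand $F(\sum_{k}\lambda_{k}\mathbbm{1}_{I_{k}})$ in the notation $b(S)$ , $A(S)$ , $A(S,S^{\prime})$ ; its presentation may differ from the displayed formula in the lemma.

```python
from itertools import combinations

def f(S, A, b):
    """f(S) = sum_{i in S} b(i) + (1/2) * sum_{i,j in S} A(i,j)."""
    S = list(S)
    return sum(b[i] for i in S) + 0.5 * sum(A[i][j] for i in S for j in S)

def multilinear_brute_force(x, A, b):
    """F(x) = sum over all subsets R of f(R) * p_x(R), enumerated explicitly."""
    n = len(x)
    total = 0.0
    for size in range(n + 1):
        for R in combinations(range(n), size):
            chosen = set(R)
            p = 1.0
            for j in range(n):
                p *= x[j] if j in chosen else 1.0 - x[j]
            total += f(R, A, b) * p
    return total

def closed_form(x, A, b):
    """(1/2) x^T A x + x^T b."""
    n = len(x)
    quad = 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return quad + sum(x[i] * b[i] for i in range(n))

def expanded_over_sets(lams, sets, A, b):
    """Expansion of F(sum_k lam_k 1_{I_k}) via b(S), A(S) and A(S, S')."""
    b_of = lambda S: sum(b[i] for i in S)
    A_of = lambda S: 0.5 * sum(A[i][j] for i in S for j in S)
    A_cross = lambda S, T: sum(A[i][j] for i in S for j in T)
    p = len(sets)
    value = sum(lams[k] * b_of(sets[k]) + lams[k] ** 2 * A_of(sets[k])
                for k in range(p))
    value += sum(lams[k] * lams[l] * A_cross(sets[k], sets[l])
                 for k in range(p) for l in range(k + 1, p))
    return value

# Hypothetical instance: symmetric, non-negative A with zero diagonal.
A = [[0, 2, 1, 3], [2, 0, 4, 1], [1, 4, 0, 2], [3, 1, 2, 0]]
b = [1.0, 0.5, 2.0, 0.0]
sets = [{0, 1}, {1, 2}, {2, 3}]
lams = [0.5, 0.3, 0.2]
# x = sum_k lam_k * indicator(I_k)
x = [sum(l for l, I in zip(lams, sets) if i in I) for i in range(4)]

brute = multilinear_brute_force(x, A, b)
closed = closed_form(x, A, b)
expanded = expanded_over_sets(lams, sets, A, b)
```

The brute-force sum is exponential in $n$ and is meant only as a sanity check; the zero diagonal of $A$ is what makes the expectation collapse to the quadratic form.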

Lemma 6.

Let $\mathcal{M}=([n],\mathcal{I})$ be a matroid and $P$ be its corresponding base polytope. Let $F(z)=\frac{1}{2}z^{T}Az+z^{T}b$ where $A,b\geq 0$ and $A$ is a symmetric matrix such that its diagonal is zero. Let $f(S)=F(\mathbbm{1}_{S})$ for any $S\subseteq[n]$ . Let $x=\sum_{i=1}^{p}\lambda_{i}\mathbbm{1}_{I_{i}}\in P$ where $I_{i}$ ’s are bases of the matroid, $\sum_{i=1}^{p}\lambda_{i}=1$ , and $\lambda_{i}\geq 0$ , for $i=1,\ldots,p$ . Let $(I^{\prime},M)$ be the output of MergeBases (defined in Algorithm 2) on $(I_{1},\ldots,I_{p})$ and $(\lambda_{1},\ldots,\lambda_{p})$ . Let $y=(\lambda_{1}+\lambda_{2})\mathbbm{1}_{I^{\prime}}+\sum_{i=3}^{p}\lambda_{i}\mathbbm{1}_{I_{i}}$ . Then $F(x)\leq F(y)+\lambda_{1}\lambda_{2}\sum_{(i,j)\in M}A(i,j)$ .

Proof.

Let $I_{1}^{0}=I_{1}$ and $I_{2}^{0}=I_{2}$ (the original inputs of the function). Let $I_{1}^{m}$ and $I_{2}^{m}$ be the resulting $I_{1}$ and $I_{2}$ after the $m$ -th iteration of the while loop. Let $x_{m}=\lambda_{1}\mathbbm{1}_{I_{1}^{m}}+\lambda_{2}\mathbbm{1}_{I_{2}^{m}}+\sum_{k=3}^{p}\lambda_{k}\mathbbm{1}_{I_{k}}$ . Let $i_{m},j_{m}$ be the elements we pick at the $m$ -th iteration of the loop. We show that $F(x_{m-1})\leq F(x_{m})+\lambda_{1}\lambda_{2}A(i_{m},j_{m})$ and this yields the desired result using a simple recursion argument. Without loss of generality, we assume

[TABLE]

We have

[TABLE]

The inequality holds because of (D), and the first and the last equalities follow from Lemma 5. The second-to-last equality uses that $I_{1}^{m}=I_{1}^{m-1}$ and $I_{2}^{m}=I_{2}^{m-1}-j_{m}+i_{m}$ . ∎

Theorem 12.

Let $\mathcal{M}=([n],\mathcal{I})$ be a matroid of rank $r$ and $P$ be its corresponding base polytope. Let $F(z)=\frac{1}{2}z^{T}Az+z^{T}b$ where $A,b\geq 0$ and $A$ is a symmetric matrix with zero diagonal that satisfies the $\sigma$ -semi-metric inequality, i.e., $A(i,j)\leq\sigma(A(i,k)+A(j,k))$ for all $i,j,k\in[n]$ . Let $f(S)=F(\mathbbm{1}_{S})$ for any $S\subseteq[n]$ . Let $x\in P$ and $S$ be the output of the modified swap rounding (Algorithm 2) on $x$ . Then $F(x)\leq O(1+\frac{\sigma}{r})f(S)$ .

Proof.

Let $x=\sum_{i=1}^{p}\lambda_{i}\mathbbm{1}_{I_{i}}\in P$ where $I_{i}$ ’s are bases of the matroid, $\sum_{i=1}^{p}\lambda_{i}=1$ , and $\lambda_{i}\geq 0$ , for $i=1,\ldots,p$ . Let $S$ be the output of the swap rounding (Algorithm 2) if it starts from $(I_{1},\ldots,I_{p})$ and $(\lambda_{1},\ldots,\lambda_{p})$ . Let $x_{k}$ denote the vector corresponding to $\bm{I}_{k}=(I^{\prime}_{k},I_{k+1},\ldots,I_{p})$ and $\bm{\lambda}_{k}=(\lambda^{\prime}_{k},\lambda_{k+1},\ldots,\lambda_{p})$ , i.e., $x_{k}=\lambda^{\prime}_{k}\mathbbm{1}_{I^{\prime}_{k}}+\sum_{i=k+1}^{p}\lambda_{i}\mathbbm{1}_{I_{i}}$ . By Lemma 6, for $k=1,\ldots,p-1$ , we have

[TABLE]

where $t=\operatorname*{arg\,max}_{k=1,\ldots,p-1}\{\sum_{(i,j)\in M_{k}}A(i,j)\}$ . Therefore

[TABLE]

where the last inequality holds since $2\sum_{k=1}^{p-1}\sum_{m=1}^{k}\lambda_{m}\lambda_{k+1}\leq(\sum_{k=1}^{p}\lambda_{k})^{2}=1$ . Now, we bound the term $\sum_{(i,j)\in M_{t}}A(i,j)$ . By definition of $M_{t}$ , note that $M_{t}\subseteq I^{\prime}_{t}\times I_{t+1}$ . Using this and Lemma 5 it follows that

[TABLE]

By Lemma 6 and the $\sigma$ -semi-metric assumption, we also know that

[TABLE]

Note that none of the edges of $M^{*}$ is present in the right hand side summation. Therefore

[TABLE]

where the second inequality follows from Lemma 5 and the last inequality holds because of Lemma 6. Combining (16), (17), and (D), we get

[TABLE]

Hence, by (D) and (19), we have

[TABLE]

and this yields the result. ∎
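The bound on the cross terms used in the proof of Theorem 12, namely $2\sum_{k=1}^{p-1}\sum_{m=1}^{k}\lambda_{m}\lambda_{k+1}\leq(\sum_{k=1}^{p}\lambda_{k})^{2}=1$ , follows by expanding the square. A short derivation, assuming only $\lambda_{k}\geq 0$ and $\sum_{k}\lambda_{k}=1$ :

```latex
\[
1=\Bigl(\sum_{k=1}^{p}\lambda_{k}\Bigr)^{2}
=\sum_{k=1}^{p}\lambda_{k}^{2}+2\sum_{1\leq m<l\leq p}\lambda_{m}\lambda_{l}
\;\geq\;2\sum_{1\leq m<l\leq p}\lambda_{m}\lambda_{l}
=2\sum_{k=1}^{p-1}\sum_{m=1}^{k}\lambda_{m}\lambda_{k+1}.
\]
% The last equality re-indexes the pairs (m, l) with m < l by setting l = k + 1.
```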

Appendix E Appendix: Hardness of Approximation for $\sigma$ -Semi-Metric Diversity

In this section, we provide a hardness result for approximate maximization of $\sigma$ -semi-metric diversity functions defined on a semi-metric distance. Our results are based on inapproximability results for finding densest subgraphs.

Given a graph $G$ and integer $k$ , the densest $k$ -subgraph problem aims to find an induced subgraph of size $k$ with the maximum number of edges. Let $R$ be a subset of vertices of $G$ and $E(R)$ be the number of edges in the induced subgraph of $R$ . The density of $R$ is defined as $\rho(R)=E(R)/{|R|\choose 2}\leq 1$ . A recent breakthrough [38] shows that, assuming the exponential time hypothesis (ETH), there is no subpolynomial approximation algorithm for densest subgraph. More precisely, there is no polytime algorithm which can distinguish between two cases: (i) an instance $G$ which contains a $k$ -clique and (ii) an instance where the density of every $k$ -subset $S$ satisfies $\rho(S)\leq n^{-1/(\log\log n)^{c}}$ , where $c$ is a universal constant independent of $n$ . In the following, we let $\theta:=n^{1/(\log\log n)^{c}}$ . Existence of constant-factor approximations had previously been ruled out under the unique games conjecture with small set expansion [41].

Theorem 7.

Assuming ETH: (1) There is no polytime $4/\theta$ -approximation algorithm for maximizing $\sigma$ -semi-metric functions subject to a cardinality constraint, and (2) for any fixed $\sigma\geq 1$ and $\epsilon>0$ , there is no polytime algorithm which approximates the maximum of a $\sigma$ -semi-metric function subject to a cardinality constraint within a factor of $2\sigma-\epsilon$ .

Proof.

For $\sigma\geq 1$ , we can reduce the densest $k$ -subgraph problem to $\sigma$ -semi-metric function maximization in the following way. Consider an instance of densest $k$ -subgraph on graph $G$ with vertex set $[n]$ . Create a distance function $A:[n]\times[n]\rightarrow\mathbb{R}$ . If there is an edge between $i,j\in[n]$ in $G$ , set $A(i,j)=2\sigma>1$ ; otherwise set $A(i,j)=1$ . It is easy to see that this distance function is $\sigma$ -semi-metric. Let $f(R)=\sum_{\{i,j\}\subseteq R}A(i,j)$ . If $|R|=k$ , we have

[TABLE]

We know ${k\choose 2}\geq E(R)$ . Therefore

[TABLE]

and dividing both sides by $2\sigma{k\choose 2}$ we get

[TABLE]

It is also easy to see that

[TABLE]

Suppose there is a $\frac{4}{\theta}$ -approximation algorithm for maximizing $\sigma$ -semi-metric functions. Let its output on $G$ be $S$ and choose

[TABLE]

We have

[TABLE]

We can choose our $\sigma=\sigma(n)\geq\theta$ so that $\frac{\theta}{8\sigma}\leq\frac{1}{2}$ . Hence $\rho(\texttt{OPT})\leq(\theta/4)\rho(S)+\frac{1}{2}$ . If $G$ is a graph in which the density of every subset of vertices of size $k$ is at most $1/\theta$ , then clearly $\rho(S)\leq 1/\theta$ . If $G$ is a graph that contains a clique of size $k$ , then $1=\rho(\texttt{OPT})\leq(\theta/4)\rho(S)+\frac{1}{2}$ , and so $\rho(S)\geq\frac{2}{\theta}$ . This means that our $4/\theta$ -approximation algorithm can distinguish between these two graphs, contradicting the implications from [38].

For (2), consider a given $\sigma\geq 1,\epsilon>0$ and suppose there is a $(2\sigma-\epsilon)$ -factor approximate algorithm for maximizing a $\sigma$ -semi-metric function. Denote its output on $G$ by $S$ , and let $OPT$ be defined as above. We then have

[TABLE]

Set $\delta=(\frac{1}{2\sigma-\epsilon}-\frac{1}{2\sigma})/2=\frac{\epsilon}{4\sigma(2\sigma-\epsilon)}$ , and note that $\delta>0$ is a constant. If $G$ is a graph in which the density of every subset of vertices of size $k$ is at most $1/n^{\frac{1}{(\log\log n)^{c}}}$ , then clearly $\rho(S)\leq 1/n^{\frac{1}{(\log\log n)^{c}}}$ and this is at most $\delta$ for $n$ sufficiently large. If $G$ is a graph that contains a clique of size $k$ , then $1=\rho(\texttt{OPT})\leq(2\sigma-\epsilon)\rho(S)+\frac{2\sigma-\epsilon}{2\sigma}$ which means $\rho(S)\geq\frac{1}{2\sigma-\epsilon}-\frac{1}{2\sigma}=2\delta$ . This means that our $(2\sigma-\epsilon)$ -factor approximation algorithm can distinguish between these two graphs, which again contradicts the implications of [38]. ∎
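The reduction in the proof of Theorem 7 can be illustrated on a toy graph. The sketch below builds the distance function $A$ from an edge list, checks the $\sigma$ -semi-metric inequality by brute force, and confirms that $f(R)$ depends on $R$ only through the edge count $E(R)$ . The graph, the value of `sigma`, and all function names are hypothetical choices made for illustration.

```python
from itertools import combinations
from math import comb

def build_distance(n, edges, sigma):
    """A(i,j) = 2*sigma on edges of G, 1 on non-edges, 0 on the diagonal."""
    edge_set = {frozenset(e) for e in edges}
    return [[0.0 if i == j
             else (2.0 * sigma if frozenset((i, j)) in edge_set else 1.0)
             for j in range(n)] for i in range(n)]

def is_sigma_semi_metric(A, sigma, tol=1e-12):
    """Check A(i,j) <= sigma * (A(i,k) + A(j,k)) for all i, j, k."""
    n = len(A)
    return all(A[i][j] <= sigma * (A[i][k] + A[j][k]) + tol
               for i in range(n) for j in range(n) for k in range(n))

def diversity(R, A):
    """f(R) = sum of A(i,j) over unordered pairs {i,j} inside R."""
    return sum(A[i][j] for i, j in combinations(sorted(R), 2))

def edges_inside(R, edges):
    """E(R): number of edges of G with both endpoints in R."""
    members = set(R)
    return sum(1 for u, v in edges if u in members and v in members)

# Hypothetical instance: a triangle {0,1,2} plus a disjoint edge {3,4}.
n, k, sigma = 5, 3, 1.5
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
A = build_distance(n, edges, sigma)

# For |R| = k: every pair contributes 1, and each edge adds a further 2*sigma - 1.
identity_holds = all(
    abs(diversity(R, A) - ((2 * sigma - 1) * edges_inside(R, edges) + comb(k, 2))) < 1e-9
    for R in combinations(range(n), k)
)
best = max(combinations(range(n), k), key=lambda R: diversity(R, A))
```

Because $f(R)$ is an increasing affine function of $E(R)$ for fixed $|R|=k$ , a set maximizing $f$ is also a densest $k$ -subgraph, which is the link the proof exploits.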

Theorem 8.

Let $k,t\in\mathbb{N}$ with $1\leq t\leq k$ . There exists a $\sigma$ -semi-metric with multilinear extension $F$ , and a matroid $\mathcal{M}=([2k],\mathcal{I})$ with rank $r=k+t-1$ and minimum circuit size $c=2t$ , where the integrality gap of $F(x)$ over the matroid polytope $P_{\mathcal{M}}$ is $\Omega(\min\{\frac{r}{c-2},\frac{\sigma}{r}\})$ .

Proof.

Let $S_{i}=\{2i-1,2i\}$ for $1\leq i\leq k$ , and $\mathcal{S}=\{S_{1},S_{2},\ldots,S_{k}\}$ . We define a matroid $\mathcal{M}=([2k],\mathcal{I})$ in terms of its circuits as follows. A set $C$ is a circuit of $\mathcal{M}$ if and only if $C$ is the union of any $t$ sets $S_{i}$ . It is then clear that the minimum size $c$ of a circuit is $2t$ , and the rank $r$ of the matroid is $k+t-1$ . For example, $\mathcal{M}$ could be the graphic matroid corresponding to the graph in Figure 2. Circuits here correspond to cycles of size $4$ , and the dashed lines show the non-zero coefficients of $F$ .

Let $F(x)=\sum_{\{u,v\}\in\mathcal{S}}x_{u}x_{v}+\sum_{\{u,v\}\notin\mathcal{S}}\frac{1}{\sigma}x_{u}x_{v}$ . It is straightforward to see that $F$ is the multilinear extension of a $\sigma$ -semi-metric diversity function induced by a complete graph which has weight $1$ on edges from $\mathcal{S}$ and weight $1/\sigma$ otherwise.

By definition of $\mathcal{M}$ and $F$ , it is clear that any integral solution $x_{I}\in P_{\mathcal{M}}$ maximizing $F$ will pick $t-1$ pairs from $\mathcal{S}$ and then singletons from other pairs. Therefore

[TABLE]

On the other hand, $x_{0}:=\frac{k+t-1}{2k}\mathbbm{1}_{[2k]}\in P_{\mathcal{M}}$ and

[TABLE]

Using that $r=k+t-1$ and $k=r-\frac{c}{2}+1$ we have

[TABLE]

where the last inequality follows since $c\geq 2$ . Hence, $F(x_{0})\geq\frac{r}{4}(1+\frac{2(k-1)}{\sigma}).$ It follows that the integrality gap is at least

[TABLE]

∎
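The construction in the proof of Theorem 8 is small enough to verify by brute force. The following sketch enumerates the independent sets of the matroid for hypothetical parameters $k=4$ , $t=2$ , $\sigma=50$ (elements are 0-indexed, so the pairs are $\{2i,2i+1\}$ ), and compares the best integral value with $F(x_{0})$ . It assumes $\sigma\geq 1$ ; the function names are illustrative.

```python
from itertools import combinations

def full_pairs(S, k):
    """Number of pairs S_i = {2i, 2i+1} entirely contained in S."""
    members = set(S)
    return sum(1 for i in range(k) if 2 * i in members and 2 * i + 1 in members)

def F_integral(S, k, sigma):
    """F at the indicator of S: weight 1 on pairs in S_i, 1/sigma on all others."""
    full = full_pairs(S, k)
    all_pairs = len(S) * (len(S) - 1) // 2
    return full + (all_pairs - full) / sigma

def is_independent(S, k, t):
    """S is independent iff it contains no circuit, i.e. fewer than t full pairs."""
    return full_pairs(S, k) < t

def best_integral(k, t, sigma):
    """Maximum of F over indicators of independent sets, by enumeration."""
    return max(F_integral(S, k, sigma)
               for size in range(2 * k + 1)
               for S in combinations(range(2 * k), size)
               if is_independent(S, k, t))

def F_uniform(k, t, sigma):
    """F at x_0 = ((k + t - 1) / (2k)) * all-ones vector."""
    c = (k + t - 1) / (2 * k)
    in_pairs = k
    other_pairs = (2 * k) * (2 * k - 1) // 2 - k
    return c * c * (in_pairs + other_pairs / sigma)

k, t, sigma = 4, 2, 50.0
r = k + t - 1
integral_opt = best_integral(k, t, sigma)
# Value predicted by the proof: t-1 full pairs plus singletons from the other pairs.
predicted = (t - 1) + (r * (r - 1) // 2 - (t - 1)) / sigma
fractional = F_uniform(k, t, sigma)
lower_bound = (r / 4) * (1 + 2 * (k - 1) / sigma)
gap = fractional / integral_opt
```

Increasing `sigma` and `k` in this sketch lets one watch the ratio `gap` grow, in line with the $\Omega(\min\{\frac{r}{c-2},\frac{\sigma}{r}\})$ statement; the enumeration is exponential in $k$ , so only small instances are practical.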

Appendix F Appendix: One-Sided Smoothness versus Lipschitz Smoothness

Lipschitz smoothness is an important, widely-used property in convex optimization and machine learning. One-sided $\sigma$ -smoothness is different from Lipschitz smoothness (and other smoothness notions based on Hölder’s or uniform continuity) and we believe it may also have applications to these areas.

A differentiable function is Lipschitz smooth if its gradient is Lipschitz continuous. In other words, $f$ is Lipschitz smooth if there exists $L\geq 0$ such that for any $x$ and $y$ , $||\nabla f(x)-\nabla f(y)||_{2}\leq L||x-y||_{2}$ or equivalently for twice differentiable functions, $u^{T}\nabla^{2}f(x)u\leq L||u||_{2}^{2}.$ We then call $f$ $L$ -Lipschitz smooth. One could define a one-sided version of this smoothness by requiring the above inequality only for $x\leq y$ (respectively, requiring the second inequality only for $u\geq\vec{0}$ ). With this definition, it is easy to see that submodular functions are one-sided $0$ -Lipschitz smooth. On the other hand, one-sided $\sigma$ -smoothness is not equivalent to one-sided $L$ -Lipschitz smoothness. To see an important difference, consider the function $g=cf$ where $c$ is a constant and $f$ is one-sided smooth. We have $\nabla g(x)=c\nabla f(x)$ . Thus if $f$ is one-sided $L$ -Lipschitz smooth we may only assert that $g$ is one-sided $cL$ -Lipschitz smooth. In particular, $L$ -Lipschitz smoothness is not closed under multiplication by a constant. On the other hand, the one-sided $\sigma$ -smooth functions form a cone. Intuitively, the reason is that for $\sigma$ -smooth functions the ratio of the gradients is bounded (as shown in Lemma 1), whereas for Lipschitz smoothness the difference of the gradients is bounded.

Appendix G Appendix: Other Applications

G.1 Appendix: The Diversified Procurement Problem

Consider a problem whereby an organization decides how to outsource the building or servicing of a system to a collection of $n$ competing vendors. The outcome is an allocation of work across the vendors. An allocation is represented by a vector $x\in P\subseteq\mathbb{R}^{n}$ ; we focus on the case where $P\subseteq[0,1]^{n}$ . Possibly $P$ is just $P=\{x:||x||_{1}\leq k\}$ but it may also incorporate structural constraints imposed by the system or constraints that enforce a resilient solution (e.g., avoiding allocations where one vendor becomes too big to fail). Given bids $b_{i}$ for each $i$ , the payoff to the organization is $b^{T}x$ . A different type of consideration for the procuring organization is to build diversity into the resulting work-plan. We consider two sources for lack of diversity. First, there may be collusions, and these are to be discouraged. The organization can define a matrix $[A_{ij}]\geq 0$ which estimates pairwise collusions. A solution with a smaller value of $x^{T}Ax$ is more desirable. Second, the system may be serving a collection of $m$ stakeholder communities. Different vendors may be more desirable than others to distinct communities. Again, the procuring organization can model this by defining vectors $g_{i}\in\mathbb{R}^{m}$ , where $g_{i}(j)\in\mathbb{R}$ represents the level of support (positive or negative) that vendor $i$ receives from community $j$ . The overall measure of quality seeks a solution which promotes representation across more communities (vectors with $g_{i}^{T}g_{i^{\prime}}<0$ point in different directions, which is good). We propose the following model to address this multi-criteria objective:

[TABLE]

where $G$ is the $m\times n$ matrix whose columns are $g_{i}$ . Hence $F$ consists of a revenue part $b^{T}x$ and a penalty part for lack of diversity.

It is easily seen that $A+G^{T}G$ is copositive and hence $F(x)$ is $0$ -OSS (see Section 1.1). (Footnote 5: This also leads to examples of $0$ -OSS functions whose Hessians have positive off-diagonals, for instance by taking $A=0$ and selecting vectors $g_{i}$ which are pairwise oblique, i.e., $g_{i}\cdot g_{j}<0$ .) Hence the jump-start continuous greedy process of Section 4.3 can be applied if the model has been defined so that $\nabla F(x)\geq 0$ (since $F$ is normalized this ensures non-negativity). Checking this gradient condition is easy. Moreover, it is useful in the modelling phase for exploring the trade-offs between the vector $b$ of bids and the community representation objectives which are determined by $G$ .

Note that Theorem 2 implies that the continuous greedy process produces a solution which is within a factor $1-1/e$ of the fractional optimum of $\max\{F(x):x\in P\}$ . We may also create a (weakly) polytime $1-1/e$ approximation as follows.

We assume that the model has been constructed so that the bids dominate the diversity penalties. Concretely we assume that $\nabla F(x)=Cx+b>\vec{\frac{1}{n}}$ . We also let $U$ be an upper bound on the absolute values of the entries of $C:=A+G^{T}G$ , i.e., on $\max|C_{ij}|$ . We now examine for which values of $\eta$ the function $F(x)$ is $\eta$ -local (see the paragraph before Theorem 9). That is, we want:

[TABLE]

for any $x,u\geq 0$ , $\epsilon\in[0,1]$ . This holds if we have:

[TABLE]

By re-arranging this holds as long as

[TABLE]

We now use the fact that $|u^{T}Cu|\leq\lambda_{|max|}||u||^{2}_{2}$ , where $\lambda_{|max|}$ is the maximum value of $|\lambda|$ over the eigenvalues $\lambda$ of $C$ . By the Gershgorin Circle Theorem [29], $\lambda_{|max|}$ is at most $|C_{ii}|+\sum_{j\neq i}|C_{ij}|\leq nU$ for some row $i$ . Hence $|u^{T}Cu|\leq(nU)||u||^{2}_{2}$ , which is at most $n^{2}U||u||_{1}$ if $||u||_{\infty}\leq 1$ . By our gradient assumption, the right hand side of (21) is then at least $\frac{\eta}{n}||u||_{1}$ . Note that we only need to establish $\eta$ -locality for vectors $u=v_{max}$ selected by the greedy process. Since these vectors lie in $P$ , which is contained in the unit hypercube, we have the desired inequality. Hence we may choose $\eta=n^{3}U$ and a discretization follows from Theorem 9.
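The chain of bounds $|u^{T}Cu|\leq(nU)||u||_{2}^{2}\leq n^{2}U||u||_{1}$ can be sanity-checked numerically. The sketch below uses randomly generated, hypothetical data for $A$ and $G$ , forms $C=A+G^{T}G$ , and uses the largest absolute row sum as the Gershgorin-type bound on $\lambda_{|max|}$ ; it illustrates the inequalities only and is not part of the procurement model itself.

```python
import random

random.seed(0)
m, n = 3, 5

# Hypothetical collusion matrix A: symmetric, non-negative, zero diagonal.
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        A[i][j] = A[j][i] = random.random()

# Hypothetical community-support matrix G (m x n); column i plays the role of g_i.
G = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)]

# C = A + G^T G.
C = [[A[i][j] + sum(G[l][i] * G[l][j] for l in range(m)) for j in range(n)]
     for i in range(n)]

U = max(abs(c) for row in C for c in row)                 # bound on |C_ij|
row_bound = max(sum(abs(c) for c in row) for row in C)    # Gershgorin-type bound

def chain_holds(u, tol=1e-9):
    """Check |u^T C u| <= row_bound * ||u||_2^2 <= n U ||u||_2^2 <= n^2 U ||u||_1."""
    l1 = sum(u)
    l2_sq = sum(t * t for t in u)
    q = abs(sum(u[i] * C[i][j] * u[j] for i in range(n) for j in range(n)))
    return (q <= row_bound * l2_sq + tol
            and row_bound <= n * U + tol
            and n * U * l2_sq <= n * n * U * l1 + tol)

# Sample directions u from the unit hypercube, where ||u||_2^2 <= ||u||_1.
all_hold = all(chain_holds([random.random() for _ in range(n)]) for _ in range(500))
```

The last inequality in the chain is loose by a factor of $n$ on the hypercube, since $||u||_{2}^{2}\leq||u||_{1}$ already holds there; the sketch keeps the weaker form to mirror the text.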

Bibliography

[1] Zeinab Abbassi, Vahab S. Mirrokni, and Mayur Thakur. Diversity maximization under matroid constraints. In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, August 11-14, 2013, pages 32–40, 2013.
[2] Francis Bach. Submodular functions: from discrete to continuous domains. Mathematical Programming, 175(1-2):419–459, 2019.
[3] Rafael da Ponte Barbosa, Alina Ene, Huy L Nguyen, and Justin Ward. A new framework for distributed submodular maximization. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 645–654. IEEE, 2016.
[4] Aditya Bhaskara, Moses Charikar, Venkatesan Guruswami, Aravindan Vijayaraghavan, and Yuan Zhou. Polynomial integrality gaps for strong SDP relaxations of densest k-subgraph. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 388–405. SIAM, 2012.
[5] Aditya Bhaskara, Mehrdad Ghadiri, Vahab S. Mirrokni, and Ola Svensson. Linear relaxations for finding diverse elements in metric spaces. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 4098–4106, 2016.
[6] An Bian, Kfir Levy, Andreas Krause, and Joachim M Buhmann. Continuous DR-submodular maximization: Structure and algorithms. In Advances in Neural Information Processing Systems, pages 486–496, 2017.
[7] An Bian, Baharan Mirzasoleiman, Joachim M Buhmann, and Andreas Krause. Guaranteed non-convex optimization: Submodular maximization over continuous domains. Proceedings of Machine Learning Research, 54:111–120, 2017.
[8] Andrew An Bian, Joachim M Buhmann, Andreas Krause, and Sebastian Tschiatschek. Guarantees for greedy maximization of non-submodular functions with applications. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 498–507. JMLR.org, 2017.
