Beyond Submodular Maximization via One-Sided Smoothness
Mehrdad Ghadiri, Richard Santiago, Bruce Shepherd

TL;DR
This paper extends the multilinear framework for submodular maximization to a broader class of functions, introducing a new parameter called one-sided smoothness, and provides improved approximation algorithms for diversity maximization problems under matroid constraints.
Contribution
It introduces the concept of one-sided smoothness for functions, extending the multilinear framework, and develops new approximation algorithms with better bounds for diversity maximization.
Findings
Achieves an $\Omega(1/\sigma)$-approximation for monotone, normalized one-sided $\sigma$-smooth functions with non-positive third-order partial derivatives.
Provides an $\Omega(1/\sigma^{3/2})$-approximation for $\sigma$-semi-metric diversity functions under matroid constraints.
Develops a polynomial-time algorithm for multilinear one-sided $\sigma$-smooth functions.
Beyond Submodular Maximization via One-Sided Smoothness
Mehrdad Ghadiri, Georgia Institute of Technology, Atlanta, GA, USA.
Richard Santiago, ETH Zurich, Switzerland.
Bruce Shepherd, University of British Columbia, Vancouver, Canada.
Abstract
The multilinear framework was developed to achieve the breakthrough $(1-1/e)$-approximation for maximizing a monotone submodular function subject to a matroid constraint, which includes the submodular welfare problem as a special case. This framework has a continuous optimization part (solving the multilinear extension of a submodular set function) and a rounding part (rounding a fractional solution to an integral one). We extend both parts so that the resulting generalized framework may be used on a wider array of problems. In particular, we make a conceptual contribution by identifying a family of parameterized functions and their applications. As a running example we focus on solving diversity problems $\max\{f(S)=\frac{1}{2}\sum_{i,j\in S}A_{ij} : S\in\mathcal{M}\}$, where $\mathcal{M}$ is a matroid. These diversity functions have $A_{ij}\geq 0$ as a measure of dissimilarity of $i,j$, and $A$ has $0$-diagonal. This family of problems ranges from intractable problems such as densest $k$-subgraph, to $\frac{1}{2}$-approximable metric diversity problems. The multilinear extension $F$ of such diversity functions satisfies $\nabla^2 F = A \geq 0$ and hence the original multilinear framework (which assumes non-positive Hessians) does not directly apply. Instead we introduce a new parameter for functions $F\in\mathbf{C}^2$ which measures the approximability of the associated problem $\max\{F(x):x\in P\}$, for solvable downwards-closed polytopes $P$. A function $F$ is called one-sided $\sigma$-smooth if $\frac{1}{2}u^T\nabla^2F(x)\,u\leq\sigma\cdot\frac{\|u\|_1}{\|x\|_1}\,u^T\nabla F(x)$ for all $u,x\geq 0$, $x\neq 0$. For $\sigma=0$ this class includes previously studied classes such as continuous DR-submodular functions, and much more. For the multilinear extension of a diversity function, we show that it is one-sided $\sigma$-smooth whenever $A$ forms a $\sigma$-semi-metric.
We give an $\Omega(1/\sigma)$-approximation for the continuous maximization problem of monotone, normalized one-sided $\sigma$-smooth $F$ with an additional property: non-positive third order partial derivatives. Since the multilinear extension of a diversity function has this additional property we can apply the extended multilinear framework to this family of discrete problems. This requires new matroid rounding techniques for quadratic objectives. The result is an $\Omega(1/\sigma^{3/2})$-approximation for maximizing a $\sigma$-semi-metric diversity function subject to a matroid constraint. This improves upon the previous best bound of $\Omega(1/\sigma^{2})$ and we give evidence that it may be tight. For general one-sided smooth functions, we show the continuous process gives an $\Omega(1/3^{2\sigma})$-approximation, independent of $n$. In this setting, by discretizing, we present a concrete poly-time algorithm for multilinear functions that satisfy the one-sided $\sigma$-smoothness condition. We also describe a discretization for one-sided smooth functions with Lipschitz gradients.
1 Introduction
In a breakthrough result, an optimal $(1-1/e)$-approximation was given for monotone submodular maximization subject to a matroid constraint [13, 45]. This resolved a long-standing gap between the best known approximation [27] and lower bound [22]. It also provides a tight approximation for the submodular welfare problem [23]. A key insight was to use a continuous relaxation based on the multilinear extension (ME) of a set function $f:2^{[n]}\to\mathbb{R}$. For $x\in[0,1]^n$, $F(x)$ is defined as $\mathbb{E}[f(R)]$, where $R\subseteq[n]$ is a random set with each element $i$ being selected independently with probability $x_i$. In particular, for $S\subseteq[n]$, $F(\mathbf{1}_S)=f(S)$. Thus a valid multilinear relaxation for a discrete problem $\max\{f(S):S\in\mathcal{M}\}$ is obtained: $\max\{F(x):x\in P\}$, where $P$ is the convex hull of the characteristic vectors $\mathbf{1}_S$, $S\in\mathcal{M}$.
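The definition of the multilinear extension can be made concrete with a small brute-force sketch. It enumerates all subsets, so it is only meant for tiny ground sets. The helper names (`multilinear_extension`, `diversity`) and the example matrix are illustrative assumptions, and the diversity function is assumed to have the form $f(S)=\frac{1}{2}\sum_{i,j\in S}A_{ij}$.

```python
from itertools import combinations


def multilinear_extension(f, x):
    """Exact F(x) = E[f(R)], where element i joins R independently with probability x[i].

    Brute force over all 2^n subsets; illustrative only.
    """
    n = len(x)
    total = 0.0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            prob = 1.0
            for i in range(n):
                prob *= x[i] if i in chosen else 1.0 - x[i]
            total += prob * f(chosen)
    return total


def diversity(A):
    """Return f(S) = 1/2 * sum_{i,j in S} A[i][j] (assumed form; A symmetric, zero diagonal)."""
    def f(S):
        return 0.5 * sum(A[i][j] for i in S for j in S)
    return f


# Hypothetical dissimilarity matrix used only for illustration.
A = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]
f = diversity(A)

# At a characteristic vector the extension agrees with the set function.
print(multilinear_extension(f, [1.0, 0.0, 1.0]), f({0, 2}))

# For a diversity function the extension coincides with the quadratic 1/2 x^T A x.
x = [0.5, 0.25, 1.0]
quadratic = 0.5 * sum(A[i][j] * x[i] * x[j] for i in range(3) for j in range(3))
print(multilinear_extension(f, x), quadratic)
```

The second comparison illustrates why the Hessian of the extension of a diversity function is the non-negative matrix $A$ itself.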
This framework has inspired a successful stream of research including on non-monotone submodular functions [12] and new ‘contention resolution’ rounding techniques for general polytopes [18]. In this work we make a conceptual contribution by identifying a family of parameterized set functions where an extension to the multilinear framework can be brought to bear. We also give several applications for this generalized framework.
Using this framework to solve a discrete problem requires two essential ingredients. First, algorithmic tools to find a good solution for the multilinear relaxation. Second, a rounding procedure that converts a fractional solution $x\in P$ into a feasible set $S\in\mathcal{M}$ without losing too much objective value. As multilinear extensions are neither concave nor convex, it is not a priori clear that the fractional problem itself would be tractable. For monotone submodular functions, however, a gradient-based technique called continuous greedy is shown to provide a $(1-1/e)$-approximation [45]. This analysis relies on the fact that MEs of submodular functions have non-positive second derivatives. Functions with this property are called continuous DR-submodular [7] (cf. [43]). The rounding step for matroids relies on a different property of multilinear extensions. Namely, if $f$ is submodular, then $F$ is convex in any direction $e_i-e_j$, where $e_i$ denotes the characteristic vector of $\{i\}$. This allows a lossless conversion to a discrete solution, for instance, by using pipage rounding. The combination of the fractional algorithm and rounding provides the $(1-1/e)$-approximation.
In this paper, we develop a wider scope for the multilinear framework and give evidence of its use in other applications. One motivating example is diversity maximization [36, 35, 46] which has applications in machine learning [49, 30], document aggregation [1], web search [40], recommender systems [47, 14], and many more. One widely used model is $\max\{f(S)=\frac{1}{2}\sum_{i,j\in S}A_{ij}:S\in\mathcal{M}\}$, where $A_{ij}\geq 0$ for all $i,j$. We refer to $f$ as a diversity function if $A$ is symmetric and has $0$-diagonal. We think of $A_{ij}$ as measuring dissimilarity between items $i,j$. This family of (supermodular) maximization problems ranges from challenging examples such as $k$-densest subgraph, with best known approximation [4, 38], to metric diversity ($A$ forms a metric) which is $\frac{1}{2}$-approximable [33, 11]. Since the multilinear extension of a diversity function has a Hessian which is non-negative (as opposed to non-positive), the standard $(1-1/e)$-approximation from continuous greedy does not directly apply. One of our main messages is that the metric property in diversity maximization is intrinsic to the tractability of the multilinear relaxation.
To describe our extended multilinear framework we first discuss the fractional problem and later discuss rounding. We introduce a parameterized family of monotone, non-negative functions and then show that the parameter governs the approximability of the problem $\max\{F(x):x\in P\}$, for downwards-closed polytopes $P$. To achieve this we cannot directly rely on a crucial fact used in the analysis of the continuous greedy process: that the rate of change of $F$ at a point $x$ is at least the current deficit, defined as $\mathrm{OPT}-F(x)$. For MEs of submodular functions, this follows from $F$ being concave in non-negative directions, which in turn relies on $F$'s second derivatives being non-positive. Instead, we define a family of functions whose growth in non-negative directions is constrained by a parameter $\sigma\geq 0$. A function $F\in\mathbf{C}^2$ is called one-sided $\sigma$-smooth (or $\sigma$-OSS for short) if it satisfies
$$\frac{1}{2}\,u^T\nabla^2F(x)\,u\;\leq\;\sigma\cdot\frac{\|u\|_1}{\|x\|_1}\,u^T\nabla F(x)$$
for all $u,x\geq 0$, $x\neq 0$.
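To get a feel for the one-sided smoothness condition, the following sketch checks it numerically on random non-negative points for a quadratic $F(x)=\frac{1}{2}x^TAx$. It assumes the inequality reads $\frac{1}{2}u^T\nabla^2F(x)u\leq\sigma\frac{\|u\|_1}{\|x\|_1}u^T\nabla F(x)$, and it uses a hypothetical metric $A$ built from points on a line with $\sigma=1$. A sampling check like this can only fail to find violations; it is not a proof.

```python
import random


def oss_gap(A, x, u, sigma):
    """Right-hand side minus left-hand side of the (assumed) OSS inequality
    for F(x) = 1/2 x^T A x, whose gradient is A x and whose Hessian is A."""
    n = len(x)
    grad = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    hess_u = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
    lhs = 0.5 * sum(u[i] * hess_u[i] for i in range(n))
    rhs = sigma * (sum(u) / sum(x)) * sum(u[i] * grad[i] for i in range(n))
    return rhs - lhs


# Hypothetical example: A[i][j] = |p_i - p_j| for points on a line is a metric.
points = [0.0, 1.0, 2.5, 4.0]
A = [[abs(p - q) for q in points] for p in points]

rng = random.Random(0)
worst_gap = min(
    oss_gap(
        A,
        [rng.uniform(0.01, 1.0) for _ in points],  # x >= 0, x != 0
        [rng.random() for _ in points],            # u >= 0
        sigma=1.0,
    )
    for _ in range(2000)
)
# True when no sampled pair (x, u) violates the inequality (up to rounding).
print(worst_gap >= -1e-9)
```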
The class of $0$-OSS functions already contains interesting and familiar functions. This includes the continuous DR-submodular functions, as their Hessians are non-positive [7, 6]; the DR-submodular functions form a superset of the functions originally considered for continuous greedy [45]. The $0$-OSS functions contain much more, however, as we discuss later in Section 1.1. For all of these functions, the continuous greedy process returns a solution within $(1-1/e)$ of the optimum; in some cases, converting this into a concrete polytime algorithm requires additional assumptions.
For larger values of $\sigma$, one example of $\sigma$-smooth functions is the class of $\sigma$-semi-metric diversity functions. Namely, the parameter corresponds to the matrix $A$ being a $\sigma$-semi-metric. This means that $A_{ij}\leq\sigma(A_{ik}+A_{jk})$ for all $i,j,k$; see Proposition 3 in Appendix A. This captures diversity functions addressed in the literature, such as metric diversity [9] ($\sigma=1$), and negative-type distances [16, 15] or Jensen-Shannon divergence, which has been used to measure dissimilarity of probability measures (both have smoothness $\sigma=2$); see Appendix A.
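As a small illustration of the semi-metric parameter, the sketch below computes the smallest $\sigma$ for which a given dissimilarity matrix satisfies $A_{ij}\leq\sigma(A_{ik}+A_{jk})$. The function name and the example data are assumptions made for illustration, and the check is restricted to distinct triples $i,j,k$ (itself an assumption about how the condition is meant to be read).

```python
def semi_metric_parameter(A):
    """Smallest sigma with A[i][j] <= sigma * (A[i][k] + A[k][j]) over distinct i, j, k.

    Returns float('inf') if some positive entry has a zero-length detour.
    """
    n = len(A)
    sigma = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3 or A[i][j] == 0:
                    continue
                detour = A[i][k] + A[k][j]
                if detour == 0:
                    return float("inf")
                sigma = max(sigma, A[i][j] / detour)
    return sigma


# Hypothetical points on a line: absolute differences form a metric,
# squared differences form a 2-semi-metric.
points = [0.0, 1.0, 3.0]
metric = [[abs(p - q) for q in points] for p in points]
squared = [[(p - q) ** 2 for q in points] for p in points]

print(semi_metric_parameter(metric))
print(semi_metric_parameter(squared))
```

On this data the metric attains the triangle inequality with equality, and the squared distances need a parameter strictly between 1 and 2, which matches the role of $\sigma$ as a relaxation of the triangle inequality.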
Our main contribution to the fractional problem is to show that one-sided smoothness of a monotone, non-negative function $F$ governs the approximability of $\max\{F(x):x\in P\}$, for downwards-closed polytopes $P$. This is reminiscent of how Lipschitz smoothness bounds convergence rates in convex optimization; see Appendix F for a discussion of the difference between Lipschitz smoothness and OSS. If additionally $F$ has non-positive third order partials, then we show that continuous greedy can be adapted to become an $\Omega(1/\sigma)$-approximation, and we show this is tight. This class includes the discussed MEs for diversity maximization. We can combine this with new rounding techniques to obtain unified results for maximizing diversity functions over matroids. Unlike for submodular functions, this requires taking the best of two rounding methods. One is inspired by swap rounding, previously applied to the submodular case. The other extends the approximate integer decomposition framework [17] to handle the “pairwise terms” in diversity functions.
For general $\sigma$-smooth functions, without any third order assumption, we can obtain an $\Omega(1/3^{2\sigma})$-approximation (independent of $n$) for the continuous greedy process. We can no longer use the second-order Taylor polynomial since we do not have non-positivity of the third order error term. Instead we work with the first-order Taylor expansion, but this requires a new upper bound on $u^T\nabla F$, the directional derivative, in a neighbourhood of $x$. In the fully general setting we need an additional (strong) lower-bound assumption to make a concrete algorithm. However, for multilinear $\sigma$-OSS functions a polytime algorithm is shown independent of any additional assumptions. We also consider discretization for one-sided smooth functions with Lipschitz gradients, and for a class of $0$-smooth functions which are not continuous DR-submodular (Section G.1).
1.1 The Zero One-Sided Smooth Class
The class of $0$-OSS functions is interesting in its own right. For monotone, non-negative members of this family our results show that the continuous greedy process yields a $(1-1/e)$-approximation for the fractional problem. Obtaining a polytime algorithm (discretization) is not immediate but we can establish natural conditions on cases where this can be achieved. The general $0$-OSS family forms a very broad class of functions. For instance, it contains every concave function (even though our results are only tailored for the monotone, non-negative functions in this class). This means it also contains the continuous DR-submodular functions (Hessians are non-positive). This containment is proper since there are $0$-OSS functions with positive off-diagonal entries in their Hessian. It is interesting to compare with the related family of continuous submodular functions that has been developed in the context of minimization [2]. Continuous submodular functions are defined as having Hessians with non-positive off-diagonal entries, but they may have positive diagonal entries. In contrast, $0$-OSS functions must have non-positive diagonals but may have positive off-diagonals (cf. Appendix G.1); see Figure 1.
The (general) $0$-OSS family can be defined as the functions $F$ for which $-\nabla^2F(x)$ is copositive for every $x$ (a matrix $M$ is copositive if $u^TMu\geq 0$ for every $u\geq 0$ [20]). While recognition of copositive matrices is NP-hard [39], we propose a strategic procurement problem which is modelled as maximizing a quadratic function whose quadratic term is determined by a user-defined copositive matrix. Note that this family of objectives is a generalization of concave quadratics. We refer to the resulting (fractional) maximization problem as diversified procurement, discussed in Appendix G.1.
2 Our Results
Our results are of three types: 1) fractional approximations, 2) rounding, and 3) hardness results. These are presented in Sections 4, 5, and 6 respectively. In all of our results, we assume that the function is monotone and non-negative.
Fractional approximation. Our main result in this part (Theorem 1) shows that a modified version of the continuous greedy process gives a -approximation for maximizing a non-negative, monotone, -OSS function subject to a downwards-closed polytope, where is an arbitrary number in . We remark that for , our results recover the -approximation [45, 13] for maximizing the multilinear extension of a submodular function, by setting . For , our approximation is better than . For fixed this gives a constant-factor approximation independent of . At present, we do not know the correct dependence on . However, the dependence improves to linear with an additional assumption that third-order partials are non-positive. More precisely, we obtain a -approximation; see Theorem 2. As mentioned in Section 1 this gives a approximation which is in fact tight within a constant factor (cf. Corollary 1 discussed below). One example of such functions are multilinear extensions of semi-metric diversity functions, i.e., whose Hessian is a -semi-metric (discussed further in the rounding part).
The ‘algorithm’ described in the previous paragraph is a continuous-time process, and it is not immediately obvious that it can be implemented as a discrete algorithm. Some readers may wish to take it on faith that this is possible and skip ahead to the rounding results. There are actually some subtleties involved which require two distinct approaches. One of our methods works for multilinear functions, while the other works for general OSS functions but needs an additional parameter that governs the growth of the first order derivatives from below. A fuller discussion is in Appendix C.
Rounding. In this part, we consider maximizing set functions of the form
$$f(S)=\frac{1}{2}\sum_{i,j\in S}A_{ij}+\sum_{i\in S}b_i,$$
where $A$ is a symmetric matrix with $0$-diagonal. If $b=0$ and $A\geq 0$, then these are the previously discussed diversity functions, but more generally we refer to these as discrete quadratics (aka second-order modular [34]) as their extensions are quadratic functions $F(x)=\frac{1}{2}x^TAx+b^Tx$. Since their third derivatives are obviously $0$, they ‘qualify’ for the $\Omega(1/\sigma)$-approximation from the preceding section. Hence we have an $\Omega(1/\sigma)$-approximation for maximizing $F$ over a matroid polytope when $F$ is $\sigma$-OSS (i.e., $A$ is $\sigma$-semi-metric). In order to solve the discrete problem $\max\{f(S):S\in\mathcal{M}\}$, we need to transform this fractional solution to a discrete one.
We present two different rounding procedures which combined lead to a rounding gap bounded in terms of $r$, the rank of the matroid, and $c$, the size of a smallest circuit. Surprisingly we show this is tight (see hardness part). Moreover, this yields an $O(\sqrt{\sigma})$ rounding gap that depends only on $\sigma$ (Theorem 3). Combining the modified continuous greedy algorithm with our rounding result, there is an $\Omega(1/\sigma^{3/2})$-approximation for maximizing $\sigma$-semi-metric diversity functions subject to a matroid constraint (Theorem 4). This improves the best known bound [48]. In addition, we note that $O(\sqrt{\sigma})$ is a pessimistic bound in general. For instance, for uniform matroids we have $c=r+1$, which leads to an $O(1)$ rounding gap and hence an improved $\Omega(1/\sigma)$-approximation; as discussed below this is actually tight.
This rounding gap implies that for a cardinality constraint, the approximation bound of the discrete problem is asymptotically the same as the bound for the continuous problem. Thus the continuous problem of maximizing a general multilinear quadratic function over the simplex is as hard as solving the densest $k$-subgraph problem (see Corollary 1). This is similar to the situation for continuous maximization of MEs of submodular functions. Such continuous hardness problems have received less attention, as remarked by De Klerk [19]: “approximation algorithms have been studied extensively for combinatorial optimization problems, but have not received the same attention for NP-hard continuous optimization problems.” We close this section by discussing our hardness results for the discrete problems.
Hardness. In this part, we show that the hardness of approximation is also governed by the smoothness parameter of the function. More specifically, in Theorem 7 we show that assuming the planted clique conjecture, for a constant it is hard to approximate the maximum of a -semi-metric diversity function subject to a cardinality constraint within a factor better than . We also show that for a super constant , it is hard to find any constant factor approximation.
In Theorem 8 we give a lower bound of for the rounding gap of a -semi-metric diversity function over a matroid polytope. This shows that our rounding methods are essentially tight. In particular, each step of our algorithm for maximizing diversity functions (i.e., maximizing the continuous function and rounding) is tight. This leads us to speculate that the -approximation (Theorem 4) is asymptotically tight.
3 Related Work
We first discuss work related to solving the continuous problem in the multilinear framework. Other adaptations of the continuous greedy algorithm have been developed for applications to non-monotone submodular maximization [24, 21] and distributed maximization [3]. Another avenue aimed to generalize the class of functions originally considered [45]. For instance, Bach [2] develops minimization algorithms for the family of continuous submodular functions [37] defined on compact product subsets of $\mathbb{R}^n$. A function is submodular if the off-diagonal entries of its Hessian are non-positive. This class is an extension of lattice submodular functions [44, 28] (a lattice is a poset closed under meet and join operations and hence these generalize submodular set functions). DR-submodularity is a restricted form of lattice submodular functions introduced for maximization [42, 43]. These generalize to continuous DR-submodular functions, for which all entries of the Hessian are non-positive [7]. The continuous greedy algorithm has also been studied for maximization of these continuous functions. Discretization requires an additional bound on Lipschitz smoothness and then a $(1-1/e)$-approximation can be achieved as step sizes approach $0$. This is done for maximizing a (monotone and non-monotone) DR-submodular function over a downwards-closed polytope [7, 6]. This is introduced as an alternative to the multilinear extension which is more practical to evaluate, and for which a gradient-based algorithm leads to an approximation guarantee over downwards-closed polytopes [31].
As for discrete problems, after the introduction of the multilinear framework, there have been many developments. One highlight is the introduction of contention resolution schemes [17] which allow one to work with more general polytopes. An online version of this approach has also been developed [25] with applications in algorithmic game theory. In a different direction, the work of [26] gives a combinatorial local search approximation algorithm for maximizing monotone submodular functions over matroids.
The diversity maximization problem has proved extremely versatile for many applications, as noted in Section 1. On the algorithmic side, a greedy $\frac{1}{2}$-approximation was devised in [33] and this was generalized to matroid constraints [10]. The latter was extended to yield an approximation guarantee depending on $\sigma$ whenever the diversity costs form a $\sigma$-semi-metric [49]. That is, $A_{ij}\leq\sigma(A_{ik}+A_{jk})$ for all $i,j,k$. A PTAS has also been developed when the $A_{ij}$'s are negative type distances [16, 15].
There is also work that extends set function maximization beyond submodularity. In [8] a greedy algorithm is shown to give good approximations for a family of set functions which are parameterized by curvature and submodularity ratio values. In [11], a $\frac{1}{2}$-approximation is developed for the problem of maximizing the sum of a submodular function and a metric diversity function. A generalization of this function, called proportionally submodular functions, is considered in [10]. Another extension is to maximization of weakly submodular functions where non-negativity of the function is relaxed [32].
4 Fractional Approximation.
In this section, we first discuss a key property of one-sided smooth functions, which is the main tool in our analysis. This property asserts that for a point $x$, the directional derivative at points close to $x$ is bounded by a factor of the directional derivative at $x$.
We then present a variant of the continuous greedy process which we use for both general -OSS functions and those that have non-positive third-order partial derivatives. We analyze this algorithm for both classes of functions. The discretization of the continuous greedy process is discussed in Appendix C.
4.1 Notations
We use $e_1,\ldots,e_n$ to denote the standard basis of $\mathbb{R}^n$ and $[n]$ to refer to the ground set of a set function. We denote the $i$'th coordinate of a vector $x$ with $x_i$. For a set $S\subseteq[n]$, we denote by $\mathbf{1}_S$ its characteristic vector. Given a vector $x$ we denote its support by $\mathrm{supp}(x)$, i.e., the set of non-zero coordinates of $x$. For a matrix $A$, we use $A_{ij}$ and $A(i,j)$ interchangeably to refer to the $(i,j)$ entry of $A$.
4.2 A Key Property of One-Sided Smoothness
The following result describes a property of one-sided smoothness that plays a key role in the analysis of the algorithm. It enables us to bound the first order Taylor’s polynomial of the function.
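Lemma 1.
Let $x\in[0,1]^n\setminus\{0\}$, $u\in[0,1]^n$ and $\epsilon>0$ such that $x+\epsilon u\in[0,1]^n$. Let $F$ be a non-negative, monotone function which is one-sided $\sigma$-smooth on $\{y \mid x\leq y\leq x+\epsilon u\}$. Then
$$u^T\nabla F(x+\epsilon u)\;\leq\;\left(\frac{\|x+\epsilon u\|_1}{\|x\|_1}\right)^{2\sigma}u^T\nabla F(x).$$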
Proof.
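Let $g(t)=u^T\nabla F(x+tu)$ for $t\in[0,\epsilon]$. By the Chain Rule we have $g'(t)=u^T\nabla^2F(x+tu)\,u$.
By one-sided $\sigma$-smoothness on $\{y \mid x\leq y\leq x+\epsilon u\}$, for any $t\in[0,\epsilon]$,
$$g'(t)\;\leq\;2\sigma\,\frac{\|u\|_1}{\|x+tu\|_1}\,g(t)\;\leq\;2\sigma\,\frac{\|u\|_1}{\|x+tu\|_1}\,\big(g(t)+\delta\big)$$
for any $\delta>0$. Therefore, using that $g(t)+\delta>0$ for all $t$ (since $F$ is monotone, so $g(t)\geq 0$), we have
$$\frac{g'(t)}{g(t)+\delta}\;\leq\;2\sigma\,\frac{\|u\|_1}{\|x+tu\|_1}.\qquad(1)$$
We integrate both sides of (1) with respect to $t$ from $0$ to $\epsilon$. On the left hand side we get
$$\int_0^{\epsilon}\frac{g'(t)}{g(t)+\delta}\,dt=\ln\frac{g(\epsilon)+\delta}{g(0)+\delta},$$
and on the right hand side we get
$$\int_0^{\epsilon}2\sigma\,\frac{\|u\|_1}{\|x\|_1+t\|u\|_1}\,dt=2\sigma\ln\frac{\|x\|_1+\epsilon\|u\|_1}{\|x\|_1}=2\sigma\ln\frac{\|x+\epsilon u\|_1}{\|x\|_1},$$
where we use that $\|x+tu\|_1=\|x\|_1+t\|u\|_1$ for $x,u\geq 0$.
Therefore $\ln\frac{g(\epsilon)+\delta}{g(0)+\delta}\leq 2\sigma\ln\frac{\|x+\epsilon u\|_1}{\|x\|_1}$, and hence $g(\epsilon)+\delta\leq\left(\frac{\|x+\epsilon u\|_1}{\|x\|_1}\right)^{2\sigma}\big(g(0)+\delta\big)$. Since this holds for any $\delta>0$, taking the limit $\delta\to 0$ yields the desired result. ∎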
4.3 Continuous Greedy and One-Sided -Smoothness
We now provide an adaptation of the continuous greedy algorithm, originally introduced in [45]. Algorithm 1 is for maximizing a monotone $\sigma$-OSS function over a polytime separable downward-closed polytope. Unlike the classical continuous greedy, our algorithm starts from a non-zero point, which allows us to take advantage of Lemma 1. Because of this, we call our algorithm jump-start continuous greedy.
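The jump-start idea can be sketched in discretized form for one simple case. The code below is not the paper's Algorithm 1: it assumes a cardinality constraint (the polytope $\{x\in[0,1]^n:\sum_i x_i\leq k\}$), a quadratic objective $F(x)=\frac{1}{2}x^TAx$ with $A\geq 0$, a starting point equal to $\alpha$ times a vertex of maximum $\ell_1$-norm, and steps of the form $x\leftarrow x+\delta(1-\alpha)\,v_{\max}(x)$. All function names and these rules are illustrative assumptions.

```python
def grad_quadratic(A, x):
    """Gradient A x of F(x) = 1/2 x^T A x for symmetric A."""
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]


def top_k_vertex(weights, k):
    """Maximize v . weights over {v in [0,1]^n : sum(v) <= k} for non-negative weights."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    v = [0.0] * len(weights)
    for i in order[:k]:
        v[i] = 1.0
    return v


def jump_start_greedy(A, k, alpha=0.5, steps=100):
    """Discretized sketch: start at alpha * v0, then take `steps` moves of
    size (1 - alpha) / steps in the direction maximizing v . grad F(x)."""
    n = len(A)
    start = top_k_vertex([1.0] * n, k)        # assumed jump-start vertex
    x = [alpha * s for s in start]
    dt = 1.0 / steps
    for _ in range(steps):
        v = top_k_vertex(grad_quadratic(A, x), k)
        x = [xi + dt * (1.0 - alpha) * vi for xi, vi in zip(x, v)]
    return x


def quadratic_value(A, x):
    n = len(x)
    return 0.5 * sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


# Hypothetical metric on four points of a line.
A = [[0, 1, 2, 3],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [3, 2, 1, 0]]
x = jump_start_greedy(A, k=2)
print(x, quadratic_value(A, x))
```

Because the output is $\alpha$ times a vertex plus $(1-\alpha)$ times an average of vertices, it is a convex combination of feasible points and therefore stays in the polytope.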
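Theorem 1.
Let $F$ be a monotone $\sigma$-OSS function. Let $\alpha\in(0,1]$ and $P$ be a polytime separable, downward-closed polytope. If we run the jump-start continuous greedy process (Algorithm 1) then $x(1)\in P$ and $F(x(1))\geq\left[1-\exp\left(-(1-\alpha)\left(\frac{\alpha}{\alpha+1}\right)^{2\sigma}\right)\right]\cdot\mathrm{OPT}$, where $\mathrm{OPT}:=\max\{F(x):x\in P\}$.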
Proof.
The main idea of the proof is to show that moving in the direction guarantees a fractional progress equal to . Let be such that . Also, let and , i.e., (where denotes the component-wise maximum operation). We have by Taylor’s Theorem that for some :
[TABLE]
where the last inequality follows from Lemma 1. By the choice of we have that for any , and then since and is non-decreasing in each component (because is always non-negative) we also have
[TABLE]
By the choice of and above inequalities it follows that for any ,
[TABLE]
Let . Then using chain rule, we have
[TABLE]
We solve the above differential inequality by multiplying by .
[TABLE]
Integrating the LHS and RHS of the above equation between [math] and we get
[TABLE]
Hence
[TABLE]
where the last inequality follows from the fact that is non-negative. Substituting and gives the desired result. ∎
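The integrating-factor step used above can be written out generically. This is a sketch under the assumption that the process satisfies a differential inequality with some constant rate $c>0$ on $[0,1]$; the symbol $c$ is introduced here only for illustration.

```latex
\begin{align*}
\frac{d}{dt}F(x(t)) &\geq c\,\bigl(\mathrm{OPT}-F(x(t))\bigr)
  && \text{(assumed differential inequality)}\\
\frac{d}{dt}\Bigl(e^{ct}F(x(t))\Bigr)
  &= e^{ct}\Bigl(\tfrac{d}{dt}F(x(t))+c\,F(x(t))\Bigr)\;\geq\; c\,e^{ct}\,\mathrm{OPT}
  && \text{(multiply by } e^{ct}\text{)}\\
e^{c}F(x(1))-F(x(0)) &\geq \bigl(e^{c}-1\bigr)\,\mathrm{OPT}
  && \text{(integrate from } 0 \text{ to } 1\text{)}\\
F(x(1)) &\geq \bigl(1-e^{-c}\bigr)\,\mathrm{OPT}+e^{-c}F(x(0))\;\geq\;\bigl(1-e^{-c}\bigr)\,\mathrm{OPT}
  && \text{(since } F\geq 0\text{)}
\end{align*}
```

With $c=1$ this is the familiar $1-1/e$ bound of the classical continuous greedy analysis.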
In Proposition 4 in Appendix B we provide an explicit expression for the best value of (in terms of ) for Algorithm 1 when we are dealing with -OSS functions.
As discussed in Section 2, if the third-order partial derivatives of $F$ are non-positive, then the approximation factor of Algorithm 1 improves to $\Omega(1/\sigma)$.
Theorem 2.
Let be a monotone -OSS function with non-positive third-order partial derivatives. Let be a polytime separable, downward-closed, polytope. If we run the jump-start continuous greedy process (Algorithm 1) with , then and , where .
The main idea for proving Theorem 2 is to use the third-order Taylor’s polynomial and use the non-positivity of third-order partials and the defining property of -OSS functions. More specifically, because the third-order partials are non-positive, we have
[TABLE]
Then using the fact that $\|x\|_1$ is large (because we start from a non-zero point), we can conclude that $v_{\max}(x)\cdot\nabla F(x)\geq\left(\frac{\alpha}{\alpha+\sigma}\right)\left(\mathrm{OPT}-F(x)\right)$. This inequality is then used to derive the desired result. For details of the proof of Theorem 2, see Appendix B.
Algorithm 1 is a continuous process and in general, it cannot be implemented in finite time. Therefore, we give a discretization of this process. In Appendix C, we show that starting from and using the update rule with the appropriate step size , we can recapture similar approximation factors. We present different results for the discretization which are very similar in nature. The first one asserts that, if is the multilinear extension of some set function , then using , the output of the discrete algorithm satisfies . See Theorem 10 in Appendix C.
The second result states that for a function that satisfies for all and , using , the output of the discrete algorithm satisfies $F(x^{1})\geq\left(1-\exp\left(-\beta(1-\alpha)\left(\frac{\alpha}{\alpha+1}\right)^{2\sigma}\right)\right)\mathrm{OPT}$. See Theorem 11 in Appendix C. Note that, for example, the functions with a non-negative Hessian satisfy the mentioned inequality with .
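To see how the displayed bound trades off the jump-start parameter $\alpha$ against the smoothness $\sigma$, the following sketch evaluates $1-\exp\left(-\beta(1-\alpha)\left(\frac{\alpha}{\alpha+1}\right)^{2\sigma}\right)$ and picks the best $\alpha$ on a grid. The function names, the grid search, and the default $\beta=1$ are assumptions for illustration; the closed-form choice of $\alpha$ is left to the appendix result cited earlier.

```python
import math


def discretized_factor(alpha, sigma, beta=1.0):
    """Evaluate 1 - exp(-beta * (1 - alpha) * (alpha / (alpha + 1)) ** (2 * sigma))."""
    rate = beta * (1.0 - alpha) * (alpha / (alpha + 1.0)) ** (2.0 * sigma)
    return 1.0 - math.exp(-rate)


def best_alpha(sigma, beta=1.0, grid=1000):
    """Grid search over alpha in (0, 1) for the largest factor (illustrative only)."""
    candidates = [i / grid for i in range(1, grid)]
    return max(candidates, key=lambda a: discretized_factor(a, sigma, beta))


for sigma in (0.0, 1.0, 2.0):
    a = best_alpha(sigma)
    print(sigma, a, discretized_factor(a, sigma))
```

For $\sigma=0$ the factor is largest as $\alpha$ approaches $0$, where it tends to $1-1/e$; for larger $\sigma$ the best $\alpha$ moves away from $0$, reflecting the benefit of starting from a point with large $\ell_1$-norm.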
5 Rounding
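Let $\mathcal{M}=([n],\mathcal{I})$ be a matroid and $P_{\mathcal{M}}$ be its polytope. In this section we study the integrality gap for a quadratic program: $\max\{F(x):x\in P_{\mathcal{M}}\}$. Here $F$ is a non-negative, quadratic multilinear function such that $F(x)=\frac{1}{2}x^TAx+b^Tx$ and $A$ is a symmetric, zero diagonal matrix.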
There are unbounded gaps for such quadratic programmes even for graphic matroids if we allow parallel edges (see Theorem 8). Fortunately these large gaps transpire for a simple reason, namely when the matroids have very small circuits. We are able to obtain the following integrality gap upper bound.
Theorem 3 (Quadratic Integrality Gap over Matroids).
Let be a set function whose multilinear extension is -OSS . Let be a matroid of rank , minimum circuit size , and matroid polytope . Then there is a polytime algorithm which given produces an integral vector such that .
Combining the continuous greedy methods with this rounding procedure we obtain the following result which improves upon the previous best bound of $\Omega(1/\sigma^{2})$.
Theorem 4.
The problem of maximizing a $\sigma$-semi-metric diversity function over a matroid admits an $\Omega(1/\sigma^{3/2})$-approximation. For uniform matroids this becomes an $\Omega(1/\sigma)$-approximation.
Theorem 3 is obtained by two different rounding algorithms. One is based on modifying the approximate integer decomposition property [17] to work for quadratic programs; the second one adapts the swap rounding algorithm developed for submodular functions [13]. We discuss the first result here. For details regarding the second method, see Appendix D. We remark that while our rounding results are inspired by previous techniques used for submodular maximization, the analysis requires several new insights to make it work for quadratic functions, since these are not convex in the directions $e_i-e_j$.
Theorem 5.
Let be a non-negative, quadratic multilinear polynomial and be a matroid with rank and minimum circuit size . If , then there is an independent set of such that .
We actually prove the following decomposition result which implies Theorem 5. For , we define the coverage of a pair to be the quantity . Let be the vector with entries . As is quadratic it is linear in these coverage values and the vector : . For a set we say its coverage set is . A quadratic coverage of is a collection of weighted independent sets with properties (1) for each , , and (2) for each , . Recall that . It follows that and hence if the size , then some satisfies . This bound depends on the fact that entries of are non-negative. By condition (1) of quadratic coverages, we have and by condition (2), . Therefore, for such a collection we have . This reasoning shows that to deduce Theorem 5, it suffices to find a quadratic coverage with .
Theorem 6.
Let be a non-negative, quadratic multilinear polynomial and be a matroid with rank and minimum circuit size . If , then it has a quadratic coverage of size at most .
Proof.
We start with an arbitrary representation of as a convex combination of independent sets: .
First note that . Hence an ordered pair contributes to if . This implies that if , then this contributes exactly for every . If , then the unordered pair contributes to coverages as follows. It contributes for every and for each . Here for disjoint node sets we define to be the set of edges which have endpoints in distinct sets from the ’s. Hence we can express the coverage vector for in as:
[TABLE]
We now define a quadratic coverage, that is, a weighted collection of independent sets satisfying conditions (1) and (2). In particular, for each we define a family of independent sets which will take care of all coverages associated with terms in (2). In the case where , this is easy. We just include the set with weight . Now consider the case where which is trickier. For each set in this family, we always associate the weight and so this amounts to finding a family which satisfies
[TABLE]
We return to this construction later but we note that condition (2) will follow easily as long as we guarantee that for each and , if , then the family includes at least one set which contains . Since we have for any such , we derive the desired inequality (2): .
If we can achieve this construction so that for each , then we have a quadratic coverage whose size is . The last inequality follows since the are a convex combination.
We now define for a fixed pair and show how to find the desired independent sets , where is defined later. First, if , then we include the sets . This takes care of the double-coverage of pairs in as well as any pairs with and . Let and . Note that the excess coverage from these sets is to contribute an extra to each pair in . It now remains to cover the edges in .
Let and . Decompose into disjoint independent sets by ripping out sets of size greedily, possibly the last being smaller than . Call these . For each , we extend to an independent set in only adding elements from . Hence this set will have used all elements of except a subset, call it , of size at most . Let and note that and hence it is also independent. We now examine the pairs covered by . Let , then either is covered by , or in which case it is covered by .
Finally, we count the number of sets for a given family. There are two cases depending on whether or not. If the intersection is empty, then we just build . Since , this is at most . In the other case we have , and we add the sets up front and then we add more sets. Hence the overall number of sets in this case is at most .
It follows that , and thus we have a quadratic coverage of size at most , as we wanted to show. ∎
6 Hardness
It has been shown that it is hard to approximate the maximum of a metric diversity function subject to a cardinality constraint within a factor better than [5, 11]. We generalize this hardness result to -semi-metric diversity functions. The following result shows that our approximation factor for maximizing a -semi-metric diversity function subject to a uniform matroid (Theorem 4) is asymptotically tight. For the proof of the following theorem, see Appendix E. Let where is a suitably chosen universal constant independent of .
Theorem 7.
Assuming the exponential time hypothesis (ETH): (1) There is no polytime -approximation algorithm for maximizing -semi-metric diversity functions subject to a cardinality constraint, and (2) for any fixed and , there is no polytime algorithm which approximates the maximum of a -semi-metric diversity function subject to a cardinality constraint within a factor of .
Combining Theorem 7 with the rounding for multilinear quadratics subject to a uniform matroid (Theorem 5) gives the following result, which states that the approximation bound given in Theorem 2 for functions with non-positive third-order partial derivatives is asymptotically tight.
Corollary 1.
Let be a matrix corresponding to a -semi-metric distance function. Then, assuming ETH, it is hard to approximate the continuous problem within a factor of . Moreover this implies that the analysis of the jump-start continuous greedy algorithm in Theorem 2 is asymptotically tight.
This result is conditioned on hardness of densest subgraph which has been established under ETH [38] - see Appendix E. First, since the term in Theorem 3 does not depend on , it yields an rounding gap for cardinality constraints (since ). In addition, given that the multilinear extension of the densest subgraph objective is of the form , the approximability of densest subgraph is within a constant factor of its continuous relaxation.
The following result asserts that our rounding algorithm is also asymptotically tight. The proof is included in Appendix E.
Theorem 8.
Let with . There exists a -semi-metric diversity function with multilinear extension , and a matroid with rank and minimum circuit size , where the integrality gap of over the matroid polytope is .
7 Conclusion
There are a number of directions that need exploring. The most immediate are (i) extending the continuous greedy algorithm to non-monotone -smooth functions, and (ii) developing rounding methods (such as contention resolution) for one-sided smooth functions over more general polytopes. We believe there should be further interesting applications for the one-sided smoothness model introduced in this work.
8 Acknowledgements
This article benefitted greatly from previous anonymous reviews. We are indebted to those reviewers as well as to Chandra Chekuri, Anupam Gupta and Nick Harvey who also provided invaluable feedback. The third author gratefully acknowledges the support from an NSERC Discovery Grant 109840 without which this work would not be possible.
Appendix A Appendix: Semi-metric diversity and OSS
In this section, we establish the smoothness parameter associated with several of the discrete quadratic functions discussed. In other words, we bound the approximate triangle inequality for their associated distance functions.
Definition 1.
Let be a distance function with the corresponding distance matrix where . We say is a negative-type distance if for any with we have .
Proposition 1.
Any negative-type distance is -semi-metric.
Proof.
Let . We know
[TABLE]
Therefore and is -semi-metric. ∎
Jensen-Shannon Divergence is a function which measures dissimilarity between probability distributions. It is well-known that if is a JS measure, then is a metric. Hence JS distances form a -semi-metric by the following result.
Proposition 2.
Let be a distance function such that is a metric. Then is a -semi-metric.
Proof.
By definition, we have
[TABLE]
Therefore,
[TABLE]
We also know that
[TABLE]
Hence,
[TABLE]
∎
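For reference, the chain of inequalities behind Proposition 2 can be sketched as follows. This is a reconstruction under two assumptions that are not visible in the text above: that $d$ is called a $\sigma$-semi-metric when $d(i,j) \le \sigma\,(d(i,k) + d(k,j))$ for all $i, j, k$, and that the constant obtained in the proposition is $\sigma = 2$.

```latex
\begin{align*}
\sqrt{d(i,j)} &\le \sqrt{d(i,k)} + \sqrt{d(k,j)}
  && \text{(triangle inequality for } \sqrt{d}\text{)} \\
d(i,j) &\le d(i,k) + d(k,j) + 2\sqrt{d(i,k)\,d(k,j)}
  && \text{(squaring both sides)} \\
2\sqrt{d(i,k)\,d(k,j)} &\le d(i,k) + d(k,j)
  && \text{(AM-GM inequality)} \\
d(i,j) &\le 2\,\bigl(d(i,k) + d(k,j)\bigr)
  && \text{(combining the two previous lines)}
\end{align*}
```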
Lemma 2.
Let , and . If for any we have
[TABLE]
then is one-sided -smooth at .
Proof.
We have
[TABLE]
∎
We have defined a symmetric matrix to be a -semi-metric (see Section 1) if for all . Our main applications are to multilinear extensions where is non-negative and has zero diagonal. However, the following result applies in the more general setting.
Proposition 3.
Let be a non-negative symmetric matrix. Let and . Then is one-sided -smooth if is a -semi-metric.
Proof.
Note that and . Therefore,
[TABLE]
where the first inequality follows from and the last inequality holds because is -semi-metric. Now by Lemma 2, we conclude that is one-sided -smooth. ∎
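Proposition 3 can be sanity-checked numerically. The sketch below assumes (this form is not recoverable from the text above) that one-sided $\sigma$-smoothness at $x \neq 0$ means $\tfrac{1}{2} u^\top \nabla^2 F(x)\, u \le \sigma \tfrac{\|u\|_1}{\|x\|_1} u^\top \nabla F(x)$ for all $u \ge 0$, that $F(x) = \tfrac{1}{2} x^\top A x$, and that $A$ is a $\sigma$-semi-metric when $A_{ij} \le \sigma (A_{ik} + A_{jk})$. The test matrix of squared distances on a line and all function names are illustrative choices.

```python
import random

def is_semi_metric(A, sigma, tol=1e-12):
    """Check A[i][j] <= sigma * (A[i][k] + A[j][k]) for all i, j, k (assumed definition)."""
    n = len(A)
    return all(A[i][j] <= sigma * (A[i][k] + A[j][k]) + tol
               for i in range(n) for j in range(n) for k in range(n))

def oss_gap(A, x, u, sigma):
    """Right-hand side minus left-hand side of the assumed OSS inequality
    for F(x) = 0.5 * x^T A x, whose gradient is A x and whose Hessian is A."""
    n = len(A)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    uAu = sum(u[i] * A[i][j] * u[j] for i in range(n) for j in range(n))
    uAx = sum(u[i] * Ax[i] for i in range(n))
    lhs = 0.5 * uAu
    rhs = sigma * (sum(u) / sum(x)) * uAx
    return rhs - lhs

def worst_gap(A, sigma, trials=200, seed=0):
    """Smallest gap seen over random non-negative x (nonzero) and u."""
    rng = random.Random(seed)
    n = len(A)
    return min(
        oss_gap(A,
                [rng.uniform(0.01, 1.0) for _ in range(n)],
                [rng.uniform(0.0, 1.0) for _ in range(n)],
                sigma)
        for _ in range(trials)
    )

# Squared distances between points on a line: sqrt(A) is a metric,
# so A is a 2-semi-metric under the assumed definition.
points = [0.1, 0.35, 0.5, 0.8, 0.95]
A = [[(p - q) ** 2 for q in points] for p in points]
print(is_semi_metric(A, 2.0), worst_gap(A, 2.0) >= -1e-9)
```

A non-negative gap on every sample is consistent with the proposition; it is evidence for these particular inputs, not a proof.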
Appendix B Appendix: Jump-Start Continuous Greedy
Proposition 4.
For any the best approximation guarantee in Theorem 1 is attained at
[TABLE]
Proof.
We need to find the maximizer of where . Hence, we solve .
[TABLE]
The only solution in is and this yields the proposition. ∎
Theorem 2.
Let be a monotone -OSS function with non-positive third order partial derivatives. Let and be a polytime separable, downward-closed, polytope. If we run the jump-start continuous greedy process (Algorithm 1) then and where . In particular, taking we get and so (since for ).
Proof.
For each we have
[TABLE]
Since is convex and , we have that as long as . Given that each and also , it follows that is a convex combination of points in , and hence belongs to .
Let be such that . Also let and , i.e., . By Taylor’s Theorem and non-positivity of the third order derivatives of we have
[TABLE]
where the second inequality follows from smoothness, and the third from the fact that . Thus
[TABLE]
where the last inequality follows from monotonicity. We also have that
[TABLE]
where the first inequality follows by definition of and the fact that , and the second inequality from the fact that and . Combining this with (5) yields:
[TABLE]
for any . Let us denote . We can use the Chain Rule to get
[TABLE]
where the last inequality follows from (6).
We solve the above differential inequality by multiplying by .
[TABLE]
where the inequality follows from Equation (7).
Integrating the LHS and RHS of the above equation between [math] and we get
[TABLE]
Hence
[TABLE]
where the last inequality follows from the fact that is non-negative. Substituting and gives the desired result. ∎
Appendix C Appendix: Discretization of the Continuous Greedy
We now discuss discretization of the continuous greedy process for one-sided smooth functions.
If our goal is to find a polytime approximation algorithm, we need to establish two features. The first is an approximation bound; for this we use our analysis of the continuous greedy process, Theorems 1 and 2. The second is some sort of smoothness assumption on the gradients of . We consider several conditions for the latter depending on the context; the most straightforward is for multilinear OSS functions.
To discretize the jump-start continuous greedy, we start at (for ) and use the following update rule.
[TABLE]
where is the step size, and .
We always assume is chosen with integer, which is then clearly the number of iterations. Our main concern is to bound this by a polynomial in the input size. This is because we primarily adopt the view that we have exact access to the function and its gradients. This is the case for the ME of a diversity function (and in fact any quadratic function) which is our main application. For more general functions we may not have access to the exact gradient and must instead estimate it by sampling from the function. In that case, we need a probabilistic argument similar to the original argument of Vondrák [45].
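To make the discretization concrete, here is a minimal sketch for a quadratic objective over a cardinality polytope. It is a generic Frank-Wolfe-style loop and not necessarily the exact update rule above: the objective $F(x) = \tfrac{1}{2} x^\top A x$, the polytope $\{x \in [0,1]^n : \sum_i x_i \le k\}$, the uniform starting point scaled by a jump-start constant `c`, and the step size $(1-c)/\text{steps}$ are all assumptions made for illustration.

```python
def gradient(A, x):
    """Gradient of F(x) = 0.5 * x^T A x for symmetric A, i.e. A x."""
    n = len(A)
    return [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

def value(A, x):
    """F(x) = 0.5 * x^T A x."""
    return 0.5 * sum(xi * gi for xi, gi in zip(x, gradient(A, x)))

def best_vertex(g, k):
    """Maximize <g, v> over {v in [0,1]^n : sum(v) <= k}:
    the indicator of the (at most k) largest positive entries of g."""
    order = sorted(range(len(g)), key=lambda i: g[i], reverse=True)[:k]
    chosen = {i for i in order if g[i] > 0}
    return [1.0 if i in chosen else 0.0 for i in range(len(g))]

def discrete_jump_start_greedy(A, k, c, steps):
    """Start at c * (k/n, ..., k/n) and take `steps` moves of size (1 - c) / steps
    toward the vertex that maximizes the current linearization.
    The final point is a convex combination of points in the polytope."""
    n = len(A)
    x = [c * k / n] * n            # hypothetical jump-start point
    delta = (1.0 - c) / steps      # step size; `steps` iterations in total
    for _ in range(steps):
        v = best_vertex(gradient(A, x), k)
        x = [xi + delta * vi for xi, vi in zip(x, v)]
    return x

A = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
x = discrete_jump_start_greedy(A, k=2, c=0.5, steps=10)
print(x, value(A, x))
```

The number of iterations equals `steps`, which is the quantity the analysis in this appendix aims to bound polynomially.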
There are two ingredients we need to analyze discretizations. One is an approximation bound for the continuous process itself. The second is a bound which guarantees that gradients do not decrease too suddenly. We describe the discretization as a self-contained argument which takes these two bounds ((8) and (9)) as inputs.
Let be a -OSS function and be a downward-closed polytope. Denote . We consider generic lower bounds on the continuous greedy rate of improvement as a function of and . For some , we say an application satisfies a bound if for any such that we have
[TABLE]
The following lemma encapsulates the two main bounds we use; these are outcomes of the proofs of Theorem 1 and Theorem 2.
Lemma 3.
Let be a -OSS function and be a downward-closed polytope. Denote . Then for any such that we have
[TABLE]
If in addition has non-positive third derivatives, then we have
[TABLE]
For (possibly a function of inputs such as ), we say is -local at if
[TABLE]
for all such that and where . The function is -local if this holds for all such choices of . The next result shows how one may obtain a polytime implementation of continuous greedy for functions with “bounded locality”. As discussed later, in some applications, functions may only be local for a subset of .
Theorem 9.
Let be a monotone, non-negative -OSS function and a polytime separable downward-closed polytope. Assume satisfies a bound and is -local. Then taking , discrete greedy produces satisfying:
[TABLE]
Proof.
By definition of the algorithm we have and . Then by Taylor’s Theorem for some we have
[TABLE]
where the first inequality follows from -locality and the second inequality follows by the bound property, and . Now define . We have
[TABLE]
because is non-negative and by choice of . Hence we have
[TABLE]
By induction, we have
[TABLE]
Next, since , we have . Therefore we have
[TABLE]
where the first inequality holds because of non-negativity of , and the last equality holds because .
∎
We discuss how one may apply this theorem to functions with gradients that are -Lipschitz (with respect to norm). It follows that . Define and suppose that for some we have that fails the condition for -locality. That is, and hence . Together with the first inequality this yields: . Hence if satisfies (8) we have
[TABLE]
It follows that . Hence if we follow the analysis in the proof of Theorem 9, either we achieve the claimed multiplicative bound, or we reach a point which is within a small additive constant of opt.
We now apply discretization to our main applications. Note that in some cases, it is enough to have the locality condition on a subdomain of the function. One may show that for non-negative monotone set functions, their multilinear extensions are -local on . This yields the following result.
Theorem 10.
Let be a non-negative monotone set function and be its multilinear extension such that is -OSS. Let denote a polytime separable downward-closed polytope contained in . Assume that satisfies some bound. Then the output of the discrete version of jump-start continuous greedy algorithm, with , satisfies
[TABLE]
Proof.
First of all, suppose . By Taylor’s remainder theorem, for some , we have
[TABLE]
Now note that because is the multi-linear extension of , we have
[TABLE]
where — see [45]. Because is monotone, the term is non-negative for any and . Also note that because is downward-closed we have .
Since and , for any we have . Let . Because and , we also have . Therefore
[TABLE]
Now note that because , we have . Therefore we have . Hence, by Bernoulli’s inequality and choice of , we have
[TABLE]
Hence,
[TABLE]
Therefore, defining , we have
[TABLE]
where the second inequality follows from assumption. The last inequality holds because is non-negative and . Hence we have
[TABLE]
Therefore by taking and using induction, we have
[TABLE]
Note that for any , . Also note that . Therefore
[TABLE]
Note that , as . Therefore we have
[TABLE]
where the first inequality holds because of monotonicity, and the last equality holds because . ∎
We may also show that the discrete greedy algorithm achieves -step convergence to the claimed bounds if in addition has an even stronger lower bound on its gradients. (A property effectively saying is [math]-local.) As we see, this property is satisfied for multilinear extensions of supermodular functions.
Theorem 11.
Let be a monotone, non-negative -OSS function and a polytime separable downward-closed polytope. Assume satisfies a bound and in addition:
[TABLE]
for all and such that . (We assume and the larger the value of the better). Then -step discrete continuous greedy computes satisfying:
[TABLE]
Proof.
By definition of the algorithm we have and . Then by Taylor’s Theorem for some we have
[TABLE]
where the first inequality follows from the theorem’s assumption, and the second inequality follows by the bound property. For the -step version we have and so
[TABLE]
Thus
[TABLE]
where the last step uses the exponential inequality . ∎
Remark 1.
Note that for the above approximation factor is equal to , which matches the approximation obtained via the continuous greedy process, i.e., Theorem 1.
Lemma 4.
A -OSS function satisfies (11) with if is a copositive matrix for any . In particular, the multilinear extension of a supermodular function has .
Proof.
Using the fundamental theorem of calculus, we have . Now the first part of the lemma follows with and taking the inner product with , since for any . For the second part, let be the multilinear extension of a supermodular set function . Then is a multilinear extension of the submodular set function . Vondrák [45] shows that the Hessian of is always non-positive with [math] diagonal. Thus and hence copositive. ∎
The version of discrete greedy for multilinear extensions of supermodular functions may appear too good in that it only requires one step. It has two intensive computational ingredients, however. First is to solve an LP to find a starting iterate . The second is to compute the gradient , which already requires work.
Appendix D Appendix: Swap Rounding for multilinear quadratics
In this section, we analyze a modified version of the swap rounding algorithm (Algorithm 2) and we show that it finds an integral solution which is an -approximation of the initial fractional solution.
First we define the following notation. and and . With an abuse of notation, we show with . The following result provides a decomposition of the multilinear extension of a quadratic function based on the convex decomposition of a point to the bases of the matroid.
Lemma 5.
Let where and is a symmetric matrix with for all . Then the multilinear extension of is . Moreover, if for some scalars ’s and subsets , then
[TABLE]
Proof.
For the first part of the lemma note that
[TABLE]
To see the second part, observe that
[TABLE]
and
[TABLE]
∎
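The first part of Lemma 5 can be checked by brute force on a small ground set. The sketch below assumes the normalization $f(S) = \tfrac{1}{2} \sum_{i,j \in S} A_{ij}$ with zero diagonal, so that the claimed multilinear extension is $F(x) = \tfrac{1}{2} x^\top A x$; the factor $\tfrac{1}{2}$ and the function names are assumptions, since the formulas in the statement above are not recoverable.

```python
from itertools import combinations

def f_quadratic(A, S):
    """f(S) = 0.5 * sum of A[i][j] over ordered pairs i, j in S (A has zero diagonal)."""
    return 0.5 * sum(A[i][j] for i in S for j in S)

def multilinear_extension(f, n, x):
    """F(x) = sum over S of f(S) * prod_{i in S} x_i * prod_{i not in S} (1 - x_i).
    Exponential in n; intended only for tiny sanity checks."""
    total = 0.0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            members = set(S)
            prob = 1.0
            for i in range(n):
                prob *= x[i] if i in members else 1.0 - x[i]
            total += prob * f(S)
    return total

def quadratic_form(A, x):
    """0.5 * x^T A x."""
    n = len(A)
    return 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

A = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
x = [0.2, 0.5, 0.9]
F_brute = multilinear_extension(lambda S: f_quadratic(A, S), len(A), x)
print(abs(F_brute - quadratic_form(A, x)) < 1e-9)
```

The agreement relies on the zero diagonal: each off-diagonal term $A_{ij}$ is counted with probability $x_i x_j$, which is exactly its coefficient in the quadratic form.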
Lemma 6.
Let be a matroid and be its corresponding base polytope. Let where and is a symmetric matrix such that its diagonal is zero. Let for any . Let where ’s are bases of the matroid, , and , for . Let be the output of MergeBases (defined in Algorithm 2) on and . Let . Then .
Proof.
Let and (the original inputs of the function). Let and be the resulting and after the -th iteration of the while loop. Let . Let be the elements we pick at the -th iteration of the loop. We show that and this yields the desired result using a simple recursion argument. Without loss of generality, we assume
[TABLE]
We have
[TABLE]
The inequality holds because of (D), and the first and the last equalities follow from Lemma 6. The second to the last equality uses that and . ∎
Theorem 12.
Let be a matroid of rank and be its corresponding base polytope. Let where and is a symmetric matrix with zero diagonal that satisfies the -semi-metric inequality, i.e., for all . Let for any . Let and be the output of the modified swap rounding (Algorithm 2) on . Then .
Proof.
Let where ’s are bases of the matroid, , and , for . Let be the output of the swap rounding (Algorithm 2) if it starts from and . Let denote the vector corresponding to and , i.e. . By Lemma 6, for , we have
[TABLE]
where . Therefore
[TABLE]
where the last inequality holds since . Now, we bound the term . By definition of , note that . Using this and Lemma 5 it follows that
[TABLE]
By Lemma 6 and the -semi-metric assumption, we also know that
[TABLE]
Note that none of the edges of is present in the right hand side summation. Therefore
[TABLE]
where the second inequality follows from Lemma 5 and the last inequality holds because of Lemma 6. Combining (16), (17), and (D), we get
[TABLE]
Hence, by (D) and (19), we have
[TABLE]
and this yields the result. ∎
Appendix E Appendix: Hardness of Approximation for -Semi-Metric Diversity
In this section, we provide a hardness result for approximate maximization of -semi-metric diversity functions defined on a semi-metric distance. Our results are based on inapproximability results for finding densest subgraphs.
Given a graph and integer , the densest -subgraph problem aims to find an induced subgraph of size with the maximum number of edges. Let be a subset of vertices of and be the number of edges in the induced subgraph of . The density of is defined as . A recent breakthrough [38] shows that, assuming the exponential time hypothesis (ETH), there is no subpolynomial approximation algorithm for densest subgraph. More precisely, there is no polytime algorithm which can distinguish between two cases: (i) an instance which contains a -clique and (ii) an instance where the density of every -subset satisfies , where is a universal constant independent of . In the following, we let . Existence of constant-factor approximations had previously been ruled out under the unique games conjecture with small set expansion [41].
Theorem 7.
Assuming ETH: (1) There is no polytime -approximation algorithm for maximizing -semi-metric functions subject to a cardinality constraint, and (2) for any fixed and , there is no polytime algorithm which approximates the maximum of a -semi-metric function subject to a cardinality constraint within a factor of .
Proof.
For , we can reduce the densest -subgraph problem to -semi-metric function maximization in the following way. Consider an instance of densest -subgraph on graph with vertex set . Create a distance function . If there is an edge between in , set ; otherwise set . It is easy to see that this distance function is -semi-metric. Let . If , we have
[TABLE]
We know . Therefore
[TABLE]
and dividing both sides by we get
[TABLE]
It is also easy to see that
[TABLE]
Suppose there is a -approximation algorithm for maximizing -semi-metric functions. Let its output on be and choose
[TABLE]
We have
[TABLE]
We can choose our so that . Hence . If is a graph in which the density of every subset of vertices of size is at most , then clearly . If is a graph that contains a clique of size , then , and so . This means that our -approximation algorithm can distinguish between these two graphs, contradicting the implications from [38].
For (2), consider a given and suppose there is a -factor approximate algorithm for maximizing a -semi-metric function. Denote its output on by , and let be defined as above. We then have
[TABLE]
Set , and note that is a constant. If is a graph in which the density of every subset of vertices of size is at most , then clearly and this is at most for sufficiently large. If is a graph that contains a clique of size , then which means . This means that our -factor approximate algorithm can distinguish between these two graphs which again contradicts the implications of [38]. ∎
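The reduction used in this proof can be illustrated with a short sketch. The edge and non-edge distances below ($2\sigma$ and $1$) are illustrative assumptions chosen so that the assumed inequality $d(i,j) \le \sigma\,(d(i,k) + d(j,k))$ holds; the actual values used in the proof are not recoverable from the text above, and the function names are hypothetical.

```python
from itertools import combinations

def distance_from_graph(n, edges, sigma):
    """Build a distance matrix from a graph on vertices 0..n-1.
    Assumed weights: 2*sigma on edges, 1 on non-edges, 0 on the diagonal."""
    edge_set = {frozenset(e) for e in edges}
    return [[0.0 if i == j
             else (2.0 * sigma if frozenset((i, j)) in edge_set else 1.0)
             for j in range(n)]
            for i in range(n)]

def is_semi_metric(d, sigma, tol=1e-12):
    """Check d[i][j] <= sigma * (d[i][k] + d[j][k]) for all i, j, k (assumed definition)."""
    n = len(d)
    return all(d[i][j] <= sigma * (d[i][k] + d[j][k]) + tol
               for i in range(n) for j in range(n) for k in range(n))

def diversity(d, S):
    """Sum of distances over unordered pairs in S."""
    return sum(d[i][j] for i, j in combinations(S, 2))

def edges_inside(edges, S):
    """Number of graph edges with both endpoints in S."""
    members = set(S)
    return sum(1 for u, v in edges if u in members and v in members)

sigma = 1.5
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
d = distance_from_graph(4, edges, sigma)
S = [0, 1, 2]
pairs = len(S) * (len(S) - 1) // 2
# Under the assumed weights: diversity(S) = pairs + (2*sigma - 1) * edges_inside(S).
print(is_semi_metric(d, sigma), diversity(d, S), pairs + (2 * sigma - 1) * edges_inside(edges, S))
```

Under these weights the diversity of a $k$-subset is an affine function of the number of edges it induces, which is how an approximation algorithm for diversity translates into a distinguisher for densest $k$-subgraph instances.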
Theorem 8.
Let with . There exists a -semi-metric with multilinear extension , and a matroid with rank and minimum circuit size , where the integrality gap of over the matroid polytope is .
Proof.
Let for , and . We define a matroid in terms of its circuits as follows. A set is a circuit of if and only if is the union of any sets . It is then clear that the minimum size of a circuit is , and the rank of the matroid is . For example, could be the graphic matroid corresponding to the graph in Figure 2. Circuits here correspond to cycles of size , and the dashed lines show the non-zero coefficients of .
Let . It is straightforward to see that is the multilinear extension of a -semi-metric diversity function induced by a complete graph which has weight on edges from and weight otherwise.
By definition of and , it is clear that any integral solution maximizing will pick pairs from and then singletons from other pairs. Therefore
[TABLE]
On the other hand, and
[TABLE]
Using that and we have
[TABLE]
where the last inequality follows since . It follows that the integrality gap is at least
[TABLE]
∎
Appendix F Appendix: One-Sided Smoothness versus Lipschitz Smoothness
Lipschitz smoothness is an important, widely used property in convex optimization and machine learning. One-sided -smoothness is different from Lipschitz smoothness (and from other smoothness notions based on Hölder or uniform continuity), and we believe it may also have applications in these areas.
A differentiable function is Lipschitz smooth if its gradient is Lipschitz continuous. In other words, is Lipschitz smooth if there exists such that for any and , or equivalently for twice differentiable functions, We then call -Lipschitz smooth. One could define the one-sided version of this smoothness if the above inequality holds for any (second definition/inequality holds for any ). With this definition, it is easy to see that submodular functions are one-sided [math]-Lipschitz smooth. On the other hand one-sided -smoothness is not equivalent to one-sided -Lipschitz smoothness. To see an important difference, consider function where is a constant and is one-sided smooth. We have . Thus if is one-sided -Lipschitz smooth we may only assert that is one-sided -Lipschitz smooth. In particular, Lipschitz smoothness is not closed under multiplication. On the other hand, the one-sided -smooth functions form a cone. Intuitively, the reason is that in -smooth functions, the ratio of the gradients is bounded (as shown in Lemma 1) unlike Lipschitz smoothness where the difference of the gradients is bounded.
Appendix G Appendix: Other Applications
G.1 Appendix: The Diversified Procurement Problem
Consider a problem whereby an organization decides how to outsource the building or servicing of a system to a collection of competing vendors. The outcome is an allocation of work across the vendors. An allocation is represented by a vector ; we focus on the case where . Possibly is just but may also incorporate structural constraints imposed by the system or to enforce a resilient solution (e.g., avoid allocations where one vendor becomes too big to fail). Given bids for each , the payoff to the organization is . A different type of consideration for the procuring organization is to build diversity into the resulting work-plan. We consider two sources for lack of diversity. First, there may be collusions and these are to be subdued. The organization can define a matrix which estimates pairwise collusions. A solution which lessens the value of is preferred. Second, the system may be serving a collection of stakeholder communities. Different vendors may be more desirable than others to distinct communities. Again, the procuring organization can model this by defining vectors , where represents the level of support (positive or negative) it receives from community . The overall measure of quality seeks a solution which promotes representation across more communities (vectors with are pointing in different directions; hence good). We propose the following model to address this multi-criteria objective:
[TABLE]
where is the matrix whose columns are . Hence consists of a revenue part and a penalty part for lack of diversity.
It is easily seen that is copositive and hence is [math]-OSS; see Section 1.1. (Footnote 5: This also leads to examples of [math]-OSS functions whose Hessians have positive off-diagonals, for instance by taking and selecting vectors which are pairwise oblique, i.e., .) Hence the jump-start continuous greedy process of Section 4.3 can be applied if the model has been defined so that (since is normalized this ensures non-negativity). Checking this gradient condition is easy. Moreover, it is useful in the modelling phase for exploring the trade-offs between the vector of bids and the community representation objectives which are determined by .
Note that Theorem 2 implies that the continuous greedy process produces a solution which is within a factor of the fractional optimum of . We may also create a (weakly) polytime approximation as follows.
We assume that the model has been constructed so that the bids dominate the diversity penalties. Concretely we assume that . We also let be an upper bound on the entries of , i.e., . We now examine for which values is the function -local — see paragraph before Theorem 9. That is, we want:
[TABLE]
for any , . This holds if we have:
[TABLE]
By re-arranging this holds as long as
[TABLE]
We now use the fact that , where is the maximum value of for an eigenvalue of . By the Gershgorin Circle Theorem [29] is at most . Hence which is at most if . By our gradient assumption, the right hand side of (21) is then at least . Note that we only need to establish -locality for vectors selected by the greedy process. Since these vectors lie in which is in the unit hypercube, we have the desired inequality. Hence we may choose and a discretization follows from Theorem 9.
References
- [1] Zeinab Abbassi, Vahab S. Mirrokni, and Mayur Thakur. Diversity maximization under matroid constraints. In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, August 11-14, 2013, pages 32–40, 2013.
- [2] Francis Bach. Submodular functions: from discrete to continuous domains. Mathematical Programming, 175(1-2):419–459, 2019.
- [3] Rafael da Ponte Barbosa, Alina Ene, Huy L Nguyen, and Justin Ward. A new framework for distributed submodular maximization. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 645–654. IEEE, 2016.
- [4] Aditya Bhaskara, Moses Charikar, Venkatesan Guruswami, Aravindan Vijayaraghavan, and Yuan Zhou. Polynomial integrality gaps for strong SDP relaxations of densest k-subgraph. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 388–405. SIAM, 2012.
- [5] Aditya Bhaskara, Mehrdad Ghadiri, Vahab S. Mirrokni, and Ola Svensson. Linear relaxations for finding diverse elements in metric spaces. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 4098–4106, 2016.
- [6] An Bian, Kfir Levy, Andreas Krause, and Joachim M Buhmann. Continuous DR-submodular maximization: Structure and algorithms. In Advances in Neural Information Processing Systems, pages 486–496, 2017.
- [7] An Bian, Baharan Mirzasoleiman, Joachim M Buhmann, and Andreas Krause. Guaranteed non-convex optimization: Submodular maximization over continuous domains. Proceedings of Machine Learning Research, 54:111–120, 2017.
- [8] Andrew An Bian, Joachim M Buhmann, Andreas Krause, and Sebastian Tschiatschek. Guarantees for greedy maximization of non-submodular functions with applications. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 498–507. JMLR.org, 2017.
