Submodular maximization and its generalization through an intersection cut lens
Liding Xu, Leo Liberti

TL;DR
This paper introduces a novel intersection cut approach for submodular maximization problems, utilizing a convex extension and a hybrid algorithm, leading to improved solutions for various combinatorial optimization tasks.
Contribution
It develops a new intersection cut framework based on a convex extension of submodular functions, with an efficient hybrid algorithm, and extends the method to nonconvex models like submodular-supermodular functions.
Findings
Effective intersection cuts improve MIP solutions for submodular problems.
The hybrid discrete Newton algorithm computes cuts efficiently and exactly.
Applications include max cut, pseudo Boolean maximization, and Bayesian D-optimal design.
| Configuration | Default: closed | Default: time | Submodular cut: closed | Submodular cut: relative | Submodular cut: time | Submodular cut: cuts | Split cut: closed | Split cut: relative | Split cut: time | Split cut: cuts |
|---|---|---|---|---|---|---|---|---|---|---|
| standalone | 0.04 | 5.13 | 0.16 | 4.40 | 85.40 | 207.59 | 0.12 | 2.93 | 17.92 | 92.53 |
| embedded | 0.22 | 12.62 | 0.27 | 1.22 | 104.02 | 70.68 | 0.27 | 1.21 | 34.62 | 45.15 |

| Configuration | Default: closed | Default: time | Submodular cut: closed | Submodular cut: relative | Submodular cut: time | Submodular cut: cuts | Split cut: closed | Split cut: relative | Split cut: time | Split cut: cuts |
|---|---|---|---|---|---|---|---|---|---|---|
| standalone | 0.01 | 9.49 | 0.05 | 4.81 | 43.54 | 43.17 | 0.03 | 2.31 | 14.64 | 20.94 |
| embedded | 0.105 | 22.52 | 0.11 | 1.13 | 49.61 | 13.80 | 0.106 | 1.01 | 25.58 | 28.21 |

| Benchmark | Configuration | Default: closed | Default: time | Submodular cut: closed | Submodular cut: relative | Submodular cut: time | Submodular cut: cuts | Split cut: closed | Split cut: relative | Split cut: time | Split cut: cuts |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Block design | standalone | 0.59 | 20.46 | 0.59 | 1.0 | 18.71 | 1.84 | 0.59 | 1.0 | 11.62 | 1.77 |
| Block design | embedded | 0.59 | 21.44 | 0.59 | 1.0 | 19.0 | 1.84 | 0.59 | 1.0 | 12.41 | 1.77 |
| Gaussian | standalone | 0.83 | 213.13 | 0.83 | 1.0 | 415.07 | 1.45 | 0.83 | 1.0 | 214.17 | 1.45 |
| Gaussian | embedded | 0.83 | 214.77 | 0.83 | 1.0 | 426.33 | 1.45 | 0.83 | 1.0 | 214.14 | 1.45 |
| All | standalone | 0.75 | 98.47 | 0.75 | 1.0 | 149.54 | 1.57 | 0.75 | 1.0 | 82.6 | 1.55 |
| All | embedded | 0.75 | 100.47 | 0.75 | 1.0 | 153.01 | 1.57 | 0.75 | 1.0 | 84.31 | 1.55 |
Submodular maximization and its generalization through an intersection cut lens
Liding Xu LIX CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, 91128, France. E-mail: [email protected], [email protected]
Leo Liberti (same affiliation)
Abstract
We study a mixed-integer set $\mathcal{S} := \{(x,t) \in \{0,1\}^n \times \mathbb{R} : f(x) \ge t\}$ arising in the submodular maximization problem, where $f$ is a submodular function defined over $\{0,1\}^n$. We use intersection cuts to tighten a polyhedral outer approximation of $\mathcal{S}$. We construct a continuous extension of $f$, which is convex and defined over the entire space $\mathbb{R}^n$. We show that the epigraph of this extension is an $\mathcal{S}$-free set, and characterize maximal $\mathcal{S}$-free sets including this epigraph. We propose a hybrid discrete Newton algorithm to compute an intersection cut efficiently and exactly. Our results are generalized to the hypograph or the superlevel set of a submodular-supermodular function, which is a model for discrete nonconvexity. A consequence of these results is intersection cuts for Boolean multilinear constraints. We evaluate our techniques on max cut, pseudo Boolean maximization, and Bayesian D-optimal design problems within a MIP solver.
Keywords: MINLP, submodular maximization, submodular-supermodular functions, intersection cuts, Boolean multilinear functions, D-optimal design.
1 Introduction
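In this paper, we consider the submodular maximization problem:
$$\max_{x \in \mathcal{X} \cap \{0,1\}^n} f(x), \qquad (1)$$
where $f$ is a submodular function and $\mathcal{X}$ is a set describing additional constraints. We study valid inequalities for the mixed-integer set $\{(x,t) \in \{0,1\}^n \times \mathbb{R} : f(x) \ge t\}$, which is the hypograph of $f$ over the Boolean hypercube $\{0,1\}^n$.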
The maximization of arbitrary submodular functions (i.e., Eq. (1)) can be reduced to a Mixed-Integer Linear Program (MILP) with exponentially many linear inequalities [60]. No polynomial-time algorithm is yet known to separate these inequalities. The Benders-like exact approach based on a branch-and-cut algorithm proposed in [27] provides global dual bounds for primal solutions, and converges in a finite number of iterations.
Many submodular maximization problems (e.g., max cut with positive edge weights [66], D-optimal design [64], and utility maximization [2]) have natural MILP or mixed-integer nonlinear programming (MINLP) formulations, which can be solved using general-purpose global optimization solvers. The algorithm underlying these solvers is typically a branch-and-cut algorithm, which uses polyhedral outer approximations to construct LP relaxations [14, 15, 70]. For submodular maximization problems with convex MINLP formulations, a state-of-the-art algorithm also uses polyhedral outer approximations [23].
Intersection cuts can be used to strengthen polyhedral outer approximations of a nonconvex set $\mathcal{S}$ that is considered hard to optimize over. The construction of intersection cuts [24] requires two key ingredients: a corner polyhedron relaxation of $\mathcal{S}$, and an $\mathcal{S}$-free set, which is a convex set whose interior contains no point of $\mathcal{S}$. (Inclusion-wise) maximal $\mathcal{S}$-free sets generate strong intersection cuts not dominated by other intersection cuts.
Intersection cuts were initially devised in the continuous setting (the papers [73, 74], cited in [46, Ch. III], appeared before the classic paper [9]), where they could approximate the hypograph of a convex function over a polytope. In that setting, there is a unique maximal $\mathcal{S}$-free set: the epigraph of that convex function. Later, intersection cuts were used in the discrete setting [9], where $\mathcal{S}$ is a lattice. Several more families of lattice-free sets (e.g., splits, triangles, and spheres [24, 51]) were described later.
The submodular maximization problem plays an intermediate role between these settings. On the one hand, the submodular function $f$ is defined over the Boolean hypercube $\{0,1\}^n$. Therefore, the graph of $f$ projected on $\mathbb{R}^n$ is a subset of a lattice. On the other hand, as a discrete analogue to convex functions, $f$ has a convex (thus continuous) extension over the hypercube $[0,1]^n$, namely the Lovász extension [52]. We can extend the Lovász extension to a convex function, which we call the extended envelope, over the entire $n$-dimensional Euclidean space $\mathbb{R}^n$. This (continuous) function inherits a rich combinatorial structure from $f$.
The difference of two submodular functions (call them ) is a submodular-supermodular (SS) function. SS functions generalize submodular functions, which are also discrete analogues of difference-of-convex (DC) functions. The epigraphs/superlevel sets of DC functions can represent various nonconvex sets, e.g., quadratic sets [57] and signomial-term sets [77]. This representation facilitates the derivation of intersection cuts [67]. In fact, SS functions may represent some discrete nonconvex functions arising in combinatorial optimization. For example, we will show that any Boolean multilinear function is an SS function.
Let . In this paper, we use convex extensions in order to construct some -free (namely, hypograph-free) sets with Boolean structure. The hypograph set is a special case of the constraint set with . We also aim at extending our results to this more general set . Finally, we propose an efficient algorithm to compute intersection cuts derived from -free sets. To the best of our knowledge, intersection cuts have not been applied directly to approximate problems with submodular and/or supermodular structures.
We implement intersection cuts within the SCIP solver [14] and test them on max cut, pseudo Boolean maximization, and Bayesian D-optimal design problems. We show the strengths and weaknesses of intersection cuts under these different settings.
1.1 Contributions
Our primary contribution is the construction of hypograph-free sets. We show that a maximally hypograph-free set can be lifted from a maximal -free set. We also give an alternative construction of hypograph-free sets by exploiting the submodularity. We relate the analytical properties of to its combinatorial properties, which inherit those of the Lovász extension. We show that the epigraph of is a hypograph-free set, larger than the epigraph of the Lovász extension. However, unlike in the continuous setting, is not maximally hypograph-free. We give necessary and sufficient conditions on maximal hypograph-free sets that include .
The second contribution is the computation of intersection cuts. We reduce the intersection cut separation problem to solving univariate nonlinear equations, which we achieve with a hybrid discrete Newton algorithm similar to that of [41]. We show that facets of the extended envelope epigraph can be separated in strongly polynomial time. This implies that the (sub)gradients required by the Newton algorithm can be computed in strongly polynomial time. The hybrid discrete Newton algorithm finds a zero of a univariate nonlinear equation in a finite number of steps. By contrast, the conventional bisection algorithm only guarantees $\epsilon$-approximate solutions for $\epsilon > 0$.
Lastly, we extend the previous findings to constraint sets involving an SS function. We show that any Boolean multilinear function is an SS function. This result yields intersection cuts for multilinear constraints in binary variables.
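A small brute-force check illustrates one way to see this claim. It is a sketch under stated assumptions, not necessarily the construction used later in the paper: a monomial $\prod_{i \in S} x_i$ in binary variables is supermodular, so monomials with negative coefficients are submodular, and a multilinear function splits as $f = f_1 - f_2$ with both parts submodular. The helper names `is_submodular`, `multilinear` and `ss_split` and the example coefficients are hypothetical.

```python
from itertools import product

def is_submodular(f, n):
    """Brute-force check of f(x or y) + f(x and y) <= f(x) + f(y) over {0,1}^n."""
    pts = list(product((0, 1), repeat=n))
    for x in pts:
        for y in pts:
            join = tuple(a | b for a, b in zip(x, y))
            meet = tuple(a & b for a, b in zip(x, y))
            if f(join) + f(meet) > f(x) + f(y) + 1e-9:
                return False
    return True

def multilinear(coeffs):
    """Build x -> sum_S coeffs[S] * prod_{i in S} x_i from a dict {index tuple S: coefficient}."""
    def f(x):
        return sum(a * all(x[i] for i in S) for S, a in coeffs.items())
    return f

def ss_split(coeffs):
    """Return (f1, f2) with f = f1 - f2: f1 collects negative monomials, f2 the negated positive ones."""
    neg = {S: a for S, a in coeffs.items() if a < 0}
    pos = {S: -a for S, a in coeffs.items() if a > 0}
    return multilinear(neg), multilinear(pos)

# Hypothetical example: 3*x0*x1 - 2*x1*x2 + 1.5*x0*x1*x2 - x2
n = 3
coeffs = {(0, 1): 3.0, (1, 2): -2.0, (0, 1, 2): 1.5, (2,): -1.0}
f = multilinear(coeffs)
f1, f2 = ss_split(coeffs)
decomposition_ok = all(
    abs(f(x) - (f1(x) - f2(x))) < 1e-9 for x in product((0, 1), repeat=n)
)
```

In this example `f` itself is neither submodular nor supermodular, while both parts of the split pass the submodularity check.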
1.2 Literature review
The work of Jack Edmonds [36] plays a prominent role in the study of the combinatorial properties of submodular functions. We refer to [66] for basic concepts and definitions. The convex envelope of a submodular function is its Lovász extension [8, 52]. Submodular functions are a subclass of discrete convex functions, and we refer to [58] for more details about discrete convex analysis.
Valid inequalities for the hypographs of general submodular functions are called the base inequalities [60]. For a class of special submodular functions, lifting procedures [2, 69] can strengthen the base inequalities. The base inequalities can be separated either using heuristics [2] or at integer points in a Benders-like framework [27]. The method defined in [8] combines valid inequalities for the submodular and supermodular components of an SS function. We refer to [6, 7, 17, 18, 44, 48, 61, 68, 77, 78] for more details about the exploitation of submodular/supermodular functions in mathematical programs. Supermodular polynomials in binary variables are defined and studied in [17, 61]. The submodularity of the D-optimal design problem is exploited in [63, 68].
As mentioned above, intersection cuts generate valid inequalities for sets that are hard to optimize over. Gomory introduced the corner polyhedron [42], and his celebrated mixed-integer cuts [43] are special intersection cuts derived from split disjunctions [59]. The definition of intersection cuts for arbitrary set is due to [35, 40]. We refer to [3, 4, 10, 12, 25, 26, 28, 34, 35, 62] for a more in-depth analysis. The method given in [72] can generate valid inequalities that cut off points outside -free sets. We refer to [4, 13, 49, 50, 54, 55] for relevant recent developments in mixed-integer conic programming.
For the cases where the nonconvexity of is not just due to integer variables, we refer to [37] for bilevel programs, [16] for outer-product sets, [57, 56] for quadratic constraint sets, [77] for signomial-term sets, and [38] for bilinear sets. The method given in [67] constructs intersection cuts for sets arising from factorable programs that contain DC functions [47].
Next, we discuss valid inequalities for polynomial programming, because we use polynomial programs in binary variables as a benchmark in our computational study. In [16], intersection cuts approximate a nonconvex lifted set, namely the outer product set arising from the extended formulation of a polynomial program. Lifted sets link decision variables to auxiliary variables representing (graphs of) monomials up to a given degree. We remark that in most combinatorial optimization problems, decision variables are binaries. The polynomial program of interest is then a Boolean Multilinear Program (BMP). The corresponding lifted set is the Boolean multilinear set [29, 39], the convex hull of which is the so-called Boolean multilinear polytope. Valid inequalities for the Boolean multilinear polytope may be stronger than those for the convex hull of the outer product set. Various Gomory-Chvátal-based inequalities [30, 31, 32, 33] are valid for the multilinear polytope. The separation and strength of these inequalities depend on the hypergraph representing the underlying sparsity pattern of the multilinear set.
We consider a constrained polynomial program, and assume that some of its constraints are neither integrality constraints nor variable bound constraints. After lifting, those constraints are linear constraints and thus define a convex set . The lifted set is nonconvex, and . The polynomial program is then equivalent to linear optimization over . However, in general, , so the convexification of the lifted set may not yield an equivalent convex problem. To address this issue, one attempt is to directly consider and generate valid inequalities for it. Some work in this sense exists for certain interesting special cases, e.g. the intersection of multilinear sets with additional constraint sets such as cardinality constraints [20]. Another attempt is to consider constraints in projected formulations, e.g., in mixed-integer quadratically constrained quadratic programs [65]. Since the representation complexity of the projected formulation is smaller than that of the extended formulation, this approach is also amenable to computation. In [21, 57], intersection cuts for the set defined by a quadratic constraint are derived. If additionally, the nonbasic variables of the LP relaxation are integer, the monoidal technique [22] can strengthen such intersection cuts.
However, generating valid inequalities for Boolean multilinear constraints, and, more generally, constructing -free sets for nonlinear constraints on discrete variables, remain problems of considerable interest. In this paper, we look at these questions through a “submodularity lens”.
1.3 Notation
We let for any positive integer . We denote , . We assume that is equipped with the natural number order. denotes the all-one vector, and denotes the all-zero vector. For , we denote by the characteristic vector of . For vectors , we let be their concatenation, and extend this notation naturally to the case where is a scalar. Given a set and a function , we adopt the usual notation to denote the epigraph, graph and hypograph of over , respectively. For example, . When is omitted in the subscript, it is assumed to be . For any set , we denote by , , its boundary, extreme points, interior, respectively. When is not full-dimensional, denote its relative interior and relative boundary.
1.4 Outline
The rest of the paper is organized as follows. In Sect. 2, we recall some preliminaries for intersection cuts. In Sect. 3, we study extensions of submodular functions. In Sect. 4, we study the hypograph-free sets for the submodular function. In Sect. 5, we generalize the previous results for sets involving SS functions. In Sect. 6, we consider applications for intersection cuts to Boolean multilinear constraints and Bayesian D-optimal design. In Sect. 7, we propose the hybrid discrete Newton algorithm for computing intersection cuts. In Sect. 8, we analyze the computational results.
2 Intersection cut preliminaries
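In this section, we review the basic concept of intersection cuts. Assume that we are solving the optimization problem $\max_{z \in \mathcal{S}} c \cdot z$. Given a polyhedral outer approximation $\mathcal{P}$ of a nonconvex set $\mathcal{S}$, an LP relaxation is $\max_{z \in \mathcal{P}} c \cdot z$. Then an optimal relaxation point $z^*$ is a vertex of $\mathcal{P}$. An intersection cut is a particular cut that separates $z^*$ from $\mathcal{S}$.
Definition 1.
Given $\mathcal{S} \subseteq \mathbb{R}^p$, a closed set $\mathcal{C}$ is called (convex) $\mathcal{S}$-free, if $\mathcal{C}$ is convex and $\operatorname{int}(\mathcal{C}) \cap \mathcal{S} = \varnothing$.
The construction of intersection cuts [24] needs two components: a corner polyhedron relaxation $\mathcal{R}$ of $\mathcal{P}$ with apex $z^*$ ($\mathcal{R}$ can be extracted from $\mathcal{P}$), and an $\mathcal{S}$-free set $\mathcal{C}$ containing $z^*$ in its interior. Then an intersection cut separates $z^*$ from $\mathcal{R} \setminus \operatorname{int}(\mathcal{C})$ (a set which, we note, contains $\mathcal{S}$) as follows. The half-space and ray representations of the corner polyhedron are as follows:
$$\mathcal{R} = \{z \in \mathbb{R}^p : B(z - z^*) \le 0\} = \Bigl\{z \in \mathbb{R}^p : z = z^* + \sum_{j=1}^{p} \eta_j r^j,\ \eta \ge 0\Bigr\},$$
where $B$ is a $p$-by-$p$ invertible matrix, $r^j$ is the $j$-th column of $-B^{-1}$ and an extreme ray of $\mathcal{R}$.
Define the step length
$$\eta_j^* := \sup\{\eta_j \ge 0 : z^* + \eta_j r^j \in \mathcal{C}\}.$$
The point $z^*$ is separated by an intersection cut
$$\sum_{j=1}^{p} \frac{1}{\eta_j^*} B_j (z - z^*) \le -1,$$
where $B_j$ is the $j$-th row of $B$ and $1/\eta_j^* = 0$ when $\eta_j^* = \infty$.
Let $\mathcal{C}, \mathcal{C}'$ be two $\mathcal{S}$-free sets such that $\mathcal{C} \subseteq \mathcal{C}'$. Then the intersection cut derived from $\mathcal{C}'$ dominates the intersection cut derived from $\mathcal{C}$. This makes maximal $\mathcal{S}$-free sets relevant in the study.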
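The construction can be sketched numerically for a polyhedral free set. The example below is a minimal illustration under assumptions that are not taken from the paper: the set to be approximated is the integer lattice in the plane, the free set is the split $\{z : 0 \le z_1 \le 1\}$, and the apex and rays are made up. The cut is expressed in ray coordinates $\eta$ as $\sum_j \eta_j / \eta_j^* \ge 1$, where $\eta_j^*$ is the step length along the $j$-th ray; the function names are hypothetical.

```python
import math

def step_length(apex, ray, halfspaces):
    """Largest eta >= 0 with apex + eta*ray in C = {z : g.z <= h for (g, h) in halfspaces}.

    Assumes apex lies in the interior of C; returns math.inf if the ray never leaves C.
    """
    eta = math.inf
    for g, h in halfspaces:
        slope = sum(gi * ri for gi, ri in zip(g, ray))
        if slope > 1e-12:  # only constraints the ray moves toward can become tight
            slack = h - sum(gi * zi for gi, zi in zip(g, apex))
            eta = min(eta, slack / slope)
    return eta

def intersection_cut(apex, rays, halfspaces):
    """Coefficients 1/eta_j^* of the cut sum_j eta_j / eta_j^* >= 1 (0 for infinite steps)."""
    steps = [step_length(apex, r, halfspaces) for r in rays]
    return [0.0 if math.isinf(e) else 1.0 / e for e in steps]

# Hypothetical toy data: lattice Z^2, split 0 <= z_1 <= 1, apex (0.5, 0.5).
apex = (0.5, 0.5)
rays = [(1.0, 1.0), (-1.0, 1.0)]
split = [((1.0, 0.0), 1.0), ((-1.0, 0.0), 0.0)]
cut_coeffs = intersection_cut(apex, rays, split)
```

Both rays leave the split after a step of 0.5, so the cut reads $2\eta_1 + 2\eta_2 \ge 1$, which in the original coordinates is $z_2 \ge 1$. The apex ($\eta = 0$) violates it, while the lattice point $(0,1)$, with ray coordinates $(0, 0.5)$, satisfies it with equality.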
3 Extensions of submodular functions
In this section, we study continuous extensions of submodular functions. W.l.o.g., we assume in the sequel that for any submodular function $f$, $f(\mathbf{0}) = 0$ holds (by a translation of a constant). It is known that the Lovász extension [52] extends $f$ from $\{0,1\}^n$ to $[0,1]^n$. Based on this extension, we construct another extension of $f$ defined over the entire space $\mathbb{R}^n$, and study its analytical and combinatorial structures.
We first look at some polyhedra associated with the submodular function [8, 66]. Its extended polymatroid is defined as
[TABLE]
its convex hull of the epigraph over is defined as
[TABLE]
Recall that are the vertices of , and we can further define a polyhedron
[TABLE]
In fact, includes , because of the following lemma:
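Lemma 1 ([8]).
The convex hull of the epigraph of $f$ over $\{0,1\}^n$ is the intersection of the polyhedron defined above with $[0,1]^n \times \mathbb{R}$.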
Therefore, defines trivial facets of , and non-trivial facets of are , where is a vertex of .
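In one common notation, the polyhedra described above can be written as follows. The symbols $EP_f$, $\mathcal{Q}_f$ and $\bar{\mathcal{Q}}_f$, and the notation $\chi^S$ for the characteristic vector of $S$, are labels assumed here for illustration and may differ from the authors' own; $f(\mathbf{0}) = 0$ is assumed as in the text.

```latex
% Sketch in assumed notation: EP_f, Q_f and \bar{Q}_f are illustrative labels.
\begin{align*}
EP_f &:= \Bigl\{ s \in \mathbb{R}^n : \sum_{i \in S} s_i \le f(\chi^S)
         \ \text{ for all } S \subseteq \{1,\dots,n\} \Bigr\}, \\
\mathcal{Q}_f &:= \operatorname{conv}\bigl\{ (x,t) \in \{0,1\}^n \times \mathbb{R} : t \ge f(x) \bigr\}, \\
\bar{\mathcal{Q}}_f &:= \bigl\{ (x,t) \in \mathbb{R}^n \times \mathbb{R} : t \ge s \cdot x
         \ \text{ for all } s \in \operatorname{vert}(EP_f) \bigr\}, \\
\mathcal{Q}_f &= \bar{\mathcal{Q}}_f \cap \bigl( [0,1]^n \times \mathbb{R} \bigr).
\end{align*}
```

Read this way, the last identity is the content of Lemma 1: the bound constraints $0 \le x \le 1$ give the trivial facets, and each vertex $s$ of $EP_f$ gives one inequality $t \ge s \cdot x$.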
These polyhedra in turn give rise to some functions associated with . A convex function is a convex underestimating function of over , if for all , . The convex envelope is thus the maximal convex underestimating function of over . Since is the epigraph of , by Lemma 1,
[TABLE]
where its domain is . We remark that is equivalent to the Lovász extension of [8]. We will show that the cardinality is not polynomial to . Thereby, when computing , it is inefficient to evaluate all for . In fact, the value and the (sub)-gradients of on can be computed in a strongly polynomial time [8].
We define the envelope of extended to as
[TABLE]
We note that simply enlarges the domain of from to . This extension is algebraically simple, but analytical properties of outside will be studied in further detail. We find that is the epigraph of , i.e., , so is a convex function. Since every facet of is in one-to-one correspondence to a linear underestimator function of , we call the extended envelope epigraph.
A fundamental question on is how to compute its value and (sub)-gradients efficiently, because this is crucial in constructing intersection cuts. Since the Lovász extension is a restriction of on the hypercube , we can compute efficiently (in a strongly polynomial time) on . In the following, we will show how to extend this method to compute over the entire space . This extension requires us to study the properties of and .
We first look at combinatorial structures associated with the facets of . Recall that a permutation on is a bijective map from to itself. The map is the image of an element under this permutation. We denote by the set of permutations on . We define the following sets and vectors related to permutations.
Definition 2.
Given a permutation and an integer , define (), and define . Define the map such that it satisfies for all and for all .
The set of vertices is the image of under the map .
Lemma 2 ([36]).
.
Every permutation induces a vertex of through the map , so the cardinality of is (not polynomial to ). The above lemma shows that every facet of (a non-trivial facet of ) is given as , and every linear underestimator of is given as .
Proposition 1.
Given a permutation $\pi$, for all $i \in \{0, \dots, n\}$, the facet-defining inequality induced by $\pi$ is supported by $\bigl(v^{i}(\pi), f(v^{i}(\pi))\bigr)$, i.e., it holds with equality at this point.
Proof.
[TABLE]
where the first equation follows from Defn. 2, the second equation follows from Lemma 2, and the last two equations follow from the expansion of the sum. ∎
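The telescoping argument behind Prop. 1 can be written out explicitly. Here $s(\pi)$ is an assumed name for the vertex induced by $\pi$ through the map of Defn. 2, with entries $s(\pi)_{\pi(j)} = f(v^{j}(\pi)) - f(v^{j-1}(\pi))$, and $f(\mathbf{0}) = 0$ as assumed in this section:

```latex
% Sketch in assumed notation: s(\pi) denotes the vertex induced by \pi.
\begin{align*}
s(\pi) \cdot v^{i}(\pi)
  &= \sum_{j=1}^{i} s(\pi)_{\pi(j)} \\
  &= \sum_{j=1}^{i} \bigl( f(v^{j}(\pi)) - f(v^{j-1}(\pi)) \bigr) \\
  &= f(v^{i}(\pi)) - f(v^{0}(\pi)) \\
  &= f(v^{i}(\pi)).
\end{align*}
```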
Conversely to Prop. 1, given a point in the graph of , we can construct all the facets supported by it.
Corollary 1.
For a point , let be the number of ones in . If a permutation satisfies that , then supports the facet-defining inequality of .
At the moment, we find that one can easily obtain the facial structure of from that of . We ask how to separate facets of . Since is the epigraph of , the shape of is determined by , so it suffices to look at .
From a convex analysis perspective, the nonsmooth polyhedral function is the maximum of a set of linear functions, so it is convex and positive homogeneous of degree 1. This means that is subdifferentiable [45]. Moreover, has the following analytical properties.
Proposition 2.
For all and all , and . Moreover, .
Proof.
As , is the maximum of a set of linear functions. This implies that it is positive-homogeneous of degree-1 and convex, and it is easy to show the other results. ∎
Given , the evaluation of is called the extended polymatroid vertex maximization problem, as by definition equals
[TABLE]
By Prop. 2, an optimal solution is a subgradient of at (i.e., . By Lemma 2, , so (5) asks for a permutation that maximizes . One of the main findings in this section is an algorithm to solve (5).
To tackle (5), we look at a related relaxed problem, namely the extended polymatroid maximization problem, that is well studied:
[TABLE]
If , a strongly polynomial time sorting algorithm can solve the extended polymatroid maximization [36]: Let be a permutation such that , then an optimal solution to (6) is .
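The sorting algorithm can be sketched in a few lines. This is a minimal illustration, assuming a set function given as a Python callable on 0/1 tuples with value 0 at the all-zero vector; the function names and the example (the cut function of a three-node path) are hypothetical. Following Prop. 3, the same routine is applied to a vector with a negative entry and compared against brute-force enumeration of all permutations.

```python
from itertools import permutations

def chain_vertex(f, pi):
    """Vertex s induced by permutation pi: s[pi[k]] = f(v^{k+1}) - f(v^k) along the chain of pi."""
    v, prev, s = [0] * len(pi), 0.0, [0.0] * len(pi)
    for i in pi:
        v[i] = 1                      # move one step up the monotone chain
        cur = f(tuple(v))
        s[i] = cur - prev             # marginal gain of switching on coordinate i
        prev = cur
    return s

def extended_envelope(f, x):
    """Value and one subgradient at a real vector x, via sorting x in non-increasing order."""
    pi = sorted(range(len(x)), key=lambda i: -x[i])
    s = chain_vertex(f, pi)
    return sum(si * xi for si, xi in zip(s, x)), s

# Hypothetical example: cut function of the path 0-1-2 with unit weights (submodular, f(0) = 0).
EDGES = [(0, 1), (1, 2)]

def cut(x):
    return float(sum(x[i] != x[j] for i, j in EDGES))

x = (0.3, -1.2, 2.0)                  # has a negative entry, so it lies outside the hypercube
value, subgradient = extended_envelope(cut, x)
brute_force = max(
    sum(si * xi for si, xi in zip(chain_vertex(cut, p), x))
    for p in permutations(range(3))
)
```

On binary inputs the routine returns the function value itself, which is the extension property; on the vector above, the sorted permutation attains the same value as the enumeration over all six permutations.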
We note that the vertices are a finite set, so (5) is always bounded; is the Minkowski sum of and a set of recession rays, so is unbounded. This means that (6) can be unbounded.
Lemma 3.
When $x \ge \mathbf{0}$, the optimum of (6) is attained at a vertex, and (5) is equivalent to (6); when $x$ has some negative entries, (6) is unbounded and is not equivalent to (5).
Therefore, (5) is not equivalent to (6) for . However, we can show that the sorting algorithm also solves the problem (5) for any case.
Proposition 3.
The output of the sorting algorithm is optimal to the extended polymatroid vertex maximization problem (5).
Proof.
Let be the permutation found by the sorting algorithm. By Lemma 2, is in and hence a feasible solution to (5). Next, we prove the optimality of . Let , then the translated vector . The following inequalities hold:
[TABLE]
It is easy to show that . As , by Lemma 3 and the sorting algorithm, . It follows from Prop. 1 that . As the entries of are identical, for any , . Therefore, for any , , so . Looking at the inequalities (7), they become equations, because
[TABLE]
Therefore, is an optimal solution to (5). ∎
Given , the sorting algorithm outputs a permutation on it. The sorting algorithm is translation-invariant, i.e., translating each entry of by the same value does not change the output permutation. A byproduct of Prop. 3 is that the translation invariance implies the ray-linearity of .
Corollary 2.
Let , then is linear on w.r.t. .
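One way to see the ray-linearity, reading Cor. 2 as a statement about the line $\{x + \lambda \mathbf{1} : \lambda \in \mathbb{R}\}$ (an assumption about the elided symbols), is the following short derivation. The names $\bar{F}$ for the extended envelope and $s(\pi)$ for the vertex induced by the sorting permutation $\pi$ of $x$ are illustrative:

```latex
% Sketch in assumed notation: \bar{F} is the extended envelope, s(\pi) the vertex induced by \pi.
\begin{align*}
\bar{F}(x + \lambda \mathbf{1})
  &= s(\pi) \cdot (x + \lambda \mathbf{1})
     && \text{(sorting $x + \lambda \mathbf{1}$ yields the same $\pi$, Prop.~3)} \\
  &= \bar{F}(x) + \lambda \, s(\pi) \cdot v^{n}(\pi)
     && \text{($v^{n}(\pi) = \mathbf{1}$)} \\
  &= \bar{F}(x) + \lambda f(\mathbf{1})
     && \text{(Prop.~1 with $i = n$)}.
\end{align*}
```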
We look at the boundary of . By Prop. 1 and Cor. 1, for all , the point supports some facets of .
Theorem 1.
.
Proof.
We consider a point and look at the line . It can be separated into the restricted epigraph and the restricted hypograph , as and . First, we know that, by definition of and Lemma 1, . Second, by Prop. 1, the point supports some facets of , so the point with is separated by these facets from . Thereby, we know that . To summarize, we know that and . As , we have that . As the hypograph (union of restricted hypographs), we have that . ∎
As already mentioned, is convex and , so is also a continuous extension of . As includes , further extends (the Lovász extension).
We now understand the facial structure of , which will help us construct hypograph-free sets. We also know how to compute the value and subgradients of at any point in , which is important for constructing intersection cuts.
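To indicate how value and subgradient oracles feed into cut computation: restricted to a ray, a maximum of linear functions becomes a univariate convex piecewise-linear function $g$, and the step length is the largest $\eta \ge 0$ with $g(\eta) \le 0$. The sketch below applies a plain discrete Newton iteration with a doubling start to such a function. It is a simplified stand-in, not the hybrid algorithm of Sect. 7; here $g$ is given explicitly by hypothetical `(slope, intercept)` pieces, whereas in the paper's setting the oracle would come from the sorting algorithm.

```python
import math

def newton_root(pieces, eta_max=1e12):
    """Largest eta >= 0 with g(eta) <= 0, where g(eta) = max_k (a_k * eta + b_k) and g(0) < 0.

    Returns math.inf if g stays negative up to eta_max (the ray never leaves the set).
    """
    def oracle(eta):
        # value of g and a subgradient (slope of an active piece) at eta
        return max((a * eta + b, a) for a, b in pieces)

    eta = 1.0
    while oracle(eta)[0] < 0:         # doubling phase: reach a point with g >= 0
        eta *= 2.0
        if eta > eta_max:
            return math.inf
    val, slope = oracle(eta)
    while val > 1e-12:                # Newton phase: jump to the zero of the active piece
        eta -= val / slope            # slope > 0 here because g is convex and g(0) < 0
        val, slope = oracle(eta)
    return eta

# Hypothetical pieces: g(eta) = max(eta - 3, 4*eta - 20, 0.5*eta - 1.25)
root = newton_root([(1.0, -3.0), (4.0, -20.0), (0.5, -1.25)])
```

For this data the iterates are 4, 3 and 2.5, and the last one is an exact zero of $g$. Each Newton step moves to the zero of a linear piece and never undershoots the true root, which is why the iteration stops after finitely many steps on piecewise-linear functions, in contrast to bisection.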
4 Hypograph-free sets for submodular functions
In this section, we construct two types of hypograph-free sets for the submodular function .
First, we show that one can lift a maximal -free set into a maximal -free set.
Theorem 2.
Let be an arbitrary function, and let be a maximal -free set in . Then is a maximal -free set.
Proof.
We note that . It is easy to show that is -free, since . Assume that there exists a -free set including . Then the recession cone of must include that of , so for some closed convex set including . Moreover, must be a -free set, otherwise, there exists a point such that . However, since is maximally -free, this implies that . As a result, , so is maximal. ∎
This construction does not rely on any structure of , as it just lifts a -free set. For any , the simple lifted split is a maximal -free set. We next construct -free sets using the submodularity, for both theoretical and computational interests.
We show that the extended envelope epigraph is a hypograph-free set.
Proposition 4.
are -free sets.
Proof.
Since , we conclude that and hence . Additionally, is convex and hence -free. As , is -free set. ∎
It is known that for a convex function, its maximal hypograph-free set is its epigraph. However, for the submodular function , we will show that its extended epigraph is not a maximal hypograph-free set. A high-level way to test the maximality of is as follows. The set is the convex hull of . Geometrically, is the “minimal” convex set including . This is a conflict as we aim to obtain an inclusion-wise “maximal” -free set. We can remove some facets from and thus enlarge this polyhedron. After removing trivial facets of , the enlarged polyhedron is the extended envelope epigraph . However, this enlargement is still not enough.
We look at a concrete characterization of the “correct” enlarging of . The following fundamental theorem gives a sufficient and necessary condition on (maximal) hypograph-free sets including .
Theorem 3.
Let be a full-dimensional closed convex set in including . Then is a -free set if and only if is -free. Moreover, is a maximal -free set if and only if is a polyhedron and there is at least one point of in the relative interior of each facet of .
Proof.
We note that by Thm. 1, . Thereby, (i.e., is -free) if and only if .
We consider the -freeness first. We prove the forward direction. Assume that is a -free set. Suppose, to aim at a contradiction, that there exists a point . Then there exists a sufficiently small such that , but , which leads to a contradiction. We prove the reverse direction. Assume that is -free. Suppose, to aim at a contradiction, that there exists a point with and . As, for some , , by convexity of , , which leads to a contradiction. This implies that is -free if and only if -free (or ).
We consider the maximality next. Due to [11], a full-dimensional lattice-free set is maximal if it is a polyhedron and there is at least one lattice point in the relative interior of each facet. As is a finite set, the proof strategy is similar although it is not a subset of any lattice. Then the result follows. ∎
The above theorem is purely geometrical. Since submodular functions are combinatorial objects, we translate this theorem into a combinatorial language. We first define a combinatorial object in the Boolean hypercube .
Definition 3.
Let be distinct points of . They are called monotone, if . We call the corresponding ordered set a monotone chain in .
Therefore, we use a monotone chain to represent a set of monotone points. Then we have the following observation.
Proposition 5.
The set of monotone chains is in one-to-one correspondence to the set of permutations via the map defined as follows: for all , .
Proof.
By Prop. 1, since , by Defn. 2, , so is a monotone chain. Conversely, given a monotone chain , its inverse map exists and satisfies that ; and for all , is the index of the unique non-zero entry of . ∎
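The correspondence between permutations and monotone chains is easy to make concrete. The sketch below uses 0-based indices and hypothetical function names: a permutation is turned into its chain by switching coordinates on in order, and the inverse reads off the unique coordinate that changes between consecutive chain points.

```python
from itertools import permutations

def chain_from_permutation(pi):
    """Monotone chain v^0 <= v^1 <= ... <= v^n obtained by switching on pi[0], pi[1], ... in order."""
    v = [0] * len(pi)
    chain = [tuple(v)]
    for i in pi:
        v[i] = 1
        chain.append(tuple(v))
    return chain

def permutation_from_chain(chain):
    """Inverse map: entry k is the index of the unique coordinate where v^{k+1} and v^k differ."""
    pi = []
    for lo, hi in zip(chain, chain[1:]):
        diff = [k for k, (a, b) in enumerate(zip(lo, hi)) if a != b]
        if len(diff) != 1 or hi[diff[0]] != 1:
            raise ValueError("not a monotone chain of n + 1 distinct points")
        pi.append(diff[0])
    return pi

example_chain = chain_from_permutation([2, 0, 1])
round_trip_ok = all(
    permutation_from_chain(chain_from_permutation(list(p))) == list(p)
    for p in permutations(range(4))
)
```

The round trip over all permutations of four elements confirms, on a small instance, that the two maps are mutually inverse.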
We find that permutations and monotone chains are indeed equivalent. We note that any distinct points from are affinely independent in and hence support a hyperplane in . Thereby, we can infer from Prop. 1 and Prop. 5 that
Corollary 3.
If is a monotone chain in , then distinct points of define (or support) a facet of the extended envelope epigraph .
We say that this monotone chain induces the facet. In fact, we find that facets of , permutations on , and monotone chains in are in one-to-one correspondence. Therefore, we can view them as the same objects. Especially, Prop. 5 relates permutations and monotone chains. We give the following characterization of permutations on .
Definition 4.
A subset of permutations of is called a cover, if ; moreover, is called a minimal cover, if additionally, for all , is not empty.
We want to enlarge by removing its facets, this is equivalent to removing permutations from . Let be a subset of permutations of , and denotes the relaxation of the extended envelope epigraph induced by . It is obvious that for any . The following corollary translates Thm. 3 in a combinatorial language.
Corollary 4.
Let be a subset of permutations of . is -free if and only if is a cover. is maximally -free if and only if is a minimal cover.
Proof.
First, we note that , as a relaxation of includes . Next, we assume that is a cover. Then points of support facets of . By Thm. 3, is -free if and only if it is a cover. Finally, is a minimal cover if and only if each facet of has a point of in its interior. By Thm. 3, the latter is equivalent to the statement that is maximally -free.
We now can disprove the maximality by a counter-example. Thanks to the Cor. 4, we can use a counting argument to show that we can remove facets from . This results in a new enlarged -free polyhedron.
Proposition 6.
The extended envelope epigraph is not maximally hypograph-free.
Proof.
It suffices to find a counter-example. Consider $n = 3$: there are 6 permutations and 6 monotone chains (see Fig. 1). We assume that, in a non-degenerate case, the associated extended envelope epigraph has 6 facets, induced by the 6 chains respectively. The vertices $\mathbf{0}$ and $\mathbf{1}$ are visited by all the chains, while the other vertices are visited twice each. Therefore, no chain can “exclusively” visit a vertex, so the corresponding facet cannot contain a point of the graph of $f$ in its relative interior. In fact, we can remove some facets from the extended envelope epigraph. We keep three chains:
[TABLE]
These chains induce 3 facets such that at least one point of is in the relative interior of each facet and each point of is in these 3 facets, so the polyhedron defined by these 3 facets is a -free set larger than .
∎
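We explain why enlarging the extended envelope epigraph is hard. We build a bipartite graph whose two vertex classes are the points of $\{0,1\}^n$ and the permutations of $\{1,\dots,n\}$. An edge connects a point $v$ to a permutation $\pi$ if $v$ belongs to the monotone chain of $\pi$. Then, a minimal cover is a subset of permutations such that i) each point of $\{0,1\}^n$ is incident to at least one permutation in the subset; ii) each permutation in the subset is incident to a point of $\{0,1\}^n$ that no other permutation in the subset is incident to. As there are $2^n$ points and $n!$ permutations, the size of such a graph is not polynomial in $n$. Therefore, one may need additional structural information to enlarge the extended envelope epigraph efficiently.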
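For small $n$ the two conditions can be checked by enumeration. The sketch below does this for $n = 3$ with 0-based indices; the function names are hypothetical, and the three retained permutations are one possible choice made for illustration, not necessarily the chains kept in the proof of Prop. 6.

```python
from itertools import permutations, product

def chain_vertices(pi):
    """Set of hypercube points visited by the monotone chain of permutation pi."""
    v = [0] * len(pi)
    visited = {tuple(v)}
    for i in pi:
        v[i] = 1
        visited.add(tuple(v))
    return visited

def is_cover(perms, n):
    """Condition i): every point of {0,1}^n lies on the chain of some selected permutation."""
    covered = set().union(*(chain_vertices(p) for p in perms))
    return covered == set(product((0, 1), repeat=n))

def is_minimal_cover(perms, n):
    """Conditions i) and ii): a cover in which every chain visits a point no other selected chain visits."""
    if not is_cover(perms, n):
        return False
    for p in perms:
        others = set().union(*(chain_vertices(q) for q in perms if q != p))
        if not chain_vertices(p) - others:
            return False
    return True

all_perms = list(permutations(range(3)))
kept = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]  # assumed choice for illustration
```

All six permutations form a cover that is not minimal, since every point is shared by at least two chains, whereas the three retained chains cover the cube and each of them owns two intermediate points exclusively.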
We relax the submodular maximization problem (1) via a polyhedral outer approximation of . Let be the orthogonal projection of on -space. We remark that, within a branch-and-cut algorithm, might be within a low-dimensional face of . Let be a solution to the LP relaxation . We assume that , otherwise, is already an optimal solution to (1). The polyhedral outer approximation gives rise to a piece-wise linear concave overestimating function of over : , such that . We then have the following observation.
Proposition 7**.**
Assume that is not affine over , and let . Then , i.e., .
Proof.
As is a concave overestimator of over and is a convex underestimator of over , over . Suppose, for the sake of contradiction, that . Define a concave function , then for all , , and . By its concavity, there exists an affine overestimating function of , such that , and for all , . As , the affinity of implies that over , i.e., over . So is concave and convex over and thus affine over , which is a contradiction. ∎
The measure of the relative boundary is zero, so we can assume a mild relative interior condition that holds with probability one. Then the relaxation point is in the relative interior of the extended envelope epigraph with probability one.
5 Hypograph-free and superlevel-free sets for SS functions
This section considers hypograph and superlevel sets for an SS function , where and are two submodular functions. This generalizes our previous results for the hypograph set of the submodular function, and thus one can generate intersection cuts for a larger family of discrete nonconvex sets.
More specifically, we consider the following nonconvex set
[TABLE]
with . Given a relaxation point , we want to find cutting planes separating this point from .
Let and be extended envelopes of , respectively. As (resp. ) is a convex extension of (resp. ), we have that By relaxing to , a (nonconvex) continuous outer approximation of is
[TABLE]
Moreover, for all , if and only if .
**Special cases.** When , is the hypograph of the SS function ; when , is the 0-superlevel set of the SS function . Setting and , the set becomes the hypograph which is studied in the previous section. Setting , the relaxed set becomes If , since and for any , then the simple outer approximation cut is a valid inequality for (hence for ).
In general, we should separate intersection cuts specifically for SS functions. Let be a solution to (5) associated with , and we define the set
[TABLE]
The following proposition gives -free sets.
Proposition 8**.**
The set is an -free set. Moreover, if , then does not contain in its interior.
Proof.
We first prove that is -free. By definition, , which implies that . Therefore, for , we have that , which implies that . Hence, . Additionally, is convex. These two facts imply that is -free. Since , is also an -free set. Next, assume that , then , so . ∎
In [57, 67, 77], the authors study the sub/superlevel sets of some DC functions. Their construction of -free sets relies on a common reverse-linearization technique: reverse the set by changing the sign of its defining inequality, and linearize one convex function.
In our case, is an SS function, so we first need to extend the submodular and supermodular components of . After the extension, we obtain a DC function. Then, we can apply the reverse-linearization technique to its continuous extension.
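For reference, the reverse-linearization technique can be stated generically as follows. This is a sketch in illustrative notation that is not taken from the paper: $g_1, g_2$ are convex functions, $\tilde z$ is a relaxation point, and $\xi$ is any subgradient of $g_2$ at $\tilde z$.

```latex
% Illustrative symbols (assumptions): g_1, g_2 convex, \tilde z a relaxation
% point, \xi \in \partial g_2(\tilde z) an arbitrary subgradient.
\begin{align*}
  \mathcal{S} &= \{\, z : g_1(z) - g_2(z) \ge 0 \,\}
    && \text{(superlevel set of a DC function)} \\
  \mathcal{C} &= \{\, z : g_1(z) - g_2(\tilde z) - \xi^{\top}(z - \tilde z) \le 0 \,\}
    && \text{(reverse the sign, linearize } g_2 \text{ at } \tilde z\text{)} \\
  g_2(z) &\ge g_2(\tilde z) + \xi^{\top}(z - \tilde z)
    && \text{(subgradient inequality)} \\
  \Rightarrow\quad g_1(z) - g_2(z) &\le g_1(z) - g_2(\tilde z) - \xi^{\top}(z - \tilde z)
    && \text{for all } z .
\end{align*}
```

The set $\mathcal{C}$ is convex because it is a sublevel set of a convex function, and the last inequality shows that no point of $\mathcal{S}$ satisfies the defining inequality of $\mathcal{C}$ strictly. Moreover, $\tilde z$ satisfies it strictly exactly when $g_1(\tilde z) - g_2(\tilde z) < 0$, i.e., when $\tilde z \notin \mathcal{S}$.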
6 Two applications
In this section, we discuss applications of intersection cuts to Boolean multilinear programming and D-optimal design. We exploit submodular structures in these two problems.
6.1 Boolean multilinear constraints
We consider the construction of -free sets for Boolean multilinear constraints. Since , a polynomial function defined on binary variables is equivalent to a multilinear function on binary variables. A Boolean multilinear function is sometimes called a pseudo Boolean function.
A similar case is the construction of -free sets for continuous quadratic constraints [57]. We call this construction the “continuous approach”. It applies eigenvalue decomposition to factor the symmetric matrices representing quadratic terms in a quadratic constraint. After factoring, the reformulated constraint contains a DC function. This reformulation is amenable to the reverse-linearization technique, by which one obtains the so-called continuous-quadratic-free sets [57]. Multilinear terms, however, are represented by tensors. It is doubtful whether this construction can be extended so as to produce DC functions from tensors.
Here we consider an alternative discrete approach. It exploits the submodularity and the supermodularity of Boolean multilinear functions. In [17, 60], a class of Boolean multilinear functions is shown to be supermodular. We give a submodular-supermodular decomposition for general Boolean multilinear functions in the following.
Proposition 9**.**
Given a Boolean multilinear function defined as with multilinear terms, where . Let where and . Then are submodular over .
Proof.
Given a Cartesian product set (), a function is a generalized supermodular function over , if for every , . Each multilinear term function is a Cobb-Douglas function [71], which is a generalized supermodular function over . It is known [71] that, if restricting the domain (e.g. ) to its subdomain (e.g. ) still yields a Cartesian product set, then the supermodularity is preserved. Moreover, a negative combination of supermodular functions is a submodular function. Therefore, are submodular functions over . ∎
Since every Boolean multilinear function is an SS function, we can construct -free sets for the corresponding superlevel set or hypograph set.
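The decomposition of Prop. 9 can be checked numerically on a small instance. The sketch below assumes the sign convention $f = f_1 - f_2$, where $f_1$ collects the terms with negative coefficients and $f_2$ the negated terms with positive coefficients; this reading, the helper names, and the example coefficients are assumptions made for illustration.

```python
from itertools import product

def multilinear(terms):
    """Boolean multilinear function from a list of (coefficient, index tuple)."""
    def f(x):
        return sum(a * all(x[j] for j in S) for a, S in terms)
    return f

def ss_decomposition(terms):
    """Split f into (f1, f2) with f = f1 - f2 (assumed sign convention)."""
    f1 = multilinear([(a, S) for a, S in terms if a < 0])
    f2 = multilinear([(-a, S) for a, S in terms if a > 0])
    return f1, f2

def is_submodular(f, n):
    """Brute-force lattice check: f(x) + f(y) >= f(x v y) + f(x ^ y) on {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    for x in points:
        for y in points:
            join = tuple(max(a, b) for a, b in zip(x, y))
            meet = tuple(min(a, b) for a, b in zip(x, y))
            if f(x) + f(y) < f(join) + f(meet):
                return False
    return True

# Hypothetical example with mixed-sign coefficients on 4 binary variables.
n = 4
terms = [(3, (0, 1)), (-2, (1, 2, 3)), (1, (0, 2, 3)), (-4, (0,)), (2, (3,))]
f = multilinear(terms)
f1, f2 = ss_decomposition(terms)
```

In this example $f$ itself is neither submodular nor supermodular, while both components pass the brute-force check, which is the situation the SS model is meant to capture.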
Corollary 5**.**
Given a multilinear function () as in Prop. 9, assume that where and . Let , , and be as (8), (9), (10), respectively. Then, the set is an -free set. Moreover, if , then does not contain in its interior.
Proof.
By Prop. 9, we know that both and are submodular. Hence, the result follows by applying Prop. 8. ∎
Importing the notation in Prop. 9, a BMP problem has the following form:
[TABLE]
where is the number of constraints, is the number of distinct multilinear terms in the BMP, is the index set of multilinear terms in the -th constraint ([math] for objective). Unconstrained BMP has several synonyms: pseudo Boolean maximization or multilinear unconstrained binary optimization (MUBO).
To construct -free sets for Boolean multilinear constraints in the BMP, we need to write them as the standard form (8). For all or , let
[TABLE]
and write
[TABLE]
where and are two submodular functions.
The objective and constraints of (11) can be represented as
[TABLE]
(for all , , and = 1), which, by Cor. 5, is in the standard form.
Separating intersection cuts requires LP relaxations or corner polyhedra. One can first lift multilinear terms to obtain an extended formulation:
[TABLE]
The standard Boolean linearization technique [29] can reformulate a multilinear term by its underestimators and overestimators:
[TABLE]
where is the cardinality of . Then, by linearizing each nonlinear constraint (12d) as linear constraints in (13), one obtains a MILP reformulation of (12).
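As an illustration of the linearization step, the sketch below uses the textbook form of the Boolean linearization of $y = \prod_{j \in S} x_j$, namely $y \le x_j$ for $j \in S$, $y \ge \sum_{j \in S} x_j - (|S| - 1)$, and $y \ge 0$. That (13) has exactly this form is an assumption, and the function name is hypothetical.

```python
from itertools import product
from math import prod

def linearization_bounds(x, S):
    """Interval of y allowed by y <= x_j (j in S), y >= sum x_j - (|S| - 1), y >= 0."""
    upper = min(x[j] for j in S)                            # overestimators
    lower = max(0, sum(x[j] for j in S) - (len(S) - 1))     # underestimators
    return lower, upper

S = (0, 1, 2)

# At every binary point the interval collapses to the product itself.
binary_check = all(
    linearization_bounds(x, S) == (prod(x[j] for j in S),) * 2
    for x in product((0, 1), repeat=3)
)

# At a fractional point the interval is proper, so the LP relaxation is not exact.
fractional_bounds = linearization_bounds((0.5, 0.5, 0.5), S)
```

The gap at fractional points is what leaves room for cutting planes on top of the LP relaxation.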
To construct LP relaxations and corner polyhedra, one can simply drop the integrality constraints . The direct LP relaxation for the MILP reformulation is also an LP relaxation for the BMP (12). This gives us a corner polyhedron in the extended space . The -free set lives in a projected space (i.e., -space). By extracting entries of rays of the corner polyhedron, we project the corner polyhedron into the -space.
Given a corner polyhedron, it is straightforward to construct intersection cuts for the BMP: we separate intersection cuts constructed from the -free sets given by Prop. 8.
We note that Boolean quadratic constraints belong to Boolean multilinear constraints, and continuous quadratic constraints relax Boolean quadratic constraints. Both the continuous and discrete approaches can construct valid -free sets for Boolean quadratic constraints. We remark that maximal continuous-quadratic-free sets are no longer maximally Boolean-quadratic-free. It is easy to see that the discrete approach preserves the term-wise sparsity patterns of the SS functions and requires no factorizations. Therefore, the discrete approach is computationally amenable to ill-conditioned or sparse coefficient matrices.
6.2 D-optimal design
In statistical estimation, optimal designs are a class of experimental designs that are optimal with respect to some statistical criterion. We derive an extended convex MINLP formulation for the Bayesian D-optimal design problem. In this formulation, the problem is a cardinality-constrained submodular maximization problem.
Let denote the set of -by- symmetric matrices, and let (resp. ) denote the set of -by- positive semi-definite (resp. positive definite) matrices. Given a set of full row-rank matrices , an optimal design problem usually has the following form:
[TABLE]
where is the size of the design and is the design criterion. The matrix is called the information matrix. For the D-optimal criterion [18, 64], is the log determinant function .
One usually studies the Bayesian D-optimal design, where a statistical prior on the parameters adds a regularization term into the information matrix . This additional term also ensures well-posedness: when , is well defined. Then, the submodular maximization version of the Bayesian D-optimal design problem has the following formulation:
[TABLE]
The log determinant function is concave and has a semi-definite programming (SDP) and geometric programming representation [5]. The scalability of the mixed-integer log determinant formulation above is limited by the current state of SDP solvers. Based on the second order cone representation of the determinant function [64], we give an extended formulation for (15):
[TABLE]
where is an auxiliary matrix. One can represent this formulation by low-dimensional convex cones [5], e.g., (rotated) second-order cones, and exponential cones. Therefore, this extended formulation is amenable to computation.
Proposition 10**.**
(16) is equivalent to (15), and the objective of (16) is submodular w.r.t. .
Proof.
One can modify the original D-optimal design problem by adding a slack variable . Applying the logarithmic transformation to results in [64], (16) is equivalent to (15). It follows from [63, 68] that (16) is submodular w.r.t. . ∎
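The submodularity claim can be sanity-checked on a toy instance. The sketch below assumes that the objective has the form $\log\det(\epsilon I + \sum_{i \in S} A_i A_i^\top)$ with single-column matrices $A_i$; the vectors, the value of $\epsilon$, and all function names are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations

def logdet_pd(M):
    """log det of a symmetric positive definite matrix via Gaussian elimination."""
    A = [row[:] for row in M]
    total = 0.0
    for k in range(len(A)):
        pivot = A[k][k]                       # positive for a positive definite matrix
        total += math.log(pivot)
        for i in range(k + 1, len(A)):
            factor = A[i][k] / pivot
            for j in range(k, len(A)):
                A[i][j] -= factor * A[k][j]
    return total

def design_value(S, vectors, eps):
    """log det(eps * I + sum_{i in S} v_i v_i^T) for single-column matrices v_i."""
    m = len(vectors[0])
    M = [[eps if i == j else 0.0 for j in range(m)] for i in range(m)]
    for s in S:
        v = vectors[s]
        for i in range(m):
            for j in range(m):
                M[i][j] += v[i] * v[j]
    return logdet_pd(M)

def has_diminishing_returns(F, n, tol=1e-9):
    """Check F(S+i) - F(S) >= F(T+i) - F(T) for all S <= T and i not in T."""
    subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]
    for T in subsets:
        for S in (U for U in subsets if U <= T):
            for i in set(range(n)) - T:
                if F(S | {i}) - F(S) < F(T | {i}) - F(T) - tol:
                    return False
    return True

vectors = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0), (2.0, 1.0)]  # hypothetical data
eps = 0.1                                                                 # hypothetical constant
F = lambda S: design_value(S, vectors, eps)
submodular = has_diminishing_returns(F, len(vectors))
```

The check is exhaustive over all subsets and therefore only practical for very small ground sets; it is meant as a numerical illustration of the property, not as a proof.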
A global optimization solver like SCIP can linearize the constraints in the extended formulation (16), and thus produce an LP relaxation in the extended space. We can obtain a corner polyhedron as in the approach for the BMP. Then, we can construct intersection cuts from hypograph-free sets.
7 Separation problem
In this section, we consider the separation problem to generate an intersection cut using an -free set. Summarizing the previous sections, the -free set is in the form of
[TABLE]
where is the extended envelope of some submodular function over and . We remark that the extended envelope epigraph in (4) is a special case with and ; the set in (10) is also a special case that .
Assume that is a vertex of a corner polyhedron , and . Recalling the cut coefficient formula in Sect. 2, the separation problem reduces to calculating the step length along each ray :
[TABLE]
This line search problem asks for the step length to the border of along the ray from the interior point . We denote by the projection of on - and - spaces. Looking at the function defining , the intersection step length is the zero point of the following function:
[TABLE]
This function enjoys the following properties.
Proposition 11**.**
* is a concave piece-wise linear function over with . If and there exists an with , then , i.e., the solution must be unique. For all , is a subgradient in . For , .*
Proof.
Since the extended envelope is the maximum of linear functions, it is convex and piece-wise linear, so is concave and piece-wise linear. Since , it follows from the assumption that and thus . Since is closed and convex, if and only if . That is , i.e., . Since , by the chain rule, is a subgradient of . By the concavity of , its subgradients are non-increasing. ∎
By Prop. 11, the line search problem (17) reduces to solving a univariate nonlinear equation:
[TABLE]
For each ray , solving (18) gives the unique zero point of the univariate function , or certifies that no such point exists.
To solve the univariate nonlinear equation (18), it is natural to deploy a Newton-like algorithm. Therefore, we need the value and (sub)gradient information of . Moreover, the computation of can be reduced to the computation of . A sorting algorithm can compute the value and subgradients of (see Prop. 3). This means that one can compute in strongly polynomial time.
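The sorting-based evaluation can be sketched as follows. We assume here that the extended envelope is the pointwise maximum, over all permutations, of the linear functions induced by the corresponding monotone chains, so that one descending sort of the coordinates identifies an active piece (the classical greedy argument). This reading of the definition, the function names, and the example cut function are assumptions.

```python
from itertools import permutations

def chain_linear_value(f, order, x):
    """Value and gradient of the linear function induced by the chain of `order`."""
    chain, prev = set(), f(frozenset())
    value, grad = prev, [0] * len(x)
    for j in order:
        chain.add(j)
        cur = f(frozenset(chain))
        grad[j] = cur - prev          # marginal gain along the chain
        value += grad[j] * x[j]
        prev = cur
    return value, grad

def envelope_by_sorting(f, x):
    """Evaluate the envelope with one sort: visit coordinates by decreasing x_j."""
    order = sorted(range(len(x)), key=lambda j: x[j], reverse=True)
    return chain_linear_value(f, order, x)

def envelope_by_enumeration(f, x):
    """Reference value: maximum over all n! chain-induced linear functions."""
    return max(chain_linear_value(f, p, x)[0] for p in permutations(range(len(x))))

# Hypothetical submodular function: cut capacity of a small weighted graph.
WEIGHTS = {(0, 1): 2, (0, 2): 3, (1, 2): 1, (2, 3): 1}

def cut_value(S):
    return sum(w for (i, j), w in WEIGHTS.items() if (i in S) != (j in S))

x = (1.5, -0.5, 0.25, 2.0)            # a point outside the unit box
value, subgradient = envelope_by_sorting(cut_value, x)
```

The sort costs $O(n \log n)$ and the chain requires $n$ function evaluations, whereas the reference enumeration is factorial in $n$ and is included only to validate the shortcut on small instances.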
Previous works [22, 77] use the bisection algorithm, which guarantees finding the zero point within a given tolerance. Our implementation is similar to the discrete Newton algorithm in [41], but is combined with the bisection algorithm, so we call our implementation the hybrid discrete Newton algorithm. The bisection algorithm helps find a starting point for the Newton algorithm. Thanks to the piece-wise linearity of the univariate function , our algorithm can find an exact zero point in a finite time.
Proposition 12**.**
The hybrid discrete Newton algorithm terminates in a finite number of steps and finds the zero point .
Proof.
For all , we assume that Algorithm 1 chooses and computes a unique subgradient at , we denote it , and call it algorithmic gradient. The concavity of implies that its algorithmic gradient is monotone-decreasing w.r.t. . There is a threshold such that, for all , the algorithmic gradient ; for all (called the Newton step region), the algorithmic gradient .
After a finite number of bisection steps (at most ), the algorithm enters the Newton step region , where the algorithmic gradient is always negative. Then, we prove that the algorithmic gradient at step is different from that at step , and the algorithm stays in the Newton step region. Since is piece-wise linear (the number of its distinct algorithmic gradients is finite), the algorithm must terminate in a finite number of steps.
If at step , , then the algorithm terminates at this step and finds the zero point. If at step , , then we prove that and .
First, assume, for the sake of contradiction, that . Knowing that the algorithmic gradient is monotone-decreasing, the piece-wise linearity of implies that this algorithmic gradient is constant in the range . It follows that for all , . Hence, , which leads to a contradiction.
Second, we show that . When , by the monotonicity of , . When , as by assumption , must be negative. Then, by the concavity of , . This implies that .
∎
From Prop. 12, the hybrid discrete Newton algorithm first executes bisection steps with increasing and . Then it enters the Newton step region. After a single Newton step, becomes negative, and then monotonically increases to zero in a finite number of steps. The discrete Newton algorithm in [41] is applied to the line search problem for submodular polyhedra, which are polars of extended polymatroids. The algorithm runs in strongly polynomial time. In our case, includes the extended polymatroid and is unbounded. The corresponding line search problem may have no solution, which is common in intersection cut computation [21]. Therefore, Algorithm 1 needs a safeguard step, where we evaluate at a user-defined infinity. One may also prove that Algorithm 1 runs in strongly polynomial time, but a careful analysis of the unbounded case is needed. Owing to page limits, we do not expand on this topic here.
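The behaviour described above can be reproduced on a toy concave piece-wise linear function. The following is a minimal sketch in the spirit of the hybrid scheme, not a transcription of Algorithm 1: the function is given explicitly as a minimum of affine pieces, exact rational arithmetic is used so that the zero is found exactly, and the names, the default value of `infinity`, and the iteration limit are assumptions.

```python
from fractions import Fraction

def concave_pwl(pieces):
    """g(lam) = min_i (a_i + b_i * lam); returns (value, slope of an active piece)."""
    def g(lam):
        return min((a + b * lam, b) for a, b in pieces)
    return g

def hybrid_discrete_newton(g, infinity=Fraction(10**6), max_iter=10_000):
    """Zero of a concave piece-wise linear g with g(0) > 0, or None if none is found."""
    value_inf, _ = g(infinity)
    if value_inf > 0:
        return None                   # safeguard: still positive at "infinity"
    if value_inf == 0:
        return infinity
    lo, hi = Fraction(0), Fraction(infinity)
    lam = hi
    for _ in range(max_iter):         # bisection phase: look for a negative slope
        lam = (lo + hi) / 2
        value, slope = g(lam)
        if value == 0:
            return lam
        if slope < 0:
            break                     # entered the Newton step region
        lo = lam                      # g is still nondecreasing here, move right
    for _ in range(max_iter):         # Newton phase: follow the active affine piece
        value, slope = g(lam)
        if value == 0:
            return lam
        lam = lam - value / slope
    raise RuntimeError("iteration limit reached")

g = concave_pwl([(4, -1), (10, -4)])
step = hybrid_discrete_newton(g, infinity=Fraction(3))
```

On this example the bisection phase stops at $\lambda = 3/2$, the first Newton step overshoots to $\lambda = 4$ where the function is negative, and the second step lands exactly on the zero $\lambda = 5/2$. Returning `None` corresponds to a ray that does not leave the set before the user-defined infinity.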
8 Computational results
In this section, we conduct computational experiments to test the proposed cuts.
Setup and performance metrics. The experiments are conducted on a server with Intel Xeon W-2245 CPU @ 3.90GHz and 126GB main memory. We use SCIP 8.0 [14] as a MINLP framework to solve the natural formulations of test problems. SCIP is equipped with CPLEX 22.1 as an LP solver, and IPOPT 3.14 as an NLP solver.
By Thm. 2, the simple lifted split is a maximal -free set, where the splitting variable is chosen as the most fractional entry of the relaxation solution. In the standalone (resp. the embedded) configuration, we deactivate (resp. activate) SCIP’s internal cut separators. Under each configuration, the submodular cut (resp. the split cut) setting adds intersection cuts derived from (resp. ), and the default setting does not add any intersection cuts.
We focus on the root node performance and measure the closed root gap. Let be the value of the first LP relaxation (without cuts added), let be the dual bound after all the cuts are added, and let be a reference primal bound. The closed root gap is the closed gap improvement of with respect to . We also record the number of added cuts, the relative improvement to the default setting, and the total running time. For each configuration and setting, we compute these statistics’ shifted geometric mean (shift value: 1) within a test problem benchmark.
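The two statistics can be computed as in the sketch below. It assumes the usual definitions: the closed root gap is $(d_2 - d_1)/(p - d_1)$ for the first LP value $d_1$, the dual bound after cuts $d_2$, and the reference primal bound $p$ (with $p \neq d_1$), and the shifted geometric mean of $v_1, \dots, v_k$ with shift $s$ is $\big(\prod_i (v_i + s)\big)^{1/k} - s$. The symbol and function names are illustrative, and the sample numbers are made up.

```python
import math

def closed_root_gap(d1, d2, p):
    """Fraction of the initial gap |p - d1| closed by the cuts (assumes p != d1)."""
    return (d2 - d1) / (p - d1)

def shifted_geometric_mean(values, shift=1.0):
    """Geometric mean of (v + shift), shifted back by `shift`."""
    log_sum = sum(math.log(v + shift) for v in values)
    return math.exp(log_sum / len(values)) - shift

# Made-up example for a maximization problem: the LP bound drops from 100 to 90,
# and the reference primal bound is 80, so half of the gap is closed.
example_gap = closed_root_gap(d1=100.0, d2=90.0, p=80.0)
example_mean = shifted_geometric_mean([1.0, 3.0], shift=1.0)
```

The shift damps the influence of very small values, such as near-zero running times, on the aggregated statistic.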
Experiment 1: max cut. Consider an undirected graph , where is the set of nodes, is the set of edges, and is a weight function over . For a subset of , its associated cut capacity is the sum of the weights of edges with one adjacent node in and the other in . The max cut problem aims at finding a subset with maximum cut capacity. Using a binary variable vector indicating whether vertices belong to , then the problem can be formulated as the following quadratic unconstrained binary optimization (QUBO) problem:
[TABLE]
When is nonnegative, the cut capacity function (the objective function) is submodular. Our benchmark contains 30 “g05” and 30 “pw” instances with nonnegative weights from Biq Mac [76]. The reference primal bounds are also from Biq Mac. The number of vertices is up to 100, and the number of edges is up to 4455. We encode the hypograph reformulation (1) of the QUBO. SCIP will automatically reformulate the problem into a MILP via the reformulation-linearization technique (RLT) [1]. This MILP formulation is a special case of the extended formulation (16) of a degree-2 BMP with .
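The submodularity of the cut capacity function under nonnegative weights can be verified by brute force on a small graph. The sketch below writes the capacity in the standard QUBO form $\sum_{\{i,j\} \in E} w_{ij}(x_i + x_j - 2 x_i x_j)$; that the displayed formulation has exactly this form is an assumption, and the graphs and function names are hypothetical.

```python
from itertools import product

def cut_capacity(x, weights):
    """Total weight of edges with exactly one endpoint selected by the 0-1 vector x."""
    return sum(w * (x[i] + x[j] - 2 * x[i] * x[j]) for (i, j), w in weights.items())

def is_submodular(f, n):
    """Brute-force lattice check: f(x) + f(y) >= f(x v y) + f(x ^ y) on {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    for x in points:
        for y in points:
            join = tuple(max(a, b) for a, b in zip(x, y))
            meet = tuple(min(a, b) for a, b in zip(x, y))
            if f(x) + f(y) < f(join) + f(meet):
                return False
    return True

nonnegative = {(0, 1): 2, (0, 2): 3, (1, 2): 1, (2, 3): 1}
mixed_sign = {(0, 1): 2, (0, 2): -3, (1, 2): 1, (2, 3): 1}

nonnegative_ok = is_submodular(lambda x: cut_capacity(x, nonnegative), 4)
mixed_sign_ok = is_submodular(lambda x: cut_capacity(x, mixed_sign), 4)
```

With a negative edge weight the check fails, which illustrates why the benchmark is restricted to instances with nonnegative weights.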
In Table 1, we report the computational results, where “closed” denotes the average closed root gap, and “relative” denotes its relative value to the default setting. For the standalone (resp. the embedded) configuration, the relative improvement of submodular cuts is (resp. ) compared to (resp. ) of split cuts. In the standalone configuration, we can compare the “clean” strengths of intersection cuts derived from different hypograph-free sets. Although split cuts are derived from maximal hypograph-free sets and submodular cuts are derived from non-maximal ones, the performance of split cuts is worse.
We observe that fewer split cuts are generated than submodular cuts. This means that the efficiency of some split cuts does not satisfy SCIP’s internal criteria, so SCIP abandons more split cuts than submodular cuts. As the two types of cuts are derived using the same principle but from different hypograph-free sets, the distances from the relaxation points to the boundaries of the hypograph-free sets determine the cut efficiency. This observation suggests that relaxation points are farther from the boundary of the extended envelope epigraph than from the boundary of the splits. Under the embedded configuration, the difference in relative improvements between the two types of cuts is , so they perform almost equally. However, the separation time of split cuts is shorter than that of submodular cuts. This is because separating submodular cuts requires solving nonlinear equations, while the split cuts can be computed in closed form.
Experiment 2: pseudo Boolean maximization. As mentioned, pseudo Boolean maximization is a MUBO problem, a generalization of QUBO. We can use techniques from Sect. 5 to generate intersection cuts.
Our benchmark contains 44 highly dense “autocorr_bern” MUBO instances from MINLPLib [19, 75]. These instances arise in theoretical physics, and the problem is to minimize a degree-four polynomial energy function. The problem is a degree-4 BMP with . SCIP constructs the extended formulation (16). The benchmark contains instances with up to 60 binary variables and 3540 Boolean multilinear terms. We use the best-known primal bound from MINLPLib as the reference primal bound.
In Table 2, we report the computational results. For the standalone (resp. the embedded) configuration, the relative improvement of submodular cuts is (resp. ) compared to (resp. ) of split cuts. In both configurations, the submodular cuts are better than the split cuts in terms of the closed root gap. Moreover, under the embedded configuration, the difference in the relative improvements between the two types of cuts is around . This is larger than of max cut benchmark under the same configuration. This divergence between degree-2 and degree-4 MUBO suggests that the submodular cuts are suitable for high-degree Boolean multilinear constraints.
We recall that to solve the nonlinear equations, the hybrid discrete Newton algorithm needs oracle access to the value of the Boolean multilinear function. For some instances, a Boolean multilinear function may consist of thousands of multilinear terms. After a code timing analysis, we find that the separation of submodular cuts spends most of its time computing the function value. Therefore, this is the main performance bottleneck, which needs to be optimized in the future. A counterintuitive finding is that non-maximal -free sets may yield stronger cuts. This is because the geometrical relation between the -free sets and the corner polyhedron matters.
Experiment 3: Bayesian D-optimal design. As mentioned, the Bayesian D-optimal design problem has a submodular maximization form (15). In particular, we can encode it as an extended formulation (16) in SCIP. SCIP generates gradient cuts for this convex MINLP. Therefore, we can obtain LP relaxations and corner polyhedra.
Our benchmark consists of two classes of instances. We let parameters be single-column matrices. The first class of instances are block design problems [64], where are sparse 0-1 matrices. The exact designs correspond to the graphs with a given number of edges and nodes that have a maximum number of spanning trees. Recall that is the variable dimension, is the matrix dimension, and is the cardinality. We generate 15 block design instances with . The second class of instances are random Gaussian instances, where are dense real matrices. The entries of matrices are drawn from a Gaussian distribution with zero mean and variance. We generate 30 random Gaussian instances with and . We set the regularization constant to . We use the best primal bound from all settings as the reference primal bound. Since SCIP’s internal gradient cuts are important for linearizing convex nonlinear constraints, we keep the gradient cuts but disable all integer-oriented cuts (GMI cuts and mixed-integer rounding cuts etc.) in the standalone configuration.
In Table 3, we report the computational results. We separate the results of block design and Gaussian random instances, since the densities of their matrices are different. Looking at the default setting in different benchmarks, there is no difference between the standalone and embedded configurations in terms of the closed root gap. This means that integer-oriented cuts do not improve the root node LP relaxations. We see the same problem for intersection cuts, which do not close the root gap but increase the computing time. In particular, the number of separated cuts is around one. Thereby, many intersection cuts are too weak to be added to the cut pool.
We recall that intersection cuts and many integer-oriented cuts are LP-based cuts, i.e., derived from an LP relaxation of the extended formulation (16). Therefore, their strengths depend on the LP relaxation. A flat corner polyhedron, which usually arises from an LP relaxation with many constraints, may yield weak intersection cuts. Depending on the type of MINLP, there are two basic ways to construct initial LP relaxations. For nonconvex MINLPs, one way usually uses factorable programming and term-wise envelopes [53]. Notable examples are Boolean multilinear constraints and continuous quadratic constraints [57]. The McCormick envelopes or Boolean linearization techniques are used to construct their LP relaxations, which have a finite number of constraints. For convex MINLPs, the other way linearizes nonlinear constraints, and the number of constraints in the LP relaxation can grow without bound. This is because a convex nonlinear constraint is equivalent to an infinite number of linear constraints. Since SCIP may add many gradient cuts for approximating the convex MINLP (16), this yields flat corner polyhedra and weak intersection cuts. In summary, the weakness of intersection cuts is due to the flatness of the corner polyhedron.
9 Conclusion
We construct hypograph-free sets for submodular functions. Our construction relies on a new continuous extension of submodular functions. We characterize maximal hypograph-free sets and generalize our results to sets involving submodular-supermodular functions. These yield intersection cuts for Boolean multilinear constraints. We exploit the submodular structure in an extended formulation of the D-optimal design problem. We propose a hybrid discrete Newton algorithm that can compute intersection cuts efficiently and exactly. The computational results show that intersection cuts derived from the submodularity are stronger than those derived from split cuts for max cut and pseudo Boolean maximization problems. For convex MINLPs, our computational results on the Bayesian D-optimal design problem suggest that corner polyhedra can be flat, which makes intersection cuts weak.
Statements and Declarations
No conflicts of interest with the journal or the funding agencies.
References
- [1] Warren P. Adams and Hanif D. Sherali. A tight linearization and an algorithm for zero-one quadratic programming problems. Management Science, 32(10):1274–1290, 1986.
- [2] Shabbir Ahmed and Alper Atamtürk. Maximizing a class of submodular utility functions. Mathematical Programming, 128(1):149–169, 2011.
- [3] Kent Andersen, Quentin Louveaux, and Robert Weismantel. An analysis of mixed integer linear sets based on lattice point free convex sets. Mathematics of Operations Research, 35(1):233–256, February 2010.
- [4] Kent Andersen, Quentin Louveaux, Robert Weismantel, and Laurence A. Wolsey. Inequalities from two rows of a simplex tableau. In Matteo Fischetti and David P. Williamson, editors, Integer Programming and Combinatorial Optimization, pages 1–15, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.
- [5] MOSEK ApS. MOSEK modeling cookbook, 2020.
- [6] Alper Atamtürk and Andrés Gómez. Submodularity in conic quadratic mixed 0–1 optimization. Operations Research, 68(2):609–630, 2020.
- [7] Alper Atamtürk and Andrés Gómez. Supermodularity and valid inequalities for quadratic optimization with indicators. Mathematical Programming, pages 1–44, 2022.
- [8] Alper Atamtürk and Vishnu Narayanan. Submodular function minimization and polarity. Mathematical Programming, 2021.
