Foundations of gauge and perspective duality
Alexandre Y. Aravkin, James V. Burke, Dmitriy Drusvyatskiy, Michael P. Friedlander, Kellie MacPhee

TL;DR
This paper revisits gauge duality, establishing a modern, unified framework with Fenchel-Rockafellar duality, and extends it to general nonnegative convex functions, enhancing understanding and applicability in convex optimization.
Contribution
It provides a modern, unified explanation of gauge duality using a perturbation framework and extends the theory to broader classes of convex functions and models.
Findings
Gauge duality can be explained via a perturbation-based duality approach.
Primal solutions can be recovered from gauge dual solutions through rescaling.
The framework applies to general nonnegative convex functions, including piecewise linear quadratic functions.
Abstract
We revisit the foundations of gauge duality and demonstrate that it can be explained using a modern approach to duality based on a perturbation framework. We therefore put gauge duality and Fenchel-Rockafellar duality on equal footing, including explaining gauge dual variables as sensitivity measures, and showing how to recover primal solutions from those of the gauge dual. This vantage point allows a direct proof that optimal solutions of the Fenchel-Rockafellar dual of the gauge dual are precisely the primal solutions rescaled by the optimal value. We extend the gauge duality framework to the setting in which the functional components are general nonnegative convex functions, including problems with piecewise linear quadratic functions and constraints that arise from generalized linear models used in regression.
Foundations of gauge and perspective duality (June 18, 2018)
A.Y. Aravkin Department of Applied Mathematics, University of Washington, Seattle ([email protected]). Research supported by the Washington Research Foundation Data Science Professorship.
J.V. Burke Seattle, WA ([email protected]). Research supported in part by NSF award DMS-1514559.
D. Drusvyatskiy Department of Mathematics, University of Washington, Seattle ([email protected]; [email protected]). Research partially supported by AFOSR YIP award FA9550-15-1-0237.
M.P. Friedlander Departments of Computer Science and Mathematics, University of British Columbia, Vancouver, BC, Canada ([email protected]). Research supported by ONR award N00014-16-1-2242.
K.J. MacPhee
Keywords: convex optimization, gauge duality, nonsmooth optimization, perspective function
AMS subject classifications: 90C15, 90C25
1 Introduction
Sensitivity of the optimal values and solutions of optimization problems, with respect to perturbations in the problem data, is a central concern of Fenchel-Rockafellar duality theory. Lagrange duality can be regarded as a special case of this theory, in which perturbations to the data are introduced in a particular manner. Gauge duality, on the other hand, as introduced in 1987 by Freund [13], was developed without any reference to sensitivity. It relies instead on a special polarity correspondence that exists for nonnegative, positively homogeneous convex functions that vanish at the origin; these are known as gauge functions. In 2014, Friedlander, Macêdo, and Pong [15] made partial progress towards connecting gauge and Lagrange dualities. In the present work, we show that gauge duality may be regarded as a particular application of Fenchel-Rockafellar duality theory that is different than the one required for Lagrange duality. This connection provides a useful vantage point from which to develop new algorithms for an important class of convex optimization problems. We also describe how gauge duality theory can be extended beyond the optimization of gauge functions to the optimization of all convex functions that are bounded below. We call this extension perspective duality.
A convenient and fully general formulation for our approach is the problem
$$\min_{x}\ \kappa(x) \quad \text{subject to} \quad \rho(b - Ax) \le \sigma, \tag{Gp}$$
where $A:\mathbb{R}^n\to\mathbb{R}^m$ is a linear map, $b$ is an $m$-vector, and $\kappa$ and $\rho$ are closed gauge functions. For many applications, the function $\kappa$ is used to regularize the problem in order to obtain solutions with certain desirable properties. For example, in statistical and machine-learning applications the regularizer $\kappa$ is often a nonsmooth, structure-inducing function; e.g., the 1-norm, which is frequently used to encourage sparsity in the solution. The function $\rho$ may be regarded as a penalty function, such as the 2-norm, that measures the degree of misfit between the data $b$ and the linear model $Ax$, and may reflect a statistical model of the noise in the data $b$. The perspective duality extension enables us to consider optimization problems with a wider range of applications by allowing the functions $\kappa$ and $\rho$ to be replaced by functions that are not positively homogeneous, including the Huber function used for robust regression [17], the elastic net used for group detection [28], and the logistic loss used for classification [18, 1].
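The formulation (Gp) gives rise to two different “dual” problems:
$$\max_{y}\ \langle b, y\rangle - \sigma\rho^\circ(y) \quad \text{subject to} \quad \kappa^\circ(A^T y) \le 1, \tag{Ld}$$
$$\min_{y}\ \kappa^\circ(A^T y) \quad \text{subject to} \quad \langle b, y\rangle - \sigma\rho^\circ(y) \ge 1. \tag{Gd}$$
Here $\kappa^\circ$ and $\rho^\circ$ are the polars of $\kappa$ and $\rho$, which are also gauge functions; see Section 2.2 for a precise definition. In the important case $\sigma = 0$, we interpret $\sigma\rho^\circ$ as the indicator function of the closure of the domain of $\rho^\circ$ (see the discussion in Section 2.3). The first problem (Ld) is the standard Lagrangian (or Fenchel-Rockafellar) dual, which is the dual problem typically considered in connection with convex optimization problems. Strong duality, reflected in the equality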
$$[\text{optimal value of (Gp)}] = [\text{optimal value of (Ld)}]$$
and in the attainment of the optimal value of the Lagrange primal-dual pair, holds under mild interiority conditions often referred to as the Slater constraint qualification. The second problem (Gd) is the gauge dual and is less well known. Under interiority conditions similar to those required by Lagrange duality, strong duality holds in the gauge duality setting; this is reflected in the analogous equality
$$1 = [\text{optimal value of (Gp)}] \cdot [\text{optimal value of (Gd)}]$$
and in the attainment of the optimal value of the gauge primal-dual pair.
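The multiplicative form of gauge strong duality can be checked by hand on a tiny instance. The sketch below is an illustration only, not taken from the paper: it assumes the regularizer is the 1-norm, the misfit level is zero, and the linear map has a single row `a`, so that the primal reads "minimize ‖x‖₁ subject to ⟨a, x⟩ = b" and the gauge dual reads "minimize ‖a·y‖∞ subject to b·y ≥ 1" for a scalar y. Both problems then have closed-form solutions; the function names and the data are hypothetical.

```python
def l1_norm(v):
    return sum(abs(t) for t in v)


def linf_norm(v):
    return max(abs(t) for t in v)


def solve_primal(a, b):
    """Minimize ||x||_1 subject to <a, x> = b (assumes a != 0).

    By Hoelder's inequality |b| = |<a, x>| <= ||a||_inf * ||x||_1, and the
    bound is attained by placing all weight on a largest-magnitude entry of a.
    """
    k = max(range(len(a)), key=lambda i: abs(a[i]))
    x = [0.0] * len(a)
    x[k] = b / a[k]
    return x, l1_norm(x)


def solve_gauge_dual(a, b):
    """Minimize ||a * y||_inf over scalars y with b * y >= 1 (assumes b != 0).

    The objective |y| * ||a||_inf grows with |y|, so the feasible point of
    smallest magnitude, y = 1 / b, is optimal.
    """
    y = 1.0 / b
    return y, linf_norm([ai * y for ai in a])


# Hypothetical data for the illustration.
a = [3.0, -4.0, 1.0]
b = 6.0

x_opt, nu_p = solve_primal(a, b)
y_opt, nu_d = solve_gauge_dual(a, b)
print("primal value:", nu_p, "gauge dual value:", nu_d, "product:", nu_p * nu_d)
```

In this instance the primal value is |b|/‖a‖∞ and the gauge dual value is ‖a‖∞/|b|, so their product is one up to rounding, whereas the Lagrange dual value would instead equal the primal value.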
In certain contexts, the gauge dual (Gd) can be preferable for computation to the primal (Gp) and the Lagrangian dual (Ld), particularly when the polar $\kappa^\circ$ has a special structure. Friedlander and Macêdo [14], for example, use gauge duality to derive an effective algorithm for an important class of low-rank spectral optimization problems that arise in signal-recovery applications, including phase recovery and blind deconvolution. Indeed, the effectiveness of numerous convex optimization algorithms, particularly first-order methods, relies on being able to project easily onto the constraint set. The appearance of the linear map $A$ in the constraints of both (Gp) and (Ld) means that such methods may not be efficient, though some recent methods have been proposed that circumvent this difficulty [24]. In contrast, the map $A$ appears in the gauge dual (Gd) only in the objective, and computing subgradients of this objective only requires subgradients of $\kappa^\circ$, together with the ability to efficiently implement matrix-vector multiplication. Moreover, typical applications occur in the regime $m \ll n$. For example, $m$ is often logarithmic in $n$ [7, 6, 26, 11]. Because the dual variables of (Gd) lie in the much smaller space $\mathbb{R}^m$, projections onto the feasible region may be computed efficiently, depending on the context. An example of how an interior method may be used for this purpose is given in Section 5.2.
1.1 Approach
This paper has two main goals. The first goal, addressed in Section 3, is to show how the foundations of gauge duality can be derived via a perturbation framework pioneered by Rockafellar [21, 20], in which the optimal value and optimal solution depend on parameters to the problem. We follow Rockafellar and Wets [23, 11.H], who consider an arbitrary convex perturbation function $F$ on $\mathbb{R}^n\times\mathbb{R}^m$ that determines how the parameters enter the problem, and define the value functions
$$p(u) := \inf_{x} F(x,u) \quad\text{and}\quad q(v) := \inf_{y} F^*(v,y). \tag{1.1}$$
This set-up immediately yields the primal-dual pair
$$p(0) = \inf_{x} F(x,0) \quad\text{and}\quad -q(0) = \sup_{y}\, -F^*(0,y). \tag{1.2}$$
Fenchel-Rockafellar duality theory flows from an appropriate choice of $F$. We show that gauge duality fits equally well into this framework under a judicious choice of the perturbation function $F$, thereby putting Fenchel-Rockafellar and gauge duality theories on an equal footing. Strong duality, primal-dual optimality conditions, and an interpretation of the gauge dual solutions as sensitivity measures (i.e., subgradients of the value function) quickly follow; cf. Section 3.2. These results, in particular, answer an open question posed by Freund in his original work [13], which asked for an interpretation of gauge dual variables for problems with nonlinear constraints. They also complete a partial analysis by Friedlander et al. [15] on the interpretation of gauge dual variables as sensitivity measures.
This viewpoint allows us to prove a striking relationship between optimal solutions of the primal and optimal solutions of the Lagrangian dual of the gauge dual: the two coincide up to scaling by the optimal value (Section 3.5). Consequently, Lagrangian primal-dual methods applied to the gauge dual can be used to recover solutions of the original primal problem. We illustrate this idea in Section 7 with an application of Chambolle and Pock’s primal-dual algorithm [8] to a specific problem instance.
The second goal of this paper is to extend the applicability of the gauge duality paradigm beyond gauges to capture more general convex problems. Section 4 extends gauge duality to problems involving convex functions that are merely nonnegative, and by an appropriate translation, functions that are bounded from below. The approach is based on using the perspective transform of a convex function [20, p. 35], which increases a function’s domain from $\mathbb{R}^n$ to $\mathbb{R}^{n+1}$ and makes it positively homogeneous, enabling the property that is key to the application of gauge duality. We term the resulting dual problem the perspective dual. The perspective-polar transformation, needed to derive the perspective dual problem, is developed in Section 4. Concrete illustrations of perspective duality for the family of piecewise linear-quadratic functions, which are often used in data-fitting applications, and for the setting of generalized linear models, are given in Section 5. We further explore examples of optimality conditions and primal-from-dual recovery in Section 6. Numerical illustrations for a case study of perspective duals comprise Section 7.
2 Notation and assumptions
The derivation of our results relies on standard notions from convex analysis. Unless otherwise specified, we generally follow Rockafellar [20] for standard definitions and notation, including domains and epigraphs, relative interiors, convex conjugate functions, subdifferentials, polar sets, etc. In this section we collect less well-known definitions and notation used throughout the paper, and establish blanket assumptions on the problem data.
Let denote the extended real line, and denote the nonnegative extended reals. Let and denote general closed convex functions. For a closed convex set , its convex indicator is the closed convex function whose value is zero on and otherwise. Let denote the cone generated by . We often abbreviate fractions such as to .
2.1 The perspective transform
For any convex function $f$ on $\mathbb{R}^n$, its perspective is the function on $\mathbb{R}^{n+1}$ whose epigraph is the cone generated by the set $\{(x,1,\mu) \mid (x,\mu)\in\operatorname{epi} f\}$. Because this transform is not necessarily closed, even when $f$ is closed, we choose to work with its closure, and redefine the transform as
$$f^\pi(x,\lambda) := \begin{cases} \lambda f(x/\lambda) & \text{if } \lambda > 0,\\ f^\infty(x) & \text{if } \lambda = 0,\\ +\infty & \text{if } \lambda < 0, \end{cases} \tag{2.1}$$
where $f^\infty$ is the recession function of $f$ [20, Theorem 8.5]. A calculus for the perspective transform is described by Aravkin, Burke, and Friedlander [2, Section 3.3] and, for the infinite-dimensional case, by Combettes [10, 9], where properties of the perspective transform are described in detail. We often apply more than one transformation to a function, and in such cases, the multiple transformations are applied in the order that they appear; e.g., $f^{\pi\circ} = (f^\pi)^\circ$.
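To make the closed perspective concrete, the following sketch evaluates it for the scalar function f(x) = x²/2, which is an assumption chosen for illustration and not an example from this section. For this f the recession function is zero at the origin and +∞ elsewhere. The helper names `perspective`, `half_square`, and `half_square_recession` are hypothetical.

```python
import math


def perspective(f, f_recession):
    """Return the closed perspective (x, lam) -> f_pi(x, lam) of a convex f.

    The caller supplies the recession function, used on the slice lam == 0.
    """
    def f_pi(x, lam):
        if lam > 0:
            return lam * f(x / lam)
        if lam == 0:
            return f_recession(x)
        return math.inf  # the perspective is +infinity for negative lam
    return f_pi


def half_square(x):
    return 0.5 * x * x


def half_square_recession(x):
    # lim_{t -> inf} f(t * x) / t is 0 when x == 0 and +infinity otherwise.
    return 0.0 if x == 0 else math.inf


f_pi = perspective(half_square, half_square_recession)

value = f_pi(2.0, 4.0)            # 4 * f(2 / 4)
scaled = f_pi(3 * 2.0, 3 * 4.0)   # positive homogeneity: equals 3 * value
print(value, scaled, f_pi(0.0, 0.0), f_pi(1.0, 0.0), f_pi(1.0, -1.0))
```

The second evaluation illustrates the property the transform is designed to provide: scaling both arguments by a positive constant scales the value by the same constant, even though f itself is not positively homogeneous.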
2.2 Gauge functions
The following is only a brief description of gauge functions. A complete description is given by Rockafellar [20, Section 15].
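A convex function $\kappa$ is called a gauge if it is nonnegative, positively homogeneous, and vanishes at the origin. The symbols $\kappa$ and $\rho$ will always denote closed gauges. The polar of a gauge $\kappa$ is the function $\kappa^\circ$ defined by
$$\kappa^\circ(y) := \inf\{\mu > 0 \mid \langle x, y\rangle \le \mu\,\kappa(x)\ \text{for all } x\}, \tag{2.2}$$
which is also a gauge and satisfies $\kappa^{\circ\circ} = \kappa$ when $\kappa$ is closed [20, Theorem 15.1]. For example, if $\kappa$ is a norm then $\kappa^\circ$ is the corresponding dual norm. Note the identity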
[TABLE]
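It follows directly from (2.2) and positive homogeneity of a gauge function that its polar can be characterized as the support function to the unit level set, i.e.,
$$\kappa^\circ(y) = \sup_{x}\,\{\langle x, y\rangle \mid \kappa(x) \le 1\}. \tag{2.4}$$
Moreover, $\kappa$ and $\kappa^\circ$ satisfy a Hölder-like inequality
$$\langle x, y\rangle \le \kappa(x)\,\kappa^\circ(y) \quad \text{for all } x\in\operatorname{dom}\kappa,\ y\in\operatorname{dom}\kappa^\circ, \tag{2.5}$$
which we refer to as the polar-gauge inequality. The zero level set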
[TABLE]
plays a key role when . It is straightforward to show that
[TABLE]
whenever is closed, where is the recession cone for [20, Section 8]. We include proofs of (2.6) in Appendix A.
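The support-function characterization of the polar and the polar-gauge inequality can be checked numerically for a familiar pair. The sketch below assumes the gauge is the 1-norm, whose polar is the ∞-norm; this choice, the random test data, and all function names are assumptions made for illustration. Because a linear function attains its supremum over the 1-norm unit ball at one of the vertices ±e_i, the polar can be evaluated by enumerating those vertices.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l1_norm(v):
    return sum(abs(t) for t in v)


def linf_norm(v):
    return max(abs(t) for t in v)


def polar_via_support(y):
    """Support function of the 1-norm unit ball, evaluated at its vertices."""
    n = len(y)
    best = 0.0
    for i in range(n):
        for sign in (1.0, -1.0):
            vertex = [sign if j == i else 0.0 for j in range(n)]
            best = max(best, dot(vertex, y))
    return best


def polar_gauge_gap(x, y):
    """Slack kappa(x) * kappa_polar(y) - <x, y>; nonnegative by the inequality."""
    return l1_norm(x) * linf_norm(y) - dot(x, y)


rng = random.Random(0)
samples = [([rng.uniform(-1, 1) for _ in range(5)],
            [rng.uniform(-1, 1) for _ in range(5)]) for _ in range(1000)]
min_gap = min(polar_gauge_gap(x, y) for x, y in samples)

# The inequality is tight when x sits on a largest-magnitude entry of y
# with matching sign.
y_fixed = [0.5, -2.0, 1.0]
x_tight = [0.0, -3.0, 0.0]
tight_gap = polar_gauge_gap(x_tight, y_fixed)
support_value = polar_via_support(y_fixed)
print(min_gap, tight_gap, support_value)
```

The tight pair is the kind of alignment that reappears in the optimality conditions later in the paper, where equality in the polar-gauge inequality characterizes primal-dual optimal pairs.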
2.3 Assumptions on the feasible region
Define the following primal and dual feasible sets:
[TABLE]
The nonnegativity of implies that the Slater condition can fail when , and thus special attention is required. In this case, we make the replacement
[TABLE]
This replacement yields a gauge optimization problem whose solution set and optimal value coincide with those of (Gp). Observe that because is a closed convex cone, is a closed gauge that satisfies, by virtue of (2.6), . This motivates the convention made immediately following (Gd) that
[TABLE]
The replacement (2.8) allows us to make the useful assumption that , which significantly streamlines our analysis. The convention (2.9) also makes sense from an epigraphical perspective, because the functions epigraphically converge to as [23, Proposition 7.4(c)].
The gauge primal (Gp) and dual (Gd) problems are said to be feasible, respectively, if the following intersections are nonempty:
[TABLE]
Similarly, the primal and dual problems are said to be relatively strictly feasible, respectively, if the following intersections are nonempty:
[TABLE]
If the intersections above are nonempty, with interior replacing relative interior, then we say that the problems are strictly feasible. We have
[TABLE]
which follows from Rockafellar [20, Theorem 7.6] when , and from the convention (2.9) when .
We assume throughout that . Otherwise, contains the origin, which is a trivial solution of (Gp). This assumption is consistent with classical applications in signal processing and machine learning, where the corresponding assumption is that the data does not entirely consist of noise.
3 Perturbation analysis for gauge duality
Modern treatment of duality in convex optimization is based on an interpretation of multipliers as giving sensitivity information relative to perturbations in the problem data. No such analysis, however, has existed for gauge duality. In this section we show that for a particular kind of perturbation, the gauge dual (Gd) can in fact be derived via such an approach.
3.1 General perturbation framework
Our analysis is based on a perturbation theory described by Rockafellar and Wets [23, 11.H]. In this section we summarize the main results from [23] that we need. Fix an arbitrary convex function , and consider the value functions defined by (1.1)–(1.2). Observe the equality . For example, Fenchel-Rockafellar duality for the problem
[TABLE]
is obtained from the general perturbation theory by setting . In that case, the primal-dual pair takes the familiar form
[TABLE]
Under certain conditions, described in the following theorem, strong duality holds, i.e. , and the optimal values are attained.
Theorem 3.1 (Multipliers and sensitivity [23, Theorem 11.39]).
Consider the primal-dual pair (1.2), where is proper, closed, and convex.
(a) The inequality always holds.
(b) If , then equality holds and, if finite, the infimum is attained with . Similarly, if , then equality holds and, if finite, the infimum is attained with .
(c) The set is nonempty and bounded if and only if is finite and .
(d) The set is nonempty and bounded if and only if is finite and .
(e) Optimal solutions are characterized jointly through the conditions
[TABLE]
Proof 3.2.
The only difference between the statement of this theorem and that in [23, Theorem 11.39] is in part (b). Here, we make use of the relative interior rather than the interior. Thus, we only prove part (b). Suppose . If , then follows by Part (a). Hence we can assume that is finite, and conclude that is proper. By [20, Theorem 23.4], , and given ,
[TABLE]
By taking the infimum over and recognizing the right-hand side as , we deduce that . Combining this with Part (a) yields . Hence . Conversely, given any , we have
[TABLE]
and so . The case follows by an analogous argument.
3.2 A perturbation for gauge duality
We now show that the problems (Gp) and (Gd) constitute a primal-dual pair under the framework set out by Theorem 3.1. The key is to postulate the correct pairing function . In the derivation below, we show that the gauge primal-dual pair corresponds to the primal and dual value functions
[TABLE]
where, as in (Gd), we use the convention described by (2.8) and (2.9). The parameters and are perturbations to the primal and dual gauge problems, respectively. This perturbation scheme differs significantly from that used in Fenchel-Rockafellar duality—cf. (3.1)—because of the product .
We begin by observing that is equal to the optimal value of the primal (Gp). Because and appear as a product in this definition, it is convenient to reparametrize the problem by setting and . The positive homogeneity of and allows us to equivalently phrase the primal value function as
[TABLE]
In particular, this reparametrization shows that the value function is convex because it is the infimal projection of a convex function, and it is proper when the primal (Gp) is feasible.
We now construct the function appearing in Theorem 3.1 associated with this duality framework. In this construction, we assume that , possibly making the replacement (2.8) if . Note that minimizing is equivalent to minimizing for . Define the convex function by
[TABLE]
Observe that the matrix is nonsingular.
Because , and and are closed, the function is closed and proper. This pairing function gives rise to the infimal projection problems
[TABLE]
which correspond to the general definitions shown in (1.1). Note that the function is the reciprocal of , as formalized in the following lemma (stated without proof).
Lemma 3.3.
Equality holds provided that is nonzero and finite. Moreover, if and only if , and if and only if .
We now compute the conjugate of , which is needed to derive the dual value function . By Rockafellar and Wets [23, Theorem 11.23(b)],
[TABLE]
where the closure operation is applied to the function on the right-hand side with respect to the argument . Using the definition of , the constraint in the description of is precisely , and the unique vector that satisfies these constraints is . The closure operation is therefore superfluous, and we obtain
[TABLE]
Since and by (2.3) and (2.4), this reduces to
[TABLE]
The application of Theorem 3.1 asks that we evaluate these conjugates at , which yields the expression
[TABLE]
Thus, the dual problem
[TABLE]
recovers, up to a sign change, the required gauge dual problem (Gd) when . When , we also recover the gauge dual problem (Gd) by making the appropriate substitutions (2.8) under the convention (2.9).
This discussion justifies the definition of the dual perturbation function , which is equivalent to the expression (3.2b). Note that is the optimal value of (Gd). In summary, and , respectively, play the roles of and as defined in (3.3). In the application of Theorem 3.1, we identify with , and with .
3.3 Proof of gauge duality
We now use the perturbation framework from Section 3.2 to prove weak and strong duality results for the gauge duality setting. Theorem 3.5 [15, section 5] is already known, but the proof via perturbation is new.
The following auxiliary result ties the feasibility of the gauge pair (Gp) and (Gd) to the domain of the value function. The proof of this result, which is largely an application of the calculus of relative interiors, is deferred to Appendix B.
Lemma 3.4 (Feasibility and domain of the value function).
If the primal (Gp) is relatively strictly feasible, then . If the dual (Gd) is relatively strictly feasible, then . The analogous implications, where the operator is replaced by the operator, hold under strict feasibility (not relative).
The duality relations in the gauge framework follow analogous principles to Lagrange duality, except that instead of an additive relationship between the primal and dual optimal values the relationship is multiplicative. The following theorem summarizes weak and strong duality for gauge optimization.
Theorem 3.5 (Gauge duality [15]).
Set and . Then the following relationships hold for the gauge primal-dual pair (Gp) and (Gd).
(a) (Basic Inequalities) It is always the case that
[TABLE]
In particular, if (resp. ), then (Gd) (resp. (Gp)) is infeasible.
(b) (Weak duality) If and are primal and dual feasible, then
[TABLE]
(c) (Strong duality) If the dual (resp. primal) is feasible and the primal (resp. dual) is relatively strictly feasible, then and the gauge dual (resp. primal) attains its optimal value.
Proof 3.6.
To simplify notation, in this proof we denote the optimal value of the primal value function by .
Part (a). We begin with the inequality (i). Theorem 3.1 guarantees the inequality
[TABLE]
By Lemma 3.3, whenever is nonzero and finite, equality holds, which together with (3.4) yields (i). If, on the other hand, , then (i) is trivial. Finally, if , Lemma 3.3 yields , and hence (3.4) implies , and (i) again holds. Thus, (i) holds always. To establish (ii), it suffices to consider the case . From (3.4) we conclude , that is either or . By Lemma 3.3, the first case implies and therefore (ii) holds. The second case implies that the primal problem is infeasible, that is , and again (ii) holds. Thus (ii) holds always, as required.
Part (b). Because the gauge primal and dual problems are both feasible, and are nonzero and finite so the result follows from part (a).
Part (c). Suppose the dual is feasible and the primal is relatively strictly feasible. In particular, both and are nonzero and finite by part (a). Hence . On the other hand, by Lemma 3.4 the assumption that the primal is relatively strictly feasible implies . This last inequality implies is finite, and hence is proper. Theorem 3.1(b) tells us that and the infimum in the dual is attained. Thus we deduce , as claimed.
Conversely, suppose that the primal is feasible and the dual is relatively strictly feasible. Then, by Lemma 3.4, . This in turn implies and that the infimum in is attained. Since the primal is feasible, by Lemma 3.3, is nonzero, and hence and the infimum in the primal is attained.
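The multiplicative weak-duality bound can also be seen directly from the polar-gauge inequality: for a primal-feasible x and dual-feasible y, the chain 1 ≤ ⟨b, y⟩ − σρ°(y) ≤ ⟨b, y⟩ − ρ(b − Ax)ρ°(y) ≤ ⟨b, y⟩ − ⟨b − Ax, y⟩ = ⟨x, Aᵀy⟩ ≤ κ(x)κ°(Aᵀy) holds. The sketch below checks this bound on random instances, assuming κ is the 1-norm and ρ is the 2-norm (so κ° is the ∞-norm and ρ° is the 2-norm). The instance generator, the particular dual-feasible point, and all names are hypothetical choices made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l1_norm(v):
    return sum(abs(t) for t in v)


def l2_norm(v):
    return math.sqrt(dot(v, v))


def linf_norm(v):
    return max(abs(t) for t in v)


def matvec(A, x):
    return [dot(row, x) for row in A]


def rmatvec(A, y):
    """Compute A^T y for a row-major list-of-lists matrix."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def feasible_pair(rng, m=3, n=6, sigma=0.5):
    """Build (A, x, y) with x primal feasible and y gauge-dual feasible."""
    A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
    x = [rng.gauss(0, 1) for _ in range(n)]
    r = [rng.gauss(0, 1) for _ in range(m)]
    shrink = sigma * rng.random() / l2_norm(r)
    # b = A x + residual with ||residual||_2 <= sigma, so x is primal feasible.
    b = [axi + shrink * ri for axi, ri in zip(matvec(A, x), r)]
    # Along the direction b, <b, b> - sigma * ||b||_2 must be positive to rescale.
    slack = dot(b, b) - sigma * l2_norm(b)
    if slack <= 0:
        return None
    y = [bi / slack for bi in b]  # gives <b, y> - sigma * ||y||_2 = 1
    return A, x, y


rng = random.Random(0)
products = []
for _ in range(200):
    pair = feasible_pair(rng)
    if pair is None:
        continue
    A, x, y = pair
    products.append(l1_norm(x) * linf_norm(rmatvec(A, y)))

min_product = min(products)
print(len(products), min_product)
```

The feasible points generated here are not optimal, so the products are typically well above one; strong duality says the gap closes at optimal pairs.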
3.4 Gauge optimality conditions
Our perturbation framework can be harnessed to develop optimality conditions for the gauge pair that relate the primal-dual solutions to subgradients of the corresponding value function. This yields versions of parts (b) and (d) in Theorem 3.1 that are specialized to gauge duality.
Theorem 3.7 (Gauge multipliers and sensitivity).
The following relationships hold for the gauge primal-dual pair (Gp) and (Gd).
(a) If the primal is relatively strictly feasible and the dual is feasible, then the set of optimal solutions for the dual is nonempty and coincides with
[TABLE]
If it is further assumed that the primal is strictly feasible, then the set of optimal solutions to the dual is bounded.
(b) If the dual is relatively strictly feasible and the primal is feasible, then the set of optimal solutions for the primal is nonempty with solutions , where
[TABLE]
If it is further assumed that the dual is strictly feasible, then the set of optimal solutions to the primal is bounded.
Proof 3.8.
Part (a). Because (Gp) is relatively strictly feasible, it follows from Lemma 3.4 that , and because the dual is feasible, is finite. Theorem 3.1 and Lemma 3.3 then imply the conclusion of Part (a). The statement on the boundedness of the set of the optimal solutions to the dual follows from Theorem 3.1.
Part (b). Because (Gd) is relatively strictly feasible, it follows from Lemma 3.4 that , and because the primal is feasible, is finite. Theorem 3.1 then implies that the optimal primal set is nonempty, and . Because the primal and dual problems are feasible, any pair must satisfy by Theorem 3.5 and Lemma 3.3. Thus, this inclusion is equivalent to being optimal for the primal problem, with optimal value . This proves Part (b). The statement on the boundedness of the set of the optimal solutions to the primal again follows from Theorem 3.1.
We use the sensitivity interpretation given by Theorem 3.7 to develop a set of necessary and sufficient optimality conditions that mirror the more familiar KKT conditions from Lagrange duality. For a primal-dual optimal pair , the condition characterizes a degenerate case when because in that case the primal constraint is inactive at (i.e., ). On the other hand, the dual constraint is always active at optimality because the positive homogeneity of the dual objective and the dual constraint imply . The full primal-dual optimality conditions for gauge duality are described in the following theorem.
Theorem 3.9 (Optimality conditions).
Suppose both problems of the gauge dual pair (Gp) and (Gd) are relatively strictly feasible, and the pair is primal-dual feasible. Then is primal-dual optimal if and only if it satisfies the conditions
[TABLE]
Proof 3.10.
First suppose that satisfies (3.5a)-(3.5d). By Theorem 3.5, to show that is primal-dual optimal it is sufficient to show that . Add (3.5c) and (3.5d) to obtain
[TABLE]
By combining the above with (3.5b) we obtain , as desired.
Suppose now that is primal-dual optimal. We begin by assuming that and obtain the case by applying the result for the case under the replacement (2.8). By the positive homogeneity of and the optimality of , (3.5b) holds. Also note that and are both nonzero and finite because of the strong duality guaranteed by Theorem 3.5.
Define and so that . By Theorem 3.1(e) and Theorem 3.7(b), we must have . Since the primal problem is relatively strictly feasible, we can apply [20, Theorem 23.9] to deduce the characterization
[TABLE]
where denotes the normal cone to a set . We now consider two cases. First, suppose Then (3.5a) holds, and by straightforward computations involving only (2.4) and the definitions of normal cones and subdifferentials, we have
[TABLE]
where
[TABLE]
and . Substitute these formulas into (3.6) to obtain
[TABLE]
We deduce the existence of and such that
[TABLE]
Note that cannot satisfy (3.7b), hence (3.7c), together with the polar-gauge inequality and the fact that , implies
[TABLE]
Equality must hold in the above, and dividing through by we see that (3.5c) is satisfied. Finally, we aim to show that (3.5d) holds using the fact that . From the characterization (2.4) of the polar, we have
[TABLE]
In particular, this characterization implies If , then by the polar-gauge inequality (2.5) we have
[TABLE]
which gives condition (3.5d) after dividing through by . On the other hand, if then the set (3.8) is given by . Thus when , we again have , and multiplying through by and applying (3.5a) gives (3.5d).
We have shown the forward implication of the theorem when The other case we need to consider is when or equivalently when . An easy argument (e.g., see [12, Proposition 2.14(iv)]) shows
[TABLE]
Similar to the first case, we now have
[TABLE]
We deduce that and also that and . Again, because , the polar-gauge inequality implies (3.5c) holds.
We now show that and , which, if true, establishes that (3.5a) and (3.5d) are satisfied as well. First note that implies , which implies by (2.6) . Thus, by (3.5b), (3.5c), and the fact that from Theorem 3.5, we have
[TABLE]
Thus if is primal-dual optimal, then (3.5a)-(3.5d) hold, as claimed. This finishes the proof for .
Let us now consider the case when and apply what we have just proved to the pair (Gp) and (Gd) under the replacement (2.8). Then is primal-dual optimal if and only if the conditions (3.5a)-(3.5d) hold with , i.e.,
[TABLE]
If we combine this with primal feasibility, , and use the identity (2.9) that , then these conditions are equivalent to (3.5a)-(3.5d) for , , and as written above.
The following corollary describes a variation of the optimality conditions outlined by Theorem 3.9. These conditions assume that a solution of the dual problem is available, and give conditions that can be used to determine a corresponding solution of the primal problem. An application of the following result appears in Section 6.
Corollary 3.11 (Gauge primal-dual recovery).
Suppose that the primal-dual pair (Gp) and (Gd) are each relatively strictly feasible. If is optimal for (Gd), then for any primal feasible the following conditions are equivalent:
(a) is optimal for (Gp);
(b) and ;
(c) and ,
where, by convention, when .
Proof 3.12.
We use the optimality conditions given in Theorem 3.9. As noted before, by the optimality of we automatically have equality (3.5b) in the dual constraint.
We first show that (b) implies (a). Suppose (b) holds. Then (3.5c) holds automatically. From the characterization (2.4) of the polar, we have
[TABLE]
where the case uses the convention (2.9). Thus, is the set of maximizing elements in this supremum. Because , it holds that . If we additionally use the polar-gauge inequality, we deduce that
[TABLE]
and therefore the above inequalities are all tight. Thus conditions (3.5a) and (3.5d) hold, and by Theorem 3.9, is a primal-dual optimal pair.
We next show that (a) implies (b). Suppose that is optimal for (Gp). Then the first condition of (b) holds by (3.5c), and (3.5a) and (3.5d) combine to give us
[TABLE]
This implies that is a maximizing element of the supremum in (3.9), and thus
Finally, to show the equivalence of (b) and (c), note that by the polar-gauge inequality, if and only if minimizes the convex function This, in turn, is true if and only if , or equivalently, .
3.5 The relationship between Lagrange and gauge multipliers
We now use the perturbation framework for duality to establish a relationship between gauge dual and Lagrange dual variables. We begin with an auxiliary result that characterizes the subdifferential of the perspective function (2.1). Combettes [10, Prop. 2.3(v)] also describes an equivalent formula for the subdifferential, though the derivation and subsequent form of the expression are very different. The formula in Lemma 3.13 is more suitable for our purposes.
Lemma 3.13 (Subdifferential of perspective function).
Let be a closed proper convex function. Then for , equality holds:
[TABLE]
Proof 3.14**.**
Recall that the subdifferential of the support function to any nonempty closed convex set is given by [20, Theorem 23.5 and Corollary 23.5.3]. By [20, Corollary 13.5.1], , where is a closed convex set. If , then is nonempty and
[TABLE]
Suppose now that . Then
[TABLE]
Using the expression for the subdifferential of a support function, achieves the supremum of (3.10) if and . On the other hand, if then
[TABLE]
*Again using the expression for the subdifferential of a support function, achieves the supremum of (3.10) if and only if and . *
We now state the main result relating the optimal solutions of (Gp) to the optimal solutions of the Lagrange dual of (Gd).
Theorem 3.15**.**
Suppose that the gauge dual (Gd) is relatively strictly feasible and the primal (Gp) is feasible. Let denote the Lagrange dual of (Gd), and let denote its optimal value. Then
[TABLE]
Proof 3.16**.**
We first note that can be derived via the framework of Theorem 3.1 through the Lagrangian value function
[TABLE]
Here plays the role of in Theorem 3.1; cf. [23, Example 11.41]. Strong duality in Theorem 3.5 guarantees that is nonzero and finite, and by Lemma 3.4,
[TABLE]
Thus, it follows from Theorem 3.1 that the optimal points for are characterized by . Note also that .
On the other hand, by Theorem 3.7(b) the solutions to (Gp) are precisely the points such that . Thus to relate the solution sets of and (Gp), we must relate and .
For in a neighborhood of zero and all , by positive homogeneity of and we have
[TABLE]
Thus by Lemma 3.13, However, for the Fenchel-Young equality gives us
[TABLE]
Thus we obtain the convenient description
[TABLE]
*and the set of optimal solutions for (Gp) is precisely . *
4 Perspective duality
We now move on to an extension of the gauge duality framework, which allows us to consider functions that are not necessarily positively homogeneous, but continue to be nonnegative and convex. (The same framework applies to functions that are bounded below because these can be made nonnegative by translation.) For the remainder of the paper, consider functions and , that are closed, convex and nonnegative over their domains. In this section we derive and analyze the perspective-dual pair
[TABLE]
The functions and are the polars of the perspective transforms of and . This transform is a key operation needed to derive perspective duality. In the next section we describe properties of that transform and its application to the derivation of the perspective-dual pair. Throughout this section, we assume that .
4.1 Perspective-polar transform
Given a closed proper convex function , define the perspective-polar transform by .
An explicit characterization of the perspective-polar transform is given by
[TABLE]
This representation can be obtained by applying the definition of the gauge polar (2.2) to the perspective transform as follows:
[TABLE]
which yields (4.1) after dividing through by . Rockafellar’s extension [20, p.136] of the polar gauge transform to nonnegative convex functions that vanish at the origin coincides with .
The following theorem provides an alternative characterization of the perspective-polar transform in terms of the more familiar Fenchel conjugate . It also provides an expression for the perspective-polar of in terms of the Minkowski function generated by the epigraph of the conjugate of , i.e.,
[TABLE]
which is a gauge. Nonnegativity of is not required for the first part of this result.
Theorem 4.1**.**
*For any closed proper convex function with , we have If, in addition, is nonnegative, $f^{\sharp}(z,-\xi)=\gamma_{\operatorname{epi} f^{\star}}(z,\xi)$. *
Proof 4.2**.**
Because of the assumptions on , we have for each [20, Corollary 8.5.2]. Thus we obtain the following chain of equalities:
[TABLE]
This proves the first statement. Now additionally suppose that is nonnegative. Because is closed, it is identical to its biconjugate, and so . Also, is closed and convex, and contains the origin because is nonnegative. Therefore, it follows from [20, Corollary 15.1.2] that
[TABLE]
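To make the Minkowski function of an epigraph concrete, the following sketch evaluates it numerically for a single one-dimensional example. It relies only on the standard definition of the Minkowski function of a set $C$, namely $\gamma_C(w)=\inf\{\mu>0 \mid w\in\mu C\}$, which for $C=\operatorname{epi} f^{\star}$ reads $\inf\{\mu>0 \mid f^{\star}(z/\mu)\le \xi/\mu\}$. The helper name `minkowski_epi`, the bisection tolerance, and the choice $f(x)=x^2/2$ (so that $f^{\star}(z)=z^2/2$) are illustrative assumptions and do not appear in the paper. For this particular choice, the gauge should equal $z^2/(2\xi)$ whenever $\xi>0$.

```python
def minkowski_epi(f_conj, z, xi, tol=1e-10, mu_max=1e12):
    """Approximate inf{mu > 0 : f_conj(z / mu) <= xi / mu} by bisection.

    Assumes epi f_conj is convex and contains the origin, so that
    membership of (z, xi) in mu * epi f_conj is monotone in mu.
    """
    def inside(mu):
        return f_conj(z / mu) <= xi / mu

    # Grow an upper bound until (z, xi) lies in hi * epi f_conj.
    hi = 1.0
    while not inside(hi):
        hi *= 2.0
        if hi > mu_max:
            return float("inf")  # treated as "not in any finite multiple"

    # Bisection between a non-member (or 0) and a member.
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if inside(mid):
            hi = mid
        else:
            lo = mid
    return hi


def half_square_conj(z):
    # Conjugate of f(x) = x^2 / 2 is f*(z) = z^2 / 2 (illustrative choice).
    return 0.5 * z * z


value = minkowski_epi(half_square_conj, z=2.0, xi=1.0)
print(value)
```

The bisection is only a convenience for illustration; when a closed form is available, as it is here, it should be preferred.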
The following result relates the level sets of the perspective-polar transform to the level sets of the conjugate perspective. This result is useful in deriving the constraint sets for certain perspective-dual problems for which there is no closed form for the perspective polar; cf. Example 5.5.
Theorem 4.3** (Level-set equivalence).**
Let be a nonnegative, closed proper convex function with . Then, for any ,
[TABLE]
Proof 4.4**.**
The following chain of equivalences follows from Theorem 4.1:
[TABLE]
Define .
We first show that implies and . By (4.2), . If , there exists with such that Because is nonnegative, , and thus . In particular,
[TABLE]
On the other hand, if , there exists a sequence such that for each . Now by the lower semi-continuity of , we obtain
[TABLE]
This establishes the forward implication of the theorem.
For the reverse implication, suppose and . If , it follows from (4.2) that . Now suppose otherwise that We want to show . By hypothesis, . Thus there exists a sequence with and for all . With no loss in generality, we can assume that for all . Then for each , we have for which we have the following equivalences:
[TABLE]
*which gives in the limit, since is closed. *
4.1.1 Calculus rules
Two useful calculus rules are now developed that govern the perspective-polar transform when applied to gauge functions and separable sums.
Example 4.5** (Gauge functions).**
Suppose that is a closed proper gauge. Then
[TABLE]
Use expression (4.1) for this derivation. When , take in the infimum in (4.1) to deduce that On the other hand, when , the positive homogeneity of implies that . We leave the details to the reader. More generally, if vanishes at the origin, then for all
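The role of positive homogeneity in this example can be checked numerically. The sketch below assumes the usual formula for the perspective of a function at a positive scaling parameter, $\lambda f(x/\lambda)$, and covers only that case. The function names and test vector are illustrative assumptions. For a gauge such as the 1-norm, the value does not depend on $\lambda$, whereas for a function that is not positively homogeneous (here $\tfrac12\|x\|_2^2$) it does.

```python
import numpy as np


def perspective(f, x, lam):
    """Perspective of f at (x, lam) for lam > 0, i.e. lam * f(x / lam)."""
    if lam <= 0:
        raise ValueError("this sketch only covers lam > 0")
    return lam * f(np.asarray(x, dtype=float) / lam)


def one_norm(x):
    # A gauge: nonnegative, convex, positively homogeneous, zero at the origin.
    return float(np.sum(np.abs(x)))


def half_square(x):
    # Nonnegative and convex, but not positively homogeneous.
    return 0.5 * float(np.sum(x ** 2))


x = np.array([1.0, -2.0, 0.5])
for lam in (0.5, 1.0, 4.0):
    print(lam, perspective(one_norm, x, lam), perspective(half_square, x, lam))
```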
Example 4.6** (Separable sums).**
Suppose that where each convex function is nonnegative. Then a straightforward computation shows that . Furthermore, taking into account [15, Proposition 2.4], which expresses the polar of a separable sum of gauges, we deduce
[TABLE]
4.2 Derivation of the perspective dual via lifting
We now derive the relationship between the primal and dual problems (Np) and (Nd) by lifting (Np) to an equivalent gauge optimization problem, and then recognizing (Nd) as its gauge dual.
Theorem 4.7** (Gauge lifting of the primal).**
A point is optimal for (Np) if and only if is optimal for the gauge problem
[TABLE]
*where is a gauge function. *
Proof 4.8**.**
By definition of , is optimal for (Np) if and only if the pair is optimal for
[TABLE]
The following equivalence follows from the definition of :
[TABLE]
*Thus we arrive at the constraint expressed in (4.3). *
Corollary 4.9** (Gauge dual).**
*Problem (Nd) is the gauge dual of (4.3). *
Proof 4.10**.**
It follows from the canonical dual pairing (Gp) and (Gd) that the gauge dual of (4.3) is
[TABLE]
Because is separable in and , it follows from [15, Proposition 2.4] that
[TABLE]
*Since is identically zero, the result follows. *
The next result generalizes the gauge duality result of Theorem 3.5 to the case where and are convex and nonnegative but not necessarily gauges. We parallel the construction in (2.7), and for this section only redefine the feasible sets by
[TABLE]
Thus, (Np) is relatively strictly feasible if
[TABLE]
Similarly, (Nd) is relatively strictly feasible if there exists a triple such that
[TABLE]
Strict feasibility follows the same definitions, where the operation is replaced by .
Theorem 4.11** (Perspective duality).**
Let and , respectively, denote the optimal values of the pair (Np) and (Nd). Then the following relationships hold for the perspective dual pair (Np) and (Nd).
- (a) (Basic Inequalities) It is always the case that
[TABLE]
Thus, and , respectively, imply that (Nd) and (Np) are infeasible.
- (b) (Weak duality) If and are primal and dual feasible, then
[TABLE]
- (c) (Strong duality) If the dual (resp. primal) is feasible and the primal (resp. dual) is relatively strictly feasible, then and the perspective dual (resp. primal) attains its optimal value.
Proof 4.12**.**
Parts (a) and (b) follow immediately from the analogous result in Theorem 3.5, together with Theorem 4.7 and Corollary 4.9.
Next we demonstrate that (Np) is relatively strictly feasible if and only if (4.3) is relatively strictly feasible. By the description of relative interiors of sublevel sets given in [20, Theorem 7.6], (4.3) is relatively strictly feasible if and only if there exists a point such that
[TABLE]
We now seek a description of . We have
[TABLE]
By [20, Corollary 6.8.1], the above description yields
[TABLE]
Thus if and only if . Similarly,
[TABLE]
and so
[TABLE]
In particular, the condition is equivalent to . Thus the conditions for relative strict feasibility of (4.3) and (Np) are identical.
*A similar argument verifies that (Nd) is relatively strictly feasible if and only if (4.4) is relatively strictly feasible. Strong duality then follows from relative interiority, Corollary 4.9, Theorem 4.7, and the analogous strong-duality result in Theorem 3.5. *
4.3 Optimality conditions
The following result generalizes Theorem 3.7 to include the perspective-dual pair.
Theorem 4.13** (Perspective optimality).**
Suppose (Np) is strictly feasible. Then the tuple is perspective primal-dual optimal if and only if
[TABLE]
Proof 4.14**.**
*By construction, is optimal for (Np) if and only if is optimal for its gauge reformulation (4.3). Apply Theorem 3.7 to (4.3) and the corresponding gauge dual (Nd) to obtain the required conditions. *
The following result mirrors Corollary 3.11 for the perspective-duality case.
Corollary 4.15** (Perspective primal-dual recovery).**
Suppose that the primal (Np) is strictly feasible. If is optimal for (Nd), then for any primal feasible , the following conditions are equivalent:
- (a) is optimal for (Np);
- (b) and
- (c) and
Proof 4.16**.**
By construction, is optimal for (Np) if and only if is optimal for its gauge reformulation (4.3). Apply Corollary 3.11 to (4.3) and its gauge dual (Nd) to obtain the equivalence of (a) and (b). To show the equivalence of (b) and (c), note that by the polar-gauge inequality, for all , or equivalently,
[TABLE]
*The inequality is tight for a fixed if and only if minimizes the function
This in turn is equivalent to , or*
[TABLE]
*This shows the equivalence of (b) and (c) and completes the proof. *
Section 6 illustrates an application of Corollary 4.15 for recovering primal optimal solutions from perspective-dual optimal solutions.
4.4 Reformulations of the perspective dual
Two reformulations of the perspective dual (Nd) may be useful depending on the functions and involved in (Np). First, an important simplification of the perspective dual occurs when one or both of these functions are gauges.
Corollary 4.17** (Simplification for gauges).**
If is a gauge, then a triple is optimal for (Nd) if and only if and is optimal for
[TABLE]
*If, in addition, is a gauge, then a triple is optimal for (Nd) if and only if , , and solves (Gd). *
Proof 4.18**.**
*Follows from the formulas for and established in Section 4.1.1. *
Theorem 4.3 also allows us to express the level sets of in terms of its conjugate polar as in the following corollary.
Corollary 4.19**.**
The point is optimal for (Nd) if and only if there exists a scalar such that is optimal for the problem
[TABLE]
Proof 4.20**.**
*By introducing the variable in (Nd), the result follows from Theorem 4.3. *
5 Examples: piecewise linear-quadratic and GLM constraints
From a computational standpoint, the perspective-dual formulation may be an attractive alternative to the original primal problem. The efficiency of this approach requires that the dual constraints are in some sense more tractable than those of the primal. For example, we may consider the dual feasible set “easy” if it admits an efficient procedure for projecting onto that set. In this section, we examine two special cases that admit tractable dual problems in this sense. The first case is the family of piecewise linear quadratic (PLQ) functions, introduced by Rockafellar [22] and subsequently examined by Rockafellar and Wets [23, p.440], and Aravkin, Burke, and Pillonetto [3]. The second case is when is a Bregman divergence arising from a maximum likelihood estimation problem over a family of exponentially distributed random variables.
For this section only, we assume for simplicity that the objective is a gauge, so that the perspective dual in each of these cases simplifies as in Corollary 4.17. The more general case still applies.
5.1 PLQ constraints
The family of PLQ functions is a large class of convex functions that includes such commonly used penalties as the Huber function, the Vapnik $\epsilon$-loss, and the hinge loss. The last two are used in support-vector regression and classification [3]. PLQ functions take the form
[TABLE]
where is defined by linear operators and , a vector , and an injective affine transformation from to . We may assume without loss of generality that is the identity transformation, since the primal problem (Np) already allows for composition of the constraint function with an affine transformation. We also assume that contains the origin, which implies that is nonnegative and thus can be interpreted as a penalty function. Aravkin, Burke, and Pillonetto [3] describe a range of PLQ functions that often appear in applications.
The conjugate representation of , given by
[TABLE]
is useful for deriving its polar perspective . In the following discussion, it is convenient to interpret the quadratic function as a closed convex function of , and thus when , we make the definition .
Theorem 5.1**.**
If is a PLQ function, then
[TABLE]
*where are the rows of that define in (5.1). *
Proof 5.2**.**
First observe that when is PLQ, . Apply Theorem 4.1 and simplify to obtain the chain of equalities
[TABLE]
Because is polyhedral, we can make the explicit description
[TABLE]
*This follows from considering cases on the signs of the , and noting that because contains the origin. Combining the above results, the theorem is proved. *
The next example illustrates how Theorem 5.1 can be applied to compute the perspective-polar transform of the Huber function.
Example 5.3** (Huber function).**
The Huber function [17], which is a smooth approximation to the absolute value function, is also its Moreau envelope of order . Thus it can be stated in conjugate form as
[TABLE]
which reveals . We then apply Theorem 4.1 to obtain
[TABLE]
Note that this can easily be extended beyond the univariate case to a separable sum by applying the result of Example 4.6.
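The three descriptions of the Huber function mentioned in this example (piecewise formula, conjugate form, and Moreau envelope of the absolute value) can be compared numerically. The sketch below assumes the common scaling in which the function equals $r^2/(2\eta)$ for $|r|\le\eta$ and $|r|-\eta/2$ otherwise; the parameter name `eta`, this scaling, and the grid-based maximization are assumptions made for illustration, since the exact normalization used in the paper is not visible in this excerpt.

```python
import numpy as np


def huber(r, eta):
    """Piecewise definition (assumed scaling)."""
    if abs(r) <= eta:
        return r * r / (2.0 * eta)
    return abs(r) - eta / 2.0


def huber_via_conjugate(r, eta, grid=20001):
    """sup over u in [-1, 1] of u*r - (eta/2)*u^2, approximated on a grid."""
    u = np.linspace(-1.0, 1.0, grid)
    return float(np.max(u * r - 0.5 * eta * u ** 2))


def huber_via_moreau(r, eta):
    """Moreau envelope of |.|: min over y of |y| + (r - y)^2 / (2*eta).

    The minimizer is the soft-thresholding of r at level eta.
    """
    y = np.sign(r) * max(abs(r) - eta, 0.0)
    return abs(y) + (r - y) ** 2 / (2.0 * eta)


for r in (-3.0, -0.3, 0.0, 0.5, 2.0):
    print(r, huber(r, 1.0), huber_via_conjugate(r, 1.0), huber_via_moreau(r, 1.0))
```

All three columns should agree up to the discretization error of the grid.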
We can now write down an explicit formulation of the perspective dual problem (Nd) when the primal problem (Np) has a PLQ-constrained feasible region (i.e., is PLQ) and a gauge objective (i.e., is a closed gauge). The constraint set of (Nd) simplifies significantly so that, for example, a first-order projection method might be applied to solve the problem. Apply Theorem 5.1 and introduce a scalar variable to rephrase the dual problem (Nd) as
[TABLE]
We can further simplify the constraint set using the fact that
[TABLE]
Thus, projecting a point onto the feasible set of (5.2) is equivalent to solving a second-order cone program (SOCP). In many important cases, the operator is extremely sparse. For example, when is a sum of separable Huber functions, we have . Hence in many practical cases, particularly when and the dual variables are low-dimensional, this projection problem could be solved efficiently using SOCP solvers that take advantage of sparsity, e.g., Gurobi [16].
5.2 Generalized linear models and the Bregman divergence
Suppose we are given a data set , where each vector describes features associated with observations . Assume that the vector of observations is distributed according to an exponential density where the conjugate of is the cumulant generating function of the distribution and serves to normalize the distribution. We assume that is a closed convex function of the Legendre type [20, p.258]. The maximum likelihood estimate (MLE) can be obtained as the maximizer of the log-likelihood function .
In applications that impose an a priori distribution on the parameters, the goal is to find an approximation to the MLE that penalizes a regularization function (a surrogate for the prior). We assume a linear dependence between the parameters and feature vectors, and thus set , where the matrix has rows . A regularized MLE could be obtained by solving the constrained problem
[TABLE]
where is the Bregman divergence function, and is a positive parameter that controls the divergence between the linear model and the first-moment relative to the density defined by [4].
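For readers less familiar with Bregman divergences, the following sketch evaluates the standard definition $D_\varphi(y;x)=\varphi(y)-\varphi(x)-\langle\nabla\varphi(x),\,y-x\rangle$ for two generating functions. The function names and the two choices of $\varphi$ (half the squared Euclidean norm, and the entropy-type function $\sum_i t_i\log t_i - t_i$) are assumptions chosen for illustration; they are the usual textbook generators associated with Gaussian and Poisson models, but the exact normalization used in the paper is not visible in this excerpt.

```python
import numpy as np


def bregman(phi, grad_phi, y, x):
    """Bregman divergence D_phi(y; x) = phi(y) - phi(x) - <grad phi(x), y - x>."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(phi(y) - phi(x) - grad_phi(x) @ (y - x))


# Generator 1: half the squared Euclidean norm gives 0.5 * ||y - x||^2.
def half_sq(t):
    return 0.5 * np.sum(t ** 2)


def half_sq_grad(t):
    return t


# Generator 2: entropy-type function on positive vectors gives the
# generalized Kullback-Leibler divergence sum(y*log(y/x) - y + x).
def entropy(t):
    return np.sum(t * np.log(t) - t)


def entropy_grad(t):
    return np.log(t)


y = np.array([1.0, 2.0])
x = np.array([1.0, 1.0])
print(bregman(half_sq, half_sq_grad, y, x))
print(bregman(entropy, entropy_grad, y, x))
```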
We use Corollary 4.19 to derive the perspective dual, which requires the computation of the conjugate of :
[TABLE]
where we simplify the expression using the inverse relationship between the gradients of and its conjugate. Assume for simplicity that is a gauge, which is typical when it serves as a regularization function. In that case, the perspective dual reduces to
[TABLE]
cf. Corollaries 4.17 and 4.19.
Example 5.4** (Gaussian distribution).**
As a first example, consider the case where the are distributed as independent Gaussian variables with unit variance. In this case, and the above constraints specialize to
[TABLE]
*This is an example of a PLQ constraint, which falls into the category of problems described in Section 5.1. *
Example 5.5** (Poisson distribution).**
Consider the case where the observations are independent Poisson observations, which corresponds to and . Straightforward calculations show that the perspective dual constraints for the Poisson case reduce to
[TABLE]
where is a constant. By introducing new variables, this can be further simplified to require only affine constraints and relative-entropy constraints. To solve projection subproblems onto a constraint set of this form, we note that
[TABLE]
*is a self-concordant barrier for the set which is the epigraph of the relative entropy function; see Nesterov and Nemirovski [19, Proposition 5.1.4] and Boyd and Vandenberghe [5, Example 9.8]. Standard interior methods can therefore be used to project onto the constraint set. *
Example 5.6** (Bernoulli distribution).**
When the observations are independent Bernoulli observations, which corresponds to and , the perspective dual constraints in (5.4) reduce to
[TABLE]
*where is a constant. By introducing new variables, this can be rewritten with only affine constraints and relative-entropy constraints. Thus the projection subproblems can be solved as in the Poisson case. *
6 Examples: recovering primal solutions
Once we have solved the gauge or perspective dual problems, we have two available approaches for recovering a corresponding primal optimal solution. If we applied a (Lagrange) primal-dual algorithm (e.g., the algorithm of Chambolle and Pock [8]) to solve the dual, then Theorem 3.15 gives a direct recipe for constructing a primal solution from the algorithm’s output. On the other hand, if we applied a primal-only algorithm to solve the dual, we must instead rely on Corollary 3.11 or Corollary 4.15 to recover a primal solution. Interestingly, the alignment conditions in these corollaries can provide insight into the structure of the primal optimal solution, as illustrated by the following examples.
6.1 Recovery for basis pursuit denoising
Our first example illustrates how Corollary 3.11 can be used to recover primal optimal solutions from dual optimal solutions for a simple gauge problem. Consider the gauge dual pair
[TABLE]
which corresponds to the basis pursuit denoising problem. The 1-norm in the primal objective encourages sparsity in , while the constraint enforces a maximum deviation between a forward model and observations .
Let be optimal for the dual problem (6.1b), and set . Define the active set
[TABLE]
as the set of indices of that achieve the optimal objective value of the gauge dual. We use Corollary 3.11 to determine properties of a primal solution . In particular, the first part of Corollary 3.11(b) holds if and only if for all , and for all Thus, the maximal-in-modulus elements of determine the support for any primal optimal solution . The second condition in Corollary 3.11(b) holds if and only if . In order to satisfy this last condition, we solve the least-squares problem restricted to the support of the solution:
[TABLE]
(Note that , otherwise the primal problem is infeasible.) The efficiency of this least-squares solve depends on the number of elements in . For many applications of basis pursuit denoising, for example, we expect the support to be small relative to the length of , and in that case, the least-squares recovery problem is expected to be a relatively inexpensive subproblem. We may interpret the role of the dual problem as that of determining the optimal support of the primal, and the role of the above least-squares problem as recovering the actual values of the support.
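The two-stage recovery described above (read the support off the dual solution, then solve a small least-squares problem) might be sketched as follows. Several details are assumptions made for this illustration and are not taken from the paper: the function name `recover_primal`, the relative tolerance used to detect the maximal-in-modulus entries, and the right-hand side $b-\sigma y/\|y\|_2$ of the least-squares problem, which is our reading of the second alignment condition for the 2-norm constraint. The sketch does not enforce the sign conditions on the support, so its output should be checked against them.

```python
import numpy as np


def recover_primal(A, b, y, sigma, tol=1e-8):
    """Recover a candidate primal solution from a gauge-dual solution y.

    Steps (illustrative):
      1. z = A^T y, and the active set holds indices with |z_i| = ||z||_inf.
      2. Least squares on the active columns against b - sigma * y / ||y||_2.
    """
    z = A.T @ y
    z_max = np.max(np.abs(z))
    support = np.flatnonzero(np.abs(z) >= (1.0 - tol) * z_max)

    target = b - sigma * y / np.linalg.norm(y)
    x_support, *_ = np.linalg.lstsq(A[:, support], target, rcond=None)

    x = np.zeros(A.shape[1])
    x[support] = x_support
    return x, support


# Toy instance with A = I, for which the optimal pair is known in closed form:
# the primal solution is (2, 0, 0) and a dual solution is (0.5, 0, 0).
A = np.eye(3)
b = np.array([3.0, 0.0, 0.0])
y_opt = np.array([0.5, 0.0, 0.0])
x_rec, support = recover_primal(A, b, y_opt, sigma=1.0)
print(x_rec, support)
```

In the toy instance the product of the primal value $\|x\|_1=2$ and the dual value $\|A^Ty\|_\infty=0.5$ equals one, as expected for a gauge-dual optimal pair.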
6.2 Sparse recovery with Huber misfit
For an example where the constraint is not a gauge function, consider the variant of (6.1a)
[TABLE]
where is the Huber function; cf. Example 5.3. This problem corresponds to (Np) with and . Suppose that the tuple , with , is optimal for the perspective dual, and that (Np) attains its optimal value. Because is a gauge, Corollary 4.17 asserts that , and thus Corollary 4.15(b) reduces to the conditions
[TABLE]
As we did for the related example in Section 6.1, we use (6.3a) to deduce the support of the optimal primal solution. It follows from Theorem 5.1 that because is PLQ,
[TABLE]
In particular, because is a separable sum of Huber functions, , is the constant vector of all ones, and Since , it follows that
[TABLE]
For the set let be the set of maximizing indices. Then
[TABLE]
where conv denotes the convex hull operation. More concretely, precisely the following terms are contained in the convex hull above:
- if ;
- if and ,
where is the th standard basis vector. Note that if an optimal solution to (Np) exists, then Corollary 4.15 tells us that must be included in this convex hull, otherwise it is impossible to have
In summary, Corollary 4.15 tells us that to find an optimal solution for (Np), we need to solve a linear program to ensure that subject to the optimal support of , as determined by (6.3a). In cases where the size of the support is expected to be small (as might be expected with a 1-norm objective), this required linear program can be solved efficiently.
7 Numerical experiment: sparse robust regression
To illustrate the usefulness of the primal-from-dual recovery procedure implied by Theorem 3.15, we continue to examine the sparse robust regression problem (6.2), considered by Aravkin et al. [2]. The aim is to find a sparse signal (e.g., a spike train) from measurements contaminated by outliers. These experiments have been performed with the following data: , , and is a Gaussian matrix. The true solution is a spike train which has been constructed to have 20 nonzero entries, and the true noise has been constructed to have 5 outliers.
We compare two approaches for solving problem (6.2). In both, we use Chambolle and Pock’s (CP) algorithm [8], which is primal-dual (in the sense of Lagrange duality) and can be adapted to solve both the primal problem (6.2) and its perspective dual (5.2). Other numerical methods could certainly be applied to either of these problems, such as Shefi and Teboulle’s dual moving-ball method [25]. We note that a primal-only method, for example, applied to (5.2), would require us to use the methods of Section 6 rather than Theorem 3.15 for the recovery of a primal solution.
The CP method applied to problem (3.1) at each iteration computes
[TABLE]
where . The positive scalars and are chosen to satisfy . Setting , and yields the primal problem (6.2). In this case, the proximal operators and can be computed using the Moreau identity, i.e.,
[TABLE]
where is the projection onto the sublevel set in the definition of and is the projection onto the infinity-norm ball of radius . We implement using the Convex.jl [27] and Gurobi [16] software packages.
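A minimal sketch of this kind of iteration, in the standard form given by Chambolle and Pock [8], is shown below, with the proximal operator of the conjugate obtained from the Moreau identity $\operatorname{prox}_{\sigma f^{\star}}(v)=v-\sigma\operatorname{prox}_{f/\sigma}(v/\sigma)$. The toy problem (a 1-norm plus a least-squares term with an identity operator), the function names, the step sizes, and the iteration count are all assumptions chosen so that the answer is known in closed form. It is not the experiment reported in this section, and the ordering of the updates may differ from the display above.

```python
import numpy as np


def soft_threshold(v, t):
    """prox of t * ||.||_1 evaluated at v."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def prox_conjugate(prox_f, v, sigma):
    """Moreau identity: prox_{sigma f*}(v) = v - sigma * prox_{f/sigma}(v / sigma).

    prox_f(w, t) must return prox_{t f}(w).
    """
    return v - sigma * prox_f(v / sigma, 1.0 / sigma)


def chambolle_pock(K, prox_g, prox_f, tau, sigma, iters=500):
    """Sketch of the iteration for min_x g(x) + f(Kx); needs tau*sigma*||K||^2 < 1."""
    m, n = K.shape
    x = np.zeros(n)
    x_bar = x.copy()
    y = np.zeros(m)
    for _ in range(iters):
        y = prox_conjugate(prox_f, y + sigma * (K @ x_bar), sigma)  # dual step
        x_new = prox_g(x - tau * (K.T @ y), tau)                    # primal step
        x_bar = 2.0 * x_new - x                                     # extrapolation
        x = x_new
    return x, y


# Toy problem: min ||x||_1 + 0.5 * ||x - b||^2, whose solution is soft_threshold(b, 1).
b = np.array([3.0, 0.5, -2.0])
K = np.eye(3)


def prox_least_squares(v, t):
    # prox of t * 0.5 * ||. - b||^2
    return (v + t * b) / (1.0 + t)


x_cp, y_cp = chambolle_pock(K, soft_threshold, prox_least_squares, tau=0.9, sigma=0.9)
print(x_cp)
```

Applying `prox_conjugate` to `soft_threshold` reproduces the projection onto the unit infinity-norm ball, which is the mechanism by which a projection replaces a proximal map in the text above.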
On the other hand, to apply CP to the perspective dual problem (5.2), one instead takes and , where is the constraint set for (5.2), and takes to be the corresponding adjoint to the operator in (6.2). To compute , which is the projection onto , we solve the SOCP (5.3) using Gurobi. To evaluate , we again use the Moreau identity and project onto level sets of .
Fig. 7.1 compares the outcomes of running CP on the primal and perspective dual problems. This experiment exhibited similar behavior when run 500 times with different realizations of the random data, and so here we report on a single problem instance. Note that performing an iteration of CP on the perspective dual is significantly faster than performing an iteration of CP on the primal because can be computed much more efficiently than (see the discussion in Section 5.1). This also appears to make convergence of CP on the perspective dual more stable, as seen in Fig. 7.1(a). Fig. 7.1(c)-(d) illustrate the sparsity patterns of the iterates relative to those . Notably, we recover the correct sparsity patterns using Theorem 3.15. The recovery procedure outlined in Section 6.2 also recovers the correct sparsity pattern when applied to the final perspective dual iterate.
8 Discussion
Gauge duality is fascinating in part because it shares many symmetric properties with Lagrange duality, and yet Freund’s 1987 development of the concept flows from an entirely different principle based on polarity of the sets that define the gauge functions. On the other hand, Lagrange duality proceeds from a perturbation argument, which yields as one of its hallmarks a sensitivity interpretation of the dual variables. The discussion in Section 3 reveals that both duality notions can be derived from the same Fenchel-Rockafellar perturbation framework. The derivation of gauge duality using this framework appears to be its first application to a perturbation that does not lead to Lagrange duality. This new link between gauge duality and the perturbation framework establishes a sensitivity interpretation for gauge dual variables, which has not been available until now.
One motivation for this work is to explore alternative formulations of optimization problems that might be computationally advantageous for certain problem classes. The phase-retrieval problem, based on an SDP formulation, was a first application of ideas from gauge duality for developing large-scale solvers [14]. That approach, however, was limited in its flexibility because it required gauge functions. The discussions of Section 4 pave the way to new extensions, such as different models of the measurement process, as described in Section 5.2.
Another implication of this work is that it establishes the foundation for exploring a new breed of primal-dual algorithms based on perspective duality. Our own application of Chambolle and Pock’s primal-dual algorithm [8] to the perspective-dual problem, together with a procedure for extracting a primal estimate, is a first exploratory step towards developing variations of such methods. Future directions of research include the development of such algorithms, along with their attendant convergence properties and an understanding of the classes of problems for which they are practicable.
Acknowledgments
We are grateful to Patrick Combettes for pointing us to recent comprehensive work on properties of the perspective function and its applications [9, 10]. Our sincere thanks to two anonymous referees who provided an extensive list of corrections and suggestions that helped us to arrive at several strengthened results and to streamline our presentation.
Appendix A Proof of (2.6)
We prove each fact in succession.
1. (). By definition of the polar gauge and the polar cone, we have if and only if
[TABLE]
2. (). Suppose . Then for any and , by sublinearity of we have Thus , and . Suppose now that . Then in particular, for all . But then by positive homogeneity, , for all . This is a contradiction since , so we conclude that .
3. By positive homogeneity of and the definition of the polar gauge, if and only if
[TABLE]
4. (). Apply the third equality, replacing by , and then take polars on both sides. This concludes the proof.
Appendix B Proof of Lemma 3.4
With no loss in generality, we can assume that , because if , we use the convention (2.8) and its implication (2.9).
First suppose that the primal (Gp) is relatively strictly feasible. A point lies in the domain of if and only if the system
[TABLE]
is solvable for . Thus the set coincides with
[TABLE]
where is a linear subspace. We aim to show is in the relative interior of (B.1), which will show . Use [20, Lemma 7.3] and [20, Theorem 7.6] to obtain
[TABLE]
From relative strict feasibility of (Gp), the fact that , and again [20, Theorem 7.6], we deduce existence of an with and Fix a constant and define the pair . Then we immediately have and . It follows that the vector lies in . Thus lies in the intersection
[TABLE]
Use [20, Theorem 6.5, Corollary 6.6.2] to deduce that (B.2) is the relative interior of the intersection (B.1). Thus lies in the relative interior of as claimed.
Next, suppose that the gauge dual (Gd) is relatively strictly feasible. By definition of the tuple lies in the domain of if and only if
[TABLE]
Thus is linearly isomorphic to the intersection
[TABLE]
where is the linear subspace . However, by [20, Lemma 7.3], relative strict feasibility of the dual (Gd) amounts to the inclusion
[TABLE]
Strict feasibility of (Gd) implies, via [20, Corollary 6.5.1, Corollary 6.6.2], that is in the relative interior of the intersection (B.3), and thus , as claimed.
Finally, the exact same arguments, but with relative interiors replaced by interiors, will prove the claims relating strict feasibility and interiority. This concludes the proof.
- [1] A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, and S. Roy. Level-set methods for convex optimization. arXiv:1602.01506, 2016.
- [2] A. Y. Aravkin, J. V. Burke, and M. P. Friedlander. Variational properties of value functions. SIAM J. Optim., 23(3):1689–1717, 2013.
- [3] A. Y. Aravkin, J. V. Burke, and G. Pillonetto. Linear system identification using stable spline kernels and PLQ penalties. In 52nd IEEE Decis. Contr. P., pages 5168–5173, Dec 2013.
- [4] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. J. Mach. Learn. Res., 6(Oct):1705–1749, 2005.
- [5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
- [6] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, Feb 2006.
- [7] E. J. Candès, J. K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207–1223, 2006.
- [8] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging. Vis., 40(1):120–145, 2011.
