Foundations of gauge and perspective duality

Alexandre Y. Aravkin; James V. Burke; Dmitriy Drusvyatskiy; Michael P. Friedlander; Kellie MacPhee

arXiv:1702.08649·math.OC·June 20, 2018·SIAM J. Optim.

TL;DR

This paper revisits gauge duality, establishing a modern, unified framework with Fenchel-Rockafellar duality, and extends it to general nonnegative convex functions, enhancing understanding and applicability in convex optimization.

Contribution

It provides a modern, unified explanation of gauge duality using a perturbation framework and extends the theory to broader classes of convex functions and models.

Findings

01

Gauge duality can be explained via a perturbation-based duality approach.

02

Primal solutions can be recovered from gauge dual solutions through rescaling.

03

The framework applies to general nonnegative convex functions, including piecewise linear quadratic functions.

Abstract

We revisit the foundations of gauge duality and demonstrate that it can be explained using a modern approach to duality based on a perturbation framework. We therefore put gauge duality and Fenchel-Rockafellar duality on equal footing, including explaining gauge dual variables as sensitivity measures, and showing how to recover primal solutions from those of the gauge dual. This vantage point allows a direct proof that optimal solutions of the Fenchel-Rockafellar dual of the gauge dual are precisely the primal solutions rescaled by the optimal value. We extend the gauge duality framework to the setting in which the functional components are general nonnegative convex functions, including problems with piecewise linear quadratic functions and constraints that arise from generalized linear models used in regression.

Equations

$$\operatorname*{minimize}_{x}\quad\kappa(x)\quad\text{subject to}\quad\rho(b-Ax)\leq\sigma\tag{Gp}$$

$$\operatorname*{maximize}_{y}\quad\langle b,y\rangle-\sigma\rho^{\circ}(y)\quad\text{subject to}\quad\kappa^{\circ}(A^{T}y)\leq 1\tag{Ld}$$

$$\operatorname*{minimize}_{y}\quad\kappa^{\circ}(A^{T}y)\quad\text{subject to}\quad\langle b,y\rangle-\sigma\rho^{\circ}(y)\geq 1\tag{Gd}$$

$$\operatorname{val}\text{(Gp)}=\operatorname{val}\text{(Ld)},$$

$$1=\operatorname{val}\text{(Gp)}\cdot\operatorname{val}\text{(Gd)},$$

$$p(u):=\inf_{x}F(x,u)\quad\text{and}\quad q(v):=\inf_{y}F^{\star}(v,y).$$

$$p(0)=\inf_{x}F(x,0)\quad\text{and}\quad p^{\star\star}(0)=\sup_{y}\,-F^{\star}(0,y)\equiv-q(0).$$

$$f^{\pi}(x,\lambda):=\begin{cases}\lambda f(\lambda^{-1}x)&\text{if }\lambda>0\\ f^{\infty}(x)&\text{if }\lambda=0\\ +\infty&\text{if }\lambda<0,\end{cases}$$

$$\kappa^{\circ}(y):=\inf\{\mu>0\mid\langle x,y\rangle\leq\mu\kappa(x),\ \forall x\},$$

$$\operatorname{\mathrm{epi}}\kappa^{\circ}=\{(y,-\lambda)\mid(y,\lambda)\in(\operatorname{\mathrm{epi}}\kappa)^{\circ}\}.$$

$$\kappa^{\circ}=\delta_{\mathcal{U}_{\kappa}}^{*}=\sup\{\langle u,\cdot\rangle\mid u\in\mathcal{U}_{\kappa}\}\quad\text{where}\quad\mathcal{U}_{\kappa}:=\{u\mid\kappa(u)\leq 1\}.$$

$$\langle x,y\rangle\leq\kappa(x)\cdot\kappa^{\circ}(y)\quad\forall x\in\operatorname{\mathrm{dom}}\kappa,\ \forall y\in\operatorname{\mathrm{dom}}\kappa^{\circ},$$

$$\mathcal{H}_{\kappa}:=\{u\mid\kappa(u)=0\}$$

$$\mathcal{U}_{\kappa}^{\circ}=\mathcal{U}_{\kappa^{\circ}},\quad\mathcal{U}_{\kappa}^{\infty}=\mathcal{H}_{\kappa},\quad(\operatorname{\mathrm{dom}}\kappa)^{\circ}=\mathcal{H}_{\kappa^{\circ}},\quad\text{and}\quad\mathcal{H}_{\kappa}^{\circ}=\operatorname{\mathrm{cl}}\operatorname{\mathrm{dom}}\kappa^{\circ}$$

$$\mathcal{F}_{p}:=\{u\mid\rho(b-u)\leq\sigma\}\quad\text{and}\quad\mathcal{F}_{d}:=\{y\mid\langle b,y\rangle-\sigma\rho^{\circ}(y)\geq 1\}.$$

$$(\rho,\sigma)\Rightarrow(\delta_{\mathcal{H}_{\rho}},1)\quad\text{whenever}\quad\sigma=0.$$

$$\sigma\rho^{\circ}:=\delta_{\operatorname{\mathrm{cl}}\operatorname{\mathrm{dom}}\rho^{\circ}}\equiv\delta_{\mathcal{H}_{\rho}^{\circ}}\quad\text{when}\quad\sigma=0.$$

$$A^{-1}\mathcal{F}_{p}\cap(\operatorname{\mathrm{dom}}\kappa)\quad\text{and}\quad A^{T}\mathcal{F}_{d}\cap(\operatorname{\mathrm{dom}}\kappa^{\circ}).$$

$$A^{-1}(\operatorname{\mathrm{ri}}\mathcal{F}_{p})\cap(\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}\kappa)\quad\text{and}\quad A^{T}\operatorname{\mathrm{ri}}\mathcal{F}_{d}\cap(\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}\kappa^{\circ}).$$

$\operatorname{\mathrm{ri}}\mathcal{F}_{p}$

$\operatorname{\mathrm{ri}}\mathcal{F}_{d}$

$$\operatorname*{minimize}_{x}\quad f(Ax)+g(x),$$

$$p(0)=\inf_{x}\{f(Ax)+g(x)\}\quad\text{and}\quad p^{\star\star}(0)=\sup_{y}\{-f^{\star}(-y)-g^{\star}(A^{T}y)\}.$$

$$\left.\begin{array}{l}\bar{x}\in\operatorname*{\mathrm{argmin}}_{x}\,F(x,0)\\ \bar{y}\in\operatorname*{\mathrm{argmax}}_{y}\,-F^{\star}(0,y)\\ F(\bar{x},0)=-F^{\star}(0,\bar{y})\end{array}\right\}\quad\Longleftrightarrow\quad(0,\bar{y})\in\partial F(\bar{x},0)\quad\Longleftrightarrow\quad(\bar{x},0)\in\partial F^{\star}(0,\bar{y}).$$

$$p(0)\leq p(u)-\langle\phi,u\rangle=\inf_{x}\left\{F(x,u)-\left\langle\begin{pmatrix}0\\ \phi\end{pmatrix},\begin{pmatrix}x\\ u\end{pmatrix}\right\rangle\right\}\quad\forall u.$$

$$p(0)=\inf_{u}\{p(u)-\langle\phi,u\rangle\}\leq p(v)-\langle\phi,v\rangle\quad\forall v,$$

$v_{p}(u)$

$v_{d}(t,\theta)$

$$v_{p}(u)=\inf_{\lambda>0,\,w}\left\{1/\lambda\mid\rho(\lambda b-Aw+u)\leq\sigma\lambda,\ w\in\mathcal{U}_{\kappa}\right\}.$$

$$F(w,\lambda,u):=-\lambda+\delta_{(\operatorname{\mathrm{epi}}\rho)\times\mathcal{U}_{\kappa}}\left(W\begin{pmatrix}w\\ \lambda\\ u\end{pmatrix}\right),\quad\text{where}\quad W:=\begin{pmatrix}-A&b&I_{m}\\ 0&\sigma&0\\ I_{n}&0&0\end{pmatrix}.$$

$$p(u):=\inf_{\lambda\geq 0,\,w}F(w,\lambda,u)\quad\text{and}\quad q(t,\theta):=\inf_{y}F^{\star}(t,\theta,y),$$

Full text

Foundations of gauge and perspective duality

June 18, 2018

A.Y. Aravkin, Department of Applied Mathematics, University of Washington, Seattle. Research supported by the Washington Research Foundation Data Science Professorship.

J.V. Burke, Seattle, WA. Research supported in part by NSF award DMS-1514559.

D. Drusvyatskiy, Department of Mathematics, University of Washington, Seattle. Research partially supported by AFOSR YIP award FA9550-15-1-0237.

M.P. Friedlander, Departments of Computer Science and Mathematics, University of British Columbia, Vancouver, BC, Canada. Research supported by ONR award N00014-16-1-2242.

K.J. MacPhee

Keywords: convex optimization, gauge duality, nonsmooth optimization, perspective function

AMS subject classifications: 90C15, 90C25

1 Introduction

Sensitivity of the optimal values and solutions of optimization problems, with respect to perturbations in the problem data, is a central concern of Fenchel-Rockafellar duality theory. Lagrange duality can be regarded as a special case of this theory, in which perturbations to the data are introduced in a particular manner. Gauge duality, on the other hand, as introduced in 1987 by Freund [13], was developed without any reference to sensitivity. It relies instead on a special polarity correspondence that exists for nonnegative, positively homogeneous convex functions that vanish at the origin; these are known as gauge functions. In 2014, Friedlander, Macêdo, and Pong [15] made partial progress towards connecting gauge and Lagrange dualities. In the present work, we show that gauge duality may be regarded as a particular application of Fenchel-Rockafellar duality theory that is different than the one required for Lagrange duality. This connection provides a useful vantage point from which to develop new algorithms for an important class of convex optimization problems. We also describe how gauge duality theory can be extended beyond the optimization of gauge functions to the optimization of all convex functions that are bounded below. We call this extension perspective duality.

A convenient and fully general formulation for our approach is the problem

$$\operatorname*{minimize}_{x}\quad\kappa(x)\quad\text{subject to}\quad\rho(b-Ax)\leq\sigma,\tag{Gp}$$

where $A\colon\mathbb{R}^{n}\to\mathbb{R}^{m}$ is a linear map, $b$ is an $m$-vector, and $\kappa$ and $\rho$ are closed gauge functions. For many applications, the function $\kappa$ is used to regularize the problem in order to obtain solutions with certain desirable properties. For example, in statistical and machine-learning applications the regularizer $\kappa$ is often a nonsmooth, structure-inducing function, e.g., the $1$-norm, which is frequently used to encourage sparsity in the solution. The function $\rho$ may be regarded as a penalty function, such as the 2-norm, that measures the degree of misfit between the data $b$ and the linear model $Ax$, and may reflect a statistical model of the noise in the data $b$. The perspective duality extension enables us to consider optimization problems with a wider range of applications by allowing functions $\kappa$ and $\rho$ that are not positively homogeneous, including the Huber function used for robust regression [17], the elastic net used for group detection [28], and the logistic loss used for classification [18, 1].

The formulation (Gp) gives rise to two different “dual” problems:

$$\operatorname*{maximize}_{y}\quad\langle b,y\rangle-\sigma\rho^{\circ}(y)\quad\text{subject to}\quad\kappa^{\circ}(A^{T}y)\leq 1,\tag{Ld}$$

$$\operatorname*{minimize}_{y}\quad\kappa^{\circ}(A^{T}y)\quad\text{subject to}\quad\langle b,y\rangle-\sigma\rho^{\circ}(y)\geq 1.\tag{Gd}$$

Here $\rho^{\circ}$ and $\kappa^{\circ}$ are the polars of $\rho$ and $\kappa$, which are also gauge functions; see Section 2.2 for a precise definition. In the important case $\sigma=0$, we interpret $\sigma\rho^{\circ}$ as the indicator function of the closure of the domain of $\rho^{\circ}$ (see the discussion in Section 2.3). The first problem (Ld) is the standard Lagrangian (or Fenchel-Rockafellar) dual, which is the dual problem typically considered in connection with convex optimization problems. Strong duality, reflected in the equality

$$\operatorname{val}\text{(Gp)}=\operatorname{val}\text{(Ld)},$$

and in the attainment of the optimal value of the Lagrange primal-dual pair, holds under mild interiority conditions often referred to as the Slater constraint qualification. The second problem (Gd) is the gauge dual and is less well-known. Under interiority conditions similar to those required by Lagrange duality, strong duality holds in the gauge duality setting; this is reflected in the analogous equality

$$1=\operatorname{val}\text{(Gp)}\cdot\operatorname{val}\text{(Gd)},$$

and in the attainment of the optimal value of the gauge primal-dual pair.
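As a sanity check on the two strong-duality identities, the following minimal sketch brute-forces a scalar instance. It assumes the standard forms of the three problems, consistent with the feasible sets of Section 2.3: (Gp) minimizes $\kappa(x)$ subject to $\rho(b-Ax)\leq\sigma$, (Ld) maximizes $\langle b,y\rangle-\sigma\rho^{\circ}(y)$ subject to $\kappa^{\circ}(A^{T}y)\leq 1$, and (Gd) minimizes $\kappa^{\circ}(A^{T}y)$ subject to $\langle b,y\rangle-\sigma\rho^{\circ}(y)\geq 1$. The choice $\kappa=\rho=|\cdot|$ and the data $a=2$, $b=3$, $\sigma=1$ are illustrative assumptions, not taken from the paper.

```python
# Scalar instance: kappa = rho = |.|, whose polar gauge is again |.|.
# Illustrative data (assumptions): A = a = 2, b = 3, sigma = 1.
a, b, sigma = 2.0, 3.0, 1.0
TOL = 1e-12  # slack so that boundary grid points count as feasible

# Uniform grid on [-5, 5] with spacing 1e-3.
grid = [i / 1000 for i in range(-5000, 5001)]

# (Gp): minimize |x| subject to |b - a*x| <= sigma.
val_gp = min(abs(x) for x in grid if abs(b - a * x) <= sigma + TOL)

# (Ld): maximize b*y - sigma*|y| subject to |a*y| <= 1.
val_ld = max(b * y - sigma * abs(y) for y in grid if abs(a * y) <= 1 + TOL)

# (Gd): minimize |a*y| subject to b*y - sigma*|y| >= 1.
val_gd = min(abs(a * y) for y in grid if b * y - sigma * abs(y) >= 1 - TOL)

print("val(Gp) =", val_gp)
print("val(Ld) =", val_ld)
print("val(Gd) =", val_gd)
print("val(Gp) * val(Gd) =", val_gp * val_gd)
```

In this instance the Lagrange pair shares an optimal value (an additive relationship with zero gap), while the gauge pair has optimal values that multiply to one.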

In certain contexts, the gauge dual (Gd) can be preferable for computation to the primal (Gp) and the Lagrangian dual (Ld), particularly when the polar $\kappa^{\circ}$ has a special structure. Friedlander and Macêdo [14], for example, use gauge duality to derive an effective algorithm for an important class of low-rank spectral optimization problems that arise in signal-recovery applications, including phase recovery and blind deconvolution. Indeed, the effectiveness of numerous convex optimization algorithms, particularly first-order methods, relies on being able to project easily onto the constraint set. The appearance of the linear map $A$ in the constraints of both (Gp) and (Ld) means that such methods may not be efficient, though some recent methods have been proposed that circumvent this difficulty [24]. In contrast, the map $A$ appears in the gauge dual (Gd) only in the objective, and computing subgradients of this objective only requires subgradients of $\kappa^{\circ}$, together with the ability to efficiently implement matrix-vector multiplication. Moreover, typical applications occur in the regime $m\ll n$. For example, $m$ is often logarithmic in $n$ [7, 6, 26, 11]. Because the dual variables $y$ of (Gd) lie in the much smaller space $\mathbb{R}^{m}$, projections onto the feasible region may be computed efficiently, depending on the context. An example of how an interior method may be used for this purpose is given in Section 5.2.

1.1 Approach

This paper has two main goals. The first goal, addressed in Section 3, is to show how the foundations of gauge duality can be derived via a perturbation framework pioneered by Rockafellar [21, 20], in which the optimal value and optimal solution depend on parameters to the problem. We follow Rockafellar and Wets [23, 11.H], who consider an arbitrary convex perturbation function $F$ on $\mathbb{R}^{n}\times\mathbb{R}^{m}$ that determines how the parameters enter the problem, and define the value functions

$$p(u):=\inf_{x}F(x,u)\quad\text{and}\quad q(v):=\inf_{y}F^{\star}(v,y).\tag{1.1}$$

This set-up immediately yields the primal-dual pair

$$p(0)=\inf_{x}F(x,0)\quad\text{and}\quad p^{\star\star}(0)=\sup_{y}\,-F^{\star}(0,y)\equiv-q(0).\tag{1.2}$$

Fenchel-Rockafellar duality theory flows from an appropriate choice of $F$ . We show that gauge duality fits equally well into this framework under a judicious choice of the perturbation function $F$ , thereby putting Fenchel-Rockafellar and gauge duality theories on an equal footing. Strong duality, primal-dual optimality conditions, and an interpretation of the gauge dual solutions as sensitivity measures—i.e., subgradients of the value function—quickly follow; cf. Section 3.2. These results, in particular, answer an open question posed by Freund in his original work [13], which asked for an interpretation of gauge dual variables for problems with nonlinear constraints. It also completes a partial analysis by Friedlander et al. [15] on the interpretation of gauge dual variables as sensitivity measures.

This viewpoint allows us to prove a striking relationship between optimal solutions of the primal and optimal solutions of the Lagrangian dual of the gauge dual: the two coincide up to scaling by the optimal value (Section 3.5). Consequently, Lagrangian primal-dual methods applied to the gauge dual can be used to recover solutions of the original primal problem. We illustrate this idea in Section 7 with an application of Chambolle and Pock’s primal-dual algorithm [8] to a specific problem instance.

The second goal of this paper is to extend the applicability of the gauge duality paradigm beyond gauges to capture more general convex problems. Section 4 extends gauge duality to problems involving convex functions that are merely nonnegative, and by an appropriate translation, functions that are bounded from below. The approach is based on using the perspective transform of a convex function [20, p. 35], which increases a function’s domain from $\mathbb{R}^{n}$ to $\mathbb{R}^{n+1}$ and makes it positively homogeneous, enabling the property that is key to the application of gauge duality. We term the resulting dual problem the perspective dual. The perspective-polar transformation, needed to derive the perspective dual problem, is developed in Section 4. Concrete illustrations of perspective duality for the family of piecewise linear-quadratic functions, which are often used in data-fitting applications, and for the setting of generalized linear models, are given in Section 5. We further explore examples of optimality conditions and primal-from-dual recovery in Section 6. Numerical illustrations for a case-study of perspective duals comprise Section 7.

2 Notation and assumptions

The derivation of our results relies on standard notions from convex analysis. Unless otherwise specified, we generally follow Rockafellar [20] for standard definitions and notation, including domains and epigraphs, relative interiors, convex conjugate functions, subdifferentials, polar sets, etc. In this section we collect less well-known definitions and notation used throughout the paper, and establish blanket assumptions on the problem data.

Let $\overline{\mathbb{R}}:=\mathbb{R}\cup\{+\infty\}$ denote the extended real line, and $\overline{\mathbb{R}}_{+}:=\{x\in\overline{\mathbb{R}}\mid x\geq 0\}$ denote the nonnegative extended reals. Let $f\colon{\mathbb{R}}^{n}\to\overline{\mathbb{R}}$ and $g\colon\mathbb{R}^{m}\to\overline{\mathbb{R}}$ denote general closed convex functions. For a closed convex set $\mathcal{C}\subseteq{\mathbb{R}}^{n}$, its convex indicator $\delta_{\mathcal{C}}$ is the closed convex function whose value is zero on $\mathcal{C}$ and $+\infty$ otherwise. Let $\operatorname{\mathrm{cone}}\,\mathcal{C}:=\{\lambda x\mid\lambda\geq 0,\,x\in\mathcal{C}\}$ denote the cone generated by $\mathcal{C}$. We often abbreviate fractions such as $(1/(2\mu))$ to $(1/2\mu)$.

2.1 The perspective transform

For any convex function $f:\mathbb{R}^{n}\to\overline{\mathbb{R}}$ , its perspective is the function on $\mathbb{R}^{n+1}$ whose epigraph is the cone generated by the set $(\operatorname{\mathrm{epi}}f)\times\{1\}$ . Because this transform is not necessarily closed—even when $f$ is closed—we choose to work with its closure, and redefine the transform as

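$$f^{\pi}(x,\lambda):=\begin{cases}\lambda f(\lambda^{-1}x)&\text{if }\lambda>0\\ f^{\infty}(x)&\text{if }\lambda=0\\ +\infty&\text{if }\lambda<0,\end{cases}\tag{2.1}$$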

where $f^{\infty}(x)$ is the recession function of $f$ [20, Theorem 8.5]. A calculus for the perspective transform $f\mapsto f^{\pi}$ is described by Aravkin, Burke, and Friedlander [2, Section 3.3] and, for the infinite-dimensional case, by Combettes [10, 9], where properties of the perspective transform are described in detail. We often apply more than one transformation to a function, and in such cases, the multiple transformations are applied in the order that they appear; e.g., $f^{\pi\circ}:=(f^{\pi})^{\circ}$ .
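To make the closed perspective concrete, here is a minimal sketch that evaluates $f^{\pi}(x,\lambda)$ as $\lambda f(x/\lambda)$ for $\lambda>0$, as the recession function $f^{\infty}(x)$ at $\lambda=0$, and as $+\infty$ for $\lambda<0$. The choice $f(x)=\tfrac{1}{2}x^{2}$ on $\mathbb{R}$ is an illustrative assumption; its recession function is the indicator of $\{0\}$. The helper names are hypothetical.

```python
import math

def f(x):
    # Illustrative convex function (assumption): f(x) = x^2 / 2.
    return 0.5 * x * x

def f_recession(x):
    # Recession function of x^2 / 2: zero at the origin, +infinity elsewhere.
    return 0.0 if x == 0 else math.inf

def perspective(x, lam):
    # Closed perspective transform, following the three-case definition.
    if lam > 0:
        return lam * f(x / lam)
    if lam == 0:
        return f_recession(x)
    return math.inf

# Positive homogeneity: f^pi(t*x, t*lam) = t * f^pi(x, lam) for t > 0.
x, lam, t = 3.0, 2.0, 5.0
lhs = perspective(t * x, t * lam)
rhs = t * perspective(x, lam)
print(lhs, rhs)
```

The check at the end illustrates the property that makes the perspective useful here: the transformed function is positively homogeneous in $(x,\lambda)$ even though $f$ itself is not.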

2.2 Gauge functions

The following is only a brief description of gauge functions. A complete description is given by Rockafellar [20, Section 15].

A convex function $\kappa:{\mathbb{R}}^{n}\to\overline{\mathbb{R}}$ is called a gauge if it is nonnegative, positively homogeneous, and vanishes at the origin. The symbols $\kappa\colon{\mathbb{R}}^{n}\to\overline{\mathbb{R}}$ and $\rho\colon\mathbb{R}^{m}\to\overline{\mathbb{R}}$ will always denote closed gauges. The polar of a gauge $\kappa$ is the function $\kappa^{\circ}$ defined by

$$\kappa^{\circ}(y):=\inf\{\mu>0\mid\langle x,y\rangle\leq\mu\kappa(x),\ \forall x\},\tag{2.2}$$

which is also a gauge and satisfies $\kappa^{\circ\circ}=\kappa$ when $\kappa$ is closed [20, Theorem 15.1]. For example, if $\kappa$ is a norm then $\kappa^{\circ}$ is the corresponding dual norm. Note the identity

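$$\operatorname{\mathrm{epi}}\kappa^{\circ}=\{(y,-\lambda)\mid(y,\lambda)\in(\operatorname{\mathrm{epi}}\kappa)^{\circ}\}.\tag{2.3}$$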

It follows directly from (2.2) and positive homogeneity of a gauge function that its polar can be characterized as the support function to the unit level set, i.e.,

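$$\kappa^{\circ}=\delta_{\mathcal{U}_{\kappa}}^{*}=\sup\{\langle u,\cdot\rangle\mid u\in\mathcal{U}_{\kappa}\}\quad\text{where}\quad\mathcal{U}_{\kappa}:=\{u\mid\kappa(u)\leq 1\}.\tag{2.4}$$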

Moreover, $\kappa$ and $\kappa^{\circ}$ satisfy a Hölder-like inequality

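$$\langle x,y\rangle\leq\kappa(x)\cdot\kappa^{\circ}(y)\quad\forall x\in\operatorname{\mathrm{dom}}\kappa,\ \forall y\in\operatorname{\mathrm{dom}}\kappa^{\circ},$$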

which we refer to as the polar-gauge inequality. The zero level set

$$\mathcal{H}_{\kappa}:=\{u\mid\kappa(u)=0\}$$

plays a key role when $\sigma=0$ . It is straightforward to show that

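$$\mathcal{U}_{\kappa}^{\circ}=\mathcal{U}_{\kappa^{\circ}},\quad\mathcal{U}_{\kappa}^{\infty}=\mathcal{H}_{\kappa},\quad(\operatorname{\mathrm{dom}}\kappa)^{\circ}=\mathcal{H}_{\kappa^{\circ}},\quad\text{and}\quad\mathcal{H}_{\kappa}^{\circ}=\operatorname{\mathrm{cl}}\operatorname{\mathrm{dom}}\kappa^{\circ}\tag{2.6}$$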

whenever $\kappa$ is closed, where $\mathcal{U}_{\kappa}^{\infty}$ is the recession cone for $\mathcal{U}_{\kappa}$ [20, Section 8]. We include proofs of (2.6) in Appendix A.
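The support-function characterization of the polar and the polar-gauge inequality can be checked numerically. The sketch below takes $\kappa$ to be the 1-norm on $\mathbb{R}^{3}$ (an illustrative assumption). Its unit ball is the convex hull of the points $\pm e_{i}$, so the supremum of $\langle u,y\rangle$ over the ball is attained at one of these vertices and equals the $\infty$-norm of $y$, which is the dual norm. The helper names are hypothetical.

```python
import random

def kappa(x):
    # Illustrative gauge (assumption): the 1-norm.
    return sum(abs(xi) for xi in x)

def kappa_polar(y):
    # Support function of the unit 1-norm ball, taken over its vertices +/- e_i.
    best = 0.0
    for yi in y:
        for sign in (1.0, -1.0):
            best = max(best, sign * yi)
    return best

def inner(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(3)]
    y = [random.uniform(-1, 1) for _ in range(3)]
    # The polar of the 1-norm agrees with the infinity-norm.
    assert kappa_polar(y) == max(abs(yi) for yi in y)
    # Polar-gauge inequality: <x, y> <= kappa(x) * kappa_polar(y).
    assert inner(x, y) <= kappa(x) * kappa_polar(y) + 1e-12
```

For a general gauge the unit level set need not be a polytope, so enumerating vertices is specific to this example.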

2.3 Assumptions on the feasible region

Define the following primal and dual feasible sets:

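$$\mathcal{F}_{p}:=\{u\mid\rho(b-u)\leq\sigma\}\quad\text{and}\quad\mathcal{F}_{d}:=\{y\mid\langle b,y\rangle-\sigma\rho^{\circ}(y)\geq 1\}.\tag{2.7}$$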

The nonnegativity of $\rho$ implies that the Slater condition can fail when $\sigma=0$ , and thus special attention is required. In this case, we make the replacement

$$(\rho,\sigma)\Rightarrow(\delta_{\mathcal{H}_{\rho}},1)\quad\text{whenever}\quad\sigma=0.\tag{2.8}$$

This replacement yields a gauge optimization problem whose solution set and optimal value coincide with those of (Gp). Observe that because $\mathcal{H}_{\rho}$ is a closed convex cone, $\delta_{\mathcal{H}_{\rho}}=\delta_{\mathcal{H}_{\rho}^{\circ}}^{*}$ is a closed gauge that satisfies, by virtue of (2.6), $\delta_{\mathcal{H}_{\rho}}^{\circ}=\delta_{\mathcal{H}_{\rho}^{\circ}}=\delta_{\operatorname{\mathrm{cl}}\operatorname{\mathrm{dom}}\rho^{\circ}}$ . This motivates the convention made immediately following (Gd) that

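$$\sigma\rho^{\circ}:=\delta_{\operatorname{\mathrm{cl}}\operatorname{\mathrm{dom}}\rho^{\circ}}\equiv\delta_{\mathcal{H}_{\rho}^{\circ}}\quad\text{when}\quad\sigma=0.\tag{2.9}$$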

The replacement (2.8) allows us to make the useful assumption that $\sigma>\inf\rho$ , which significantly streamlines our analysis. The convention (2.9) also makes sense from an epigraphical perspective, because the functions $\sigma\rho^{\circ}$ epigraphically converge to $\delta_{\operatorname{\mathrm{cl}}\operatorname{\mathrm{dom}}\rho^{\circ}}$ as $\sigma\downarrow 0$ [23, Proposition 7.4(c)].

The gauge primal (Gp) and dual (Gd) problems are said to be feasible, respectively, if the following intersections are nonempty:

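$$A^{-1}\mathcal{F}_{p}\cap(\operatorname{\mathrm{dom}}\kappa)\quad\text{and}\quad A^{T}\mathcal{F}_{d}\cap(\operatorname{\mathrm{dom}}\kappa^{\circ}).$$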

Similarly, the primal and dual problems are said to be relatively strictly feasible, respectively, if the following intersections are nonempty:

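$$A^{-1}(\operatorname{\mathrm{ri}}\mathcal{F}_{p})\cap(\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}\kappa)\quad\text{and}\quad A^{T}\operatorname{\mathrm{ri}}\mathcal{F}_{d}\cap(\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}\kappa^{\circ}).$$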

If the intersections above are nonempty, with interior replacing relative interior, then we say that the problems are strictly feasible. We have

[TABLE]

which follows from Rockafellar [20, Theorem 7.6] when $\sigma>0$ , and from the convention (2.9) when $\sigma=0$ .

We assume throughout that $\rho(b)>\sigma$ . Otherwise, $\mathcal{F}_{p}$ contains the origin, which is a trivial solution of (Gp). This assumption is consistent with classical applications in signal processing and machine learning, where the corresponding assumption is that the data $b$ does not entirely consist of noise.

3 Perturbation analysis for gauge duality

Modern treatment of duality in convex optimization is based on an interpretation of multipliers as giving sensitivity information relative to perturbations in the problem data. No such analysis, however, has existed for gauge duality. In this section we show that for a particular kind of perturbation, the gauge dual (Gd) can in fact be derived via such an approach.

3.1 General perturbation framework

Our analysis is based on a perturbation theory described by Rockafellar and Wets [23, 11.H]. In this section we summarize the main results from [23] that we need. Fix an arbitrary convex function $F\colon\mathbb{R}^{n}\times\mathbb{R}^{m}\to\overline{\mathbb{R}}$ , and consider the value functions defined by (1.1)–(1.2). Observe the equality $q(0)=-p^{\star\star}(0)$ . For example, Fenchel-Rockafellar duality for the problem

$$\operatorname*{minimize}_{x}\quad f(Ax)+g(x),\tag{3.1}$$

is obtained from the general perturbation theory by setting $F(x,u)=f(Ax+u)+g(x)$ . In that case, the primal-dual pair takes the familiar form

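$$p(0)=\inf_{x}\{f(Ax)+g(x)\}\quad\text{and}\quad p^{\star\star}(0)=\sup_{y}\{-f^{\star}(-y)-g^{\star}(A^{T}y)\}.$$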

Under certain conditions, described in the following theorem, strong duality holds, i.e. $p(0)=p^{\star\star}(0)$ , and the optimal values are attained.

Theorem 3.1 (Multipliers and sensitivity [23, Theorem 11.39]).

Consider the primal-dual pair (1.2), where $F\colon\mathbb{R}^{n}\times\mathbb{R}^{m}\to\overline{\mathbb{R}}$ is proper, closed, and convex.

(a)

The inequality $p(0)\geq-q(0)$ always holds.

(b)

If $0\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}p$, then equality $p(0)=-q(0)$ holds and, if finite, the infimum $q(0)$ is attained with $\partial p(0)=\operatorname*{\mathrm{argmax}}_{y}-F^{\star}(0,y)$. Similarly, if $0\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}q$, then equality $p(0)=-q(0)$ holds and, if finite, the infimum $p(0)$ is attained with $\partial q(0)=\operatorname*{\mathrm{argmin}}_{x}F(x,0)$.

(c)

The set $\operatorname*{\mathrm{argmax}}_{y}-F^{\star}(0,y)$ is nonempty and bounded if and only if $p(0)$ is finite and $0\in\operatorname{\mathrm{int}}\operatorname{\mathrm{dom}}p$.

(d)

The set $\operatorname*{\mathrm{argmin}}_{x}F(x,0)$ is nonempty and bounded if and only if $q(0)$ is finite and $0\in\operatorname{\mathrm{int}}\operatorname{\mathrm{dom}}q$.

(e)

Optimal solutions are characterized jointly through the conditions

$$\left.\begin{array}{l}\bar{x}\in\operatorname*{\mathrm{argmin}}_{x}\,F(x,0)\\ \bar{y}\in\operatorname*{\mathrm{argmax}}_{y}\,-F^{\star}(0,y)\\ F(\bar{x},0)=-F^{\star}(0,\bar{y})\end{array}\right\}\quad\Longleftrightarrow\quad(0,\bar{y})\in\partial F(\bar{x},0)\quad\Longleftrightarrow\quad(\bar{x},0)\in\partial F^{\star}(0,\bar{y}).$$

Proof 3.2.

The only difference between the statement of this theorem and that in [23, Theorem 11.39] is in part (b). Here, we make use of the relative interior rather than the interior. Thus, we only prove part (b). Suppose $0\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}p$ . If $p(0)=-\infty$ , then $p(0)=-q(0)$ follows by Part (a). Hence we can assume that $p(0)$ is finite, and conclude that $p$ is proper. By [20, Theorem 23.4], $\partial p(0)\neq\emptyset$ , and given $\phi\in\partial p(0)$ ,

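$$p(0)\leq p(u)-\langle\phi,u\rangle=\inf_{x}\left\{F(x,u)-\left\langle\begin{pmatrix}0\\ \phi\end{pmatrix},\begin{pmatrix}x\\ u\end{pmatrix}\right\rangle\right\}\quad\forall u.$$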

By taking the infimum over $u$ and recognizing the right-hand side as $-F^{\star}(0,\phi)$ , we deduce that $p(0)\leq-F^{\star}(0,\phi)\leq-q(0)$ . Combining this with Part (a) yields $p(0)=-F^{\star}(0,\phi)=-q(0)$ . Hence $\phi\in\operatorname*{\mathrm{argmax}}_{y}-F^{\star}(0,y)\ \neq\emptyset$ . Conversely, given any $\phi\in\operatorname*{\mathrm{argmax}}_{y}-F^{\star}(0,y)$ , we have

$$p(0)=\inf_{u}\{p(u)-\langle\phi,u\rangle\}\leq p(v)-\langle\phi,v\rangle\quad\forall v,$$

and so $\phi\in\partial p(0)$. The case $0\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}q$ follows by an analogous argument.
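Part (b) of Theorem 3.1 identifies dual solutions with subgradients of the value function. The following minimal sketch checks this numerically on a scalar Fenchel-Rockafellar instance, assuming $f(z)=\tfrac{1}{2}(z-b)^{2}$, $g(x)=\tfrac{1}{2}x^{2}$, and $F(x,u)=f(ax+u)+g(x)$. The data $a=2$, $b=3$ and all helper names are illustrative assumptions. For this choice, $F^{\star}(0,y)=f^{\star}(y)+g^{\star}(-ay)=\tfrac{1}{2}y^{2}+by+\tfrac{1}{2}a^{2}y^{2}$, so the dual maximizer is $y=-b/(1+a^{2})$, which should match the derivative $p'(0)$.

```python
a, b = 2.0, 3.0  # illustrative data (assumption)

def F(x, u):
    # Perturbation function F(x, u) = f(a*x + u) + g(x).
    return 0.5 * (a * x + u - b) ** 2 + 0.5 * x ** 2

def p(u, lo=-100.0, hi=100.0, iters=200):
    # Value function p(u) = inf_x F(x, u), computed by ternary search
    # on the strictly convex map x -> F(x, u).
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if F(m1, u) < F(m2, u):
            hi = m2
        else:
            lo = m1
    return F(0.5 * (lo + hi), u)

def neg_F_star_at_zero(y):
    # Dual objective -F*(0, y) = -(y^2 / 2 + b*y) - (a*y)^2 / 2.
    return -(0.5 * y ** 2 + b * y) - 0.5 * (a * y) ** 2

y_star = -b / (1.0 + a ** 2)       # maximizer of the concave dual objective
h = 1e-3
slope = (p(h) - p(-h)) / (2 * h)   # central-difference estimate of p'(0)

print("p(0) =", p(0.0), " dual value =", neg_F_star_at_zero(y_star))
print("p'(0) ~", slope, " dual solution =", y_star)
```

The first printed pair illustrates strong duality for this instance, and the second illustrates the reading of the dual solution as a sensitivity of the optimal value with respect to the perturbation $u$.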

3.2 A perturbation for gauge duality

We now show that the problems (Gp) and (Gd) constitute a primal-dual pair under the framework set out by Theorem 3.1. The key is to postulate the correct pairing function $F$. In the derivation below, we show that the gauge primal-dual pair corresponds to the primal and dual value functions

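$$\begin{aligned}v_{p}(u)&:=\inf_{\mu>0,\,x}\left\{\mu\mid\rho(b-Ax+\mu u)\leq\sigma,\ \kappa(x)\leq\mu\right\},&&\text{(3.2a)}\\ v_{d}(t,\theta)&:=\inf_{y}\left\{\kappa^{\circ}(A^{T}y+t)\mid\langle b,y\rangle-\sigma\rho^{\circ}(y)\geq 1+\theta\right\},&&\text{(3.2b)}\end{aligned}$$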

where, as in (Gd), we use the convention described by (2.8) and (2.9). The parameters $u$ and $(t,\theta)$ are perturbations to the primal and dual gauge problems, respectively. This perturbation scheme differs significantly from that used in Fenchel-Rockafellar duality—cf. (3.1)—because of the product $\mu u$ .

We begin by observing that $v_{p}(0)$ is equal to the optimal value of the primal (Gp). Because $u$ and $\mu$ appear as a product in this definition, it is convenient to reparametrize the problem by setting $\lambda:=1/\mu$ and $w:=x/\mu$. The positive homogeneity of $\kappa$ and $\rho$ allows us to equivalently phrase the primal value function as

$$v_{p}(u)=\inf_{\lambda>0,\,w}\left\{1/\lambda\mid\rho(\lambda b-Aw+u)\leq\sigma\lambda,\ w\in\mathcal{U}_{\kappa}\right\}.$$

In particular, this reparametrization shows that the value function $v_{p}$ is convex because it is the infimal projection of a convex function, and it is proper when the primal (Gp) is feasible.

We now construct the function $F$ appearing in Theorem 3.1 associated with this duality framework. In this construction, we assume that $\sigma>0$ , possibly making the replacement (2.8) if $\sigma=0$ . Note that minimizing $1/\lambda$ is equivalent to minimizing $-\lambda$ for $\lambda\geq 0$ . Define the convex function $F\colon\mathbb{R}^{n}\times\mathbb{R}\times\mathbb{R}^{m}\to\overline{\mathbb{R}}$ by

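$$F(w,\lambda,u):=-\lambda+\delta_{(\operatorname{\mathrm{epi}}\rho)\times\mathcal{U}_{\kappa}}\left(W\begin{pmatrix}w\\ \lambda\\ u\end{pmatrix}\right),\quad\text{where}\quad W:=\begin{pmatrix}-A&b&I_{m}\\ 0&\sigma&0\\ I_{n}&0&0\end{pmatrix}.$$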

Observe that the matrix $W$ is nonsingular.

Because $(0,0,0)\in\operatorname{\mathrm{dom}}F$ , and $\kappa$ and $\rho$ are closed, the function $F$ is closed and proper. This pairing function gives rise to the infimal projection problems

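$$p(u):=\inf_{\lambda\geq 0,\,w}F(w,\lambda,u)\quad\text{and}\quad q(t,\theta):=\inf_{y}F^{\star}(t,\theta,y),\tag{3.3}$$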

which correspond to the general definitions shown in (1.1). Note that the function $p$ is the negative reciprocal of $v_{p}$, as formalized in the following lemma (stated without proof).

Lemma 3.3.

Equality $v_{p}(u)=-1/p(u)$ holds provided that $v_{p}(u)$ is nonzero and finite. Moreover, $v_{p}(u)=0$ if and only if $p(u)=-\infty$, and $p(u)=0$ if and only if $v_{p}(u)=+\infty$.
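Lemma 3.3 can be checked by brute force on a scalar instance. The sketch below assumes $\kappa=\rho=|\cdot|$ with illustrative data $a=2$, $b=3$, $\sigma=1$ (not from the paper). It evaluates $p(u)$ on a grid in the variables $(w,\lambda)$, and evaluates $v_{p}(u)$ in the original variables $(x,\mu)$, assuming the unscaled form $v_{p}(u)=\inf\{\mu>0\mid\rho(b-Ax+\mu u)\leq\sigma,\ \kappa(x)\leq\mu\}$ obtained by undoing the substitution $\lambda=1/\mu$, $w=x/\mu$. The function names are hypothetical.

```python
a, b, sigma = 2.0, 3.0, 1.0   # illustrative data (assumption)
TOL = 1e-12                   # slack so that boundary grid points count as feasible
STEP = 100                    # grid spacing is 1 / STEP

def p(u):
    # p(u) = inf { -lam : |lam*b - a*w + u| <= sigma*lam, |w| <= 1, lam >= 0 }.
    # Scanning lam downward, the first feasible lam gives the infimum on the grid.
    for i in range(3 * STEP, -1, -1):          # lam in [0, 3], decreasing
        lam = i / STEP
        for j in range(-STEP, STEP + 1):       # w in [-1, 1]
            if abs(lam * b - a * (j / STEP) + u) <= sigma * lam + TOL:
                return -lam
    return float("inf")

def v_p(u):
    # v_p(u) = inf { mu : |b - a*x + mu*u| <= sigma, |x| <= mu, mu > 0 }.
    # Scanning mu upward, the first feasible mu gives the infimum on the grid.
    for i in range(1, 3 * STEP + 1):           # mu in (0, 3], increasing
        mu = i / STEP
        for j in range(-i, i + 1):             # x in [-mu, mu]
            if abs(b - a * (j / STEP) + mu * u) <= sigma + TOL:
                return mu
    return float("inf")

for u in (0.0, -2.0):
    print("u =", u, " v_p(u) =", v_p(u), " -1/p(u) =", -1.0 / p(u))
```

The perturbations $u=0$ and $u=-2$ are chosen so that the optimal points of both parametrizations lie on the grid; for other values of $u$ the two quantities would agree only up to the grid resolution.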

We now compute the conjugate of $F$ , which is needed to derive the dual value function $q$ . By Rockafellar and Wets [23, Theorem 11.23(b)],

[TABLE]

where the closure operation $\operatorname{\mathrm{cl}}$ is applied to the function on the right-hand side with respect to the argument $(t,\theta,y)$ . Using the definition of $W$ , the constraint in the description of $F^{\star}$ is precisely $(r-A^{T}\!z,\langle b,z\rangle+\sigma\beta,z)=(t,\theta+1,y)$ , and the unique vector that satisfies these constraints is $(z,\beta,r)=(y,\,\sigma^{-1}(\theta+1-\langle b,y\rangle),\,t+A^{T}\!y)$ . The closure operation is therefore superfluous, and we obtain

[TABLE]

Since $\delta^{\star}_{\operatorname{\mathrm{epi}}\rho}(z_{1},z_{2})=\delta_{\operatorname{\mathrm{epi}}\rho^{\circ}}(z_{1},-z_{2})$ and $\delta^{\star}_{\mathcal{U}_{\kappa}}=\kappa^{\circ}$ by (2.3) and (2.4), this reduces to

[TABLE]

The application of Theorem 3.1 asks that we evaluate these conjugates at $(t,\theta)=(0,0)$ , which yields the expression

[TABLE]

Thus, the dual problem

[TABLE]

recovers, up to a sign change, the required gauge dual problem (Gd) when $\sigma>0$ . When $\sigma=0$ , we also recover the gauge dual problem (Gd) by making the appropriate substitutions (2.8) under the convention (2.9).

This discussion justifies the definition of the dual perturbation function $v_{d}(t,\theta):=\inf_{y}\,F^{\star}(t,\theta,y)$ , which is equivalent to the expression (3.2b). Note that $v_{d}(0,0)$ is the optimal value of (Gd). In summary, $(-1/v_{p})$ and $v_{d}$ , respectively, play the roles of $p$ and $q$ as defined in (3.3). In the application of Theorem 3.1, we identify $x$ with $(w,\lambda)$ , and $v$ with $(t,\theta)$ .

3.3 Proof of gauge duality

We now use the perturbation framework from Section 3.2 to prove weak and strong duality results for the gauge duality setting. Theorem 3.5 [15, section 5] is already known, but the proof via perturbation is new.

The following auxiliary result ties the feasibility of the gauge pair (Gp) and (Gd) to the domain of the value function. The proof of this result, which is largely an application of the calculus of relative interiors, is deferred to Appendix B.

Lemma 3.4 (Feasibility and domain of the value function).

If the primal (Gp) is relatively strictly feasible, then $0\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}p$. If the dual (Gd) is relatively strictly feasible, then $0\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}v_{d}$. The analogous implications, where the $\operatorname{\mathrm{ri}}$ operator is replaced by the $\operatorname{\mathrm{int}}$ operator, hold under strict feasibility (not relative).

The duality relations in the gauge framework follow analogous principles to Lagrange duality, except that instead of an additive relationship between the primal and dual optimal values the relationship is multiplicative. The following theorem summarizes weak and strong duality for gauge optimization.

Theorem 3.5 (Gauge duality [15]).

Set $\nu_{p}:=v_{p}(0)$ and $\nu_{d}:=v_{d}(0,0)$ . Then the following relationships hold for the gauge primal-dual pair (Gp) and (Gd).

(a)

(Basic Inequalities) It is always the case that

[TABLE]

In particular, if $\nu_{p}=0$ (resp. $\nu_{d}=0$), then (Gd) (resp. (Gp)) is infeasible.

(b)

(Weak duality) If $x$ and $y$ are primal and dual feasible, then

[TABLE]

(c)

(Strong duality) If the dual (resp. primal) is feasible and the primal (resp. dual) is relatively strictly feasible, then $\nu_{p}\nu_{d}=1$ and the gauge dual (resp. primal) attains its optimal value.

Proof 3.6.

To simplify notation, in this proof we denote the optimal value of the primal value function by $p_{0}\equiv p(0)$ .

Part (a). We begin with the inequality (i). Theorem 3.1 guarantees the inequality

[TABLE]

By Lemma 3.3, whenever $\nu_{p}$ is nonzero and finite, equality $p_{0}=-1/\nu_{p}$ holds, which together with (3.4) yields (i). If, on the other hand, $\nu_{p}=+\infty$ , then (i) is trivial. Finally, if $\nu_{p}=0$ , Lemma 3.3 yields $p_{0}=-\infty$ , and hence (3.4) implies $\nu_{d}=+\infty$ , and (i) again holds. Thus, (i) holds always. To establish (ii), it suffices to consider the case $\nu_{d}=0$ . From (3.4) we conclude $p_{0}\geq 0$ , that is either $p_{0}=0$ or $p_{0}=+\infty$ . By Lemma 3.3, the first case $p_{0}=0$ implies $\nu_{p}=+\infty$ and therefore (ii) holds. The second case $p_{0}=+\infty$ implies that the primal problem is infeasible, that is $\nu_{p}=+\infty$ , and again (ii) holds. Thus (ii) holds always, as required.

Part (b). Because the gauge primal and dual problems are both feasible, $\nu_{p}$ and $\nu_{d}$ are nonzero and finite so the result follows from part (a).

Part (c). Suppose the dual is feasible and the primal is relatively strictly feasible. In particular, both $\nu_{p}$ and $\nu_{d}$ are nonzero and finite by part (a). Hence $1\leq\nu_{p}\nu_{d}=-\nu_{d}/p_{0}$ . On the other hand, by Lemma 3.4 the assumption that the primal is relatively strictly feasible implies $0\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}p$ . This last inclusion implies $p_{0}=p(0)$ is finite, and hence $p(\cdot)$ is proper. Theorem 3.1(b) tells us that $p_{0}=-\nu_{d}$ and the infimum in the dual $\nu_{d}$ is attained. Thus we deduce $1=\nu_{p}\nu_{d}$ , as claimed.

*Conversely, suppose that the primal is feasible and the dual is relatively strictly feasible. Then, by Lemma 3.4, $0\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}q$ . This in turn implies $p_{0}=-\nu_{d}$ and that the infimum in $p(0)$ is attained. Since the primal is feasible, by Lemma 3.3, $p_{0}$ is nonzero, and hence $1=\nu_{p}\nu_{d}$ and the infimum in the primal is attained. *
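As a numerical sanity check of Theorem 3.5, the following sketch evaluates a toy gauge pair. It assumes the canonical forms of (Gp), minimize $\kappa(x)$ subject to $\rho(b-Ax)\leq\sigma$ , and of (Gd), minimize $\kappa^{\circ}(A^{T}\!y)$ subject to $\langle b,y\rangle-\sigma\rho^{\circ}(y)\geq 1$ . The instance (an $\ell_{1}$ objective, a scalar absolute-value constraint, and the data `A`, `b`, `sigma`), the hand-computed optimal pair, and all function names are illustrative assumptions rather than anything taken from the paper.

```python
# Toy gauge pair (illustrative assumption): kappa = l1 norm on R^2,
# rho = absolute value on R, A = [1, 1], b = 2, sigma = 0.5.
A = [1.0, 1.0]
b = 2.0
sigma = 0.5
TOL = 1e-9

def kappa(x):
    """Primal objective: the l1 norm."""
    return sum(abs(xi) for xi in x)

def kappa_polar(z):
    """Polar of the l1 norm: the l-infinity norm."""
    return max(abs(zi) for zi in z)

def primal_feasible(x):
    """rho(b - Ax) <= sigma with rho = |.|."""
    residual = b - sum(a * xi for a, xi in zip(A, x))
    return abs(residual) <= sigma + TOL

def dual_feasible(y):
    """<b, y> - sigma * rho_polar(y) >= 1 with rho_polar = |.|."""
    return b * y - sigma * abs(y) >= 1.0 - TOL

def dual_objective(y):
    """kappa_polar(A^T y)."""
    return kappa_polar([a * y for a in A])

# Weak duality on a grid: the smallest product of feasible objective values
# should not fall below 1.
grid = [i / 20.0 for i in range(-60, 61)]
primal_values = [kappa((x1, x2)) for x1 in grid for x2 in grid
                 if primal_feasible((x1, x2))]
dual_values = [dual_objective(y) for y in grid if dual_feasible(y)]
min_product = min(primal_values) * min(dual_values)

# Strong duality: optimal pair computed by hand for this instance.
x_star = (1.5, 0.0)
y_star = 2.0 / 3.0
optimal_product = kappa(x_star) * dual_objective(y_star)

print(min_product, optimal_product)
```

The grid only samples the feasible sets, so the weak-duality check is evidence rather than proof; the product at the hand-computed pair illustrates the multiplicative relation $\nu_{p}\nu_{d}=1$ .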

3.4 Gauge optimality conditions

Our perturbation framework can be harnessed to develop optimality conditions for the gauge pair that relate the primal-dual solutions to subgradients of the corresponding value function. This yields a version of parts (b) and (d) in Theorem 3.1 that are specialized to gauge duality.

Theorem 3.7 (Gauge multipliers and sensitivity).

The following relationships hold for the gauge primal-dual pair (Gp) and (Gd).

(a)

If the primal is relatively strictly feasible and the dual is feasible, then the set of optimal solutions for the dual is nonempty and coincides with

[TABLE]

If it is further assumed that the primal is strictly feasible, then the set of optimal solutions to the dual is bounded.

(b)

If the dual is relatively strictly feasible and the primal is feasible, then the set of optimal solutions for the primal is nonempty with solutions $x^{*}=w^{*}/\lambda^{*}$ , where

[TABLE]

If it is further assumed that the dual is strictly feasible, then the set of optimal solutions to the primal is bounded.

Proof 3.8.

Part (a). Because (Gp) is relatively strictly feasible, it follows from Lemma 3.4 that $0\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}p$ , and because the dual is feasible, $p(0)$ is finite. Theorem 3.1 and Lemma 3.3 then imply the conclusion of Part (a). The statement on the boundedness of the set of the optimal solutions to the dual follows from Theorem 3.1.

Part (b). Because (Gd) is relatively strictly feasible, it follows from Lemma 3.4 that $0\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}v_{d}$ , and because the primal is feasible, $v_{d}(0,0)$ is finite. Theorem 3.1 then implies that the optimal primal set is nonempty, and $\operatorname*{\mathrm{argmin}}_{w,\lambda}\,F(w,\lambda,0)=\partial v_{d}(0,0)$ . Because the primal and dual problems are feasible, any pair $(w^{*},\lambda^{*})\in\operatorname*{\mathrm{argmin}}_{w,\lambda}F(w,\lambda,0)$ must satisfy $\lambda^{*}>0$ by Theorem 3.5 and Lemma 3.3. Thus, this inclusion is equivalent to $x^{*}=w^{*}/\lambda^{*}$ being optimal for the primal problem, with optimal value $1/\lambda^{*}$ . This proves Part (b). The statement on the boundedness of the set of the optimal solutions to the primal again follows from Theorem 3.1.

We use the sensitivity interpretation given by Theorem 3.7 to develop a set of necessary and sufficient optimality conditions that mirror the more familiar KKT conditions from Lagrange duality. For a primal-dual optimal pair $(x^{*},y^{*})$ , the condition $\rho^{\circ}(y^{*})=0$ characterizes a degenerate case when $\sigma>0$ because in that case the primal constraint is inactive at $x^{*}$ (i.e., $\rho(b-Ax^{*})<\sigma$ ). On the other hand, the dual constraint is always active at optimality because the positive homogeneity of the dual objective and the dual constraint imply $\langle b,y^{*}\rangle-\sigma\rho^{\circ}(y^{*})=1$ . The full primal-dual optimality conditions for gauge duality are described in the following theorem.

Theorem 3.9 (Optimality conditions).

Suppose both problems of the gauge dual pair (Gp) and (Gd) are relatively strictly feasible, and the pair $(x^{*},y^{*})$ is primal-dual feasible. Then $(x^{*},y^{*})$ is primal-dual optimal if and only if it satisfies the conditions

[TABLE]

Proof 3.10.

First suppose that $(\bar{x},\bar{y})$ satisfies (3.5a)-(3.5d). By Theorem 3.5, to show that $(\bar{x},\bar{y})$ is primal-dual optimal it is sufficient to show that $\kappa(\bar{x})\cdot\kappa^{\circ}(A^{T}\!\bar{y})=1$ . Add (3.5c) and (3.5d) to obtain

[TABLE]

By combining the above with (3.5b) we obtain $\kappa(\bar{x})\cdot\kappa^{\circ}(A^{T}\!\bar{y})=1$ , as desired.

Suppose now that $(x^{*},y^{*})$ is primal-dual optimal. We begin by assuming that $\sigma>0$ and obtain the case $\sigma=0$ by applying the result for the $\sigma>0$ case under the replacement (2.8). By the positive homogeneity of $\kappa^{\circ}$ and the optimality of $y^{*}$ , (3.5b) holds. Also note that $\kappa(x^{*})$ and $\kappa^{\circ}(A^{T}\!y^{*})$ are both nonzero and finite because of the strong duality guaranteed by Theorem 3.5.

Define $\lambda^{*}:=1/\kappa(x^{*})$ and $w^{*}:=\lambda^{*}x^{*},$ so that $\kappa(w^{*})=1$ . By Theorem 3.1(e) and Theorem 3.7(b), we must have $(0,0,y^{*})\in\partial F(w^{*},\lambda^{*},0)$ . Since the primal problem is relatively strictly feasible, we can apply [20, Theorem 23.9] to deduce the characterization

[TABLE]

where $\operatorname{\mathcal{N}}_{\mathcal{C}}(\cdot)$ denotes the normal cone to a set $\mathcal{C}$ . We now consider two cases. First, suppose $\rho(\lambda^{*}b-Aw^{*})=\lambda^{*}\sigma.$ Then (3.5a) holds, and by straightforward computations involving only (2.4) and the definitions of normal cones and subdifferentials, we have

[TABLE]

where

[TABLE]

and $\operatorname{\mathcal{N}}_{\mathcal{U}_{k}}(w^{*})=\{v\mid\kappa^{\circ}(v)\leq\langle v,w^{*}\rangle\}$ . Substitute these formulas into (3.6) to obtain

[TABLE]

We deduce the existence of $z^{*}\in\partial\rho(\lambda^{*}b-Aw^{*})$ and $\mu^{*}\geq 0$ such that

[TABLE]

Note that $\mu^{*}=0$ cannot satisfy (3.7b), hence (3.7c), together with the polar-gauge inequality and the fact that $\kappa(w^{*})=1$ , implies

[TABLE]

Equality must hold in the above, and dividing through by $\lambda^{*}>0$ we see that (3.5c) is satisfied. Finally, we aim to show that (3.5d) holds using the fact that $y^{*}\in\mu^{*}\partial\rho(\lambda^{*}b-Aw^{*})$ . From the characterization (2.4) of the polar, we have

[TABLE]

In particular, this characterization implies $\langle y^{*}/\mu^{*},\lambda^{*}b-Aw^{*}\rangle\geq\langle 0,\lambda^{*}b-Aw^{*}\rangle=0.$ If $\rho(\lambda^{*}b-Aw^{*})=0$ , then by the polar-gauge inequality (2.5) we have

[TABLE]

which gives condition (3.5d) after dividing through by $\lambda^{*}$ . On the other hand, if $\rho(u)>0$ then the set (3.8) is given by $\{y\mid\rho(u)=\langle y,u\rangle,\,\rho^{\circ}(y)=1\}$ . Thus when $\rho(\lambda^{*}b-Aw^{*})>0$ , we again have $\langle y^{*}/\mu^{*},\lambda^{*}b-Aw^{*}\rangle=\rho^{\circ}(y^{*}/\mu^{*})\cdot\rho(\lambda^{*}b-Aw^{*})$ , and multiplying through by $\mu^{*}/\lambda^{*}$ and applying (3.5a) gives (3.5d).

We have shown the forward implication of the theorem when $\rho(\lambda^{*}b-Aw^{*})=\lambda^{*}\sigma.$ The other case we need to consider is when $\rho(\lambda^{*}b-Aw^{*})<\lambda^{*}\sigma,$ or equivalently when $\rho(b-Ax^{*})<\sigma$ . An easy argument (e.g., see [12, Proposition 2.14(iv)]) shows

[TABLE]

Similar to the first case, we now have

[TABLE]

We deduce that $y^{*}\in\operatorname{\mathcal{N}}_{\operatorname{\mathrm{dom}}\rho}(\lambda^{*}b-Aw^{*})$ and also that $\langle b,y^{*}\rangle=1$ and $\kappa^{\circ}(A^{T}\!y^{*})\leq\langle A^{T}\!y^{*},w^{*}\rangle$ . Again, because $\kappa(w^{*})=1$ , the polar-gauge inequality implies (3.5c) holds.

We now show that $\rho^{\circ}(y^{*})=0$ and $\langle b-Ax^{*},y^{*}\rangle=0$ , which, if true, establishes that (3.5a) and (3.5d) are satisfied as well. First note that $y^{*}\in\operatorname{\mathcal{N}}_{\operatorname{\mathrm{dom}}\rho}(u)$ implies $y^{*}\in(\operatorname{\mathrm{dom}}\rho)^{\circ}$ , which implies $\rho^{\circ}(y^{*})=0$ by (2.6). Thus, by (3.5b), (3.5c), and the fact that $\kappa(x^{*})\cdot\kappa^{\circ}(A^{T}\!y^{*})=1$ from Theorem 3.5, we have

[TABLE]

Thus if $(x^{*},y^{*})$ is primal-dual optimal, then (3.5a)-(3.5d) hold, as claimed. This finishes the proof for $\sigma>0$ .

Let us now consider the case when $\sigma=0$ and apply what we have just proved to the pair (Gp) and (Gd) under the replacement (2.8). Then $(x^{*},y^{*})$ is primal-dual optimal if and only if the conditions (3.5a)-(3.5d) hold with $(\rho,\sigma)=(\delta_{\mathcal{H}_{\rho}},1)$ , i.e.,

[TABLE]

*If we combine this with primal feasibility, $\rho(b-Ax^{*})=0$ , and use the identity (2.9) that $0\cdot\rho^{\circ}=\delta_{\mathcal{H}_{\rho}^{\circ}}$ , then these conditions are equivalent to (3.5a)-(3.5d) for $\sigma=0$ , $\rho$ , and $\rho^{\circ}$ as written above. *

The following corollary describes a variation of the optimality conditions outlined by Theorem 3.9. These conditions assume that a solution $y^{*}$ of the dual problem is available, and they can be used to determine a corresponding solution of the primal problem. An application of the following result appears in a later section.

Corollary 3.11 (Gauge primal-dual recovery).

Suppose that the primal-dual pair (Gp) and (Gd) are each relatively strictly feasible. If $y^{*}$ is optimal for (Gd), then for any primal feasible $x$ the following conditions are equivalent:

(a)

$x$ is optimal for (Gp);

(b)

$\langle x,A^{T}\!y^{*}\rangle=\kappa(x)\cdot\kappa^{\circ}(A^{T}\!y^{*})$ and $b-Ax\in\partial(\sigma\rho^{\circ})(y^{*})$ ;

(c)

$A^{T}\!y^{*}\in\kappa^{\circ}(A^{T}\!y^{*})\cdot\partial\kappa(x)$ and $b-Ax\in\partial(\sigma\rho^{\circ})(y^{*})$ ,

where, by convention, $\sigma\rho^{\circ}=\delta_{\operatorname{\mathrm{cl}}\operatorname{\mathrm{dom}}\rho^{\circ}}$ when $\sigma=0$ .

Proof 3.12.

We use the optimality conditions given in Theorem 3.9. As noted before, by the optimality of $y^{*}$ we automatically have equality (3.5b) in the dual constraint.

We first show that (b) implies (a). Suppose (b) holds. Then (3.5c) holds automatically. From the characterization (2.4) of the polar, we have

[TABLE]

where the case $\sigma=0$ uses the convention (2.9). Thus, $\partial(\sigma\rho^{\circ})(y^{*})$ is the set of maximizing elements in this supremum. Because $b-Ax\in\partial(\sigma\rho^{\circ})(y^{*})$ , it holds that $\rho(b-Ax)\leq\sigma$ . If we additionally use the polar-gauge inequality, we deduce that

[TABLE]

and therefore the above inequalities are all tight. Thus conditions (3.5a) and (3.5d) hold, and by Theorem 3.9, $(x,y^{*})$ is a primal-dual optimal pair.

We next show that (a) implies (b). Suppose that $x$ is optimal for (Gp). Then the first condition of (b) holds by (3.5c), and (3.5a) and (3.5d) combine to give us

[TABLE]

This implies that $z:=b-Ax$ is a maximizing element of the supremum in (3.9), and thus $b-Ax\in\partial(\sigma\rho^{\circ})(y^{*}).$

*Finally, to show the equivalence of (b) and (c), note that by the polar-gauge inequality, $\langle x,A^{T}\!y^{*}\rangle=\kappa(x)\cdot\kappa^{\circ}(A^{T}\!y^{*})$ if and only if $x$ minimizes the convex function $\kappa^{\circ}(A^{T}\!y^{*})\,\kappa(\cdot)-\langle\cdot,A^{T}\!y^{*}\rangle.$ This, in turn, is true if and only if $0\in\kappa^{\circ}(A^{T}\!y^{*})\,\partial\kappa(x)-A^{T}\!y^{*}$ , or equivalently, $A^{T}\!y^{*}\in\kappa^{\circ}(A^{T}\!y^{*})\cdot\partial\kappa(x)$ . *
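The recovery test in condition (b) of Corollary 3.11 can be sketched on a small instance. The example below is an assumption-laden illustration: it takes $\kappa=\|\cdot\|_{1}$ on $\mathbb{R}^{2}$ , $\rho=|\cdot|$ on $\mathbb{R}$ , $A=[1\ 1]$ , $b=2$ , $\sigma=0.5$ , and the dual solution $y^{*}=2/3$ , which was computed by hand for this instance assuming the dual constraint $\langle b,y\rangle-\sigma\rho^{\circ}(y)\geq 1$ . The function names are hypothetical.

```python
# Illustrative instance (an assumption, not from the paper).
A = [1.0, 1.0]
b = 2.0
sigma = 0.5
y_star = 2.0 / 3.0   # hand-computed dual solution for this instance
TOL = 1e-9

def kappa(x):
    """l1 norm."""
    return sum(abs(xi) for xi in x)

def kappa_polar(z):
    """l-infinity norm, the polar of the l1 norm."""
    return max(abs(zi) for zi in z)

def alignment_holds(x):
    """First condition of (b): <x, A^T y*> = kappa(x) * kappa_polar(A^T y*)."""
    aty = [a * y_star for a in A]
    inner = sum(xi * zi for xi, zi in zip(x, aty))
    return abs(inner - kappa(x) * kappa_polar(aty)) <= TOL

def residual_in_subdifferential(x):
    """Second condition of (b): b - Ax lies in the subdifferential of sigma*|.| at y*."""
    residual = b - sum(a * xi for a, xi in zip(A, x))
    if y_star > 0:
        return abs(residual - sigma) <= TOL      # subdifferential is {sigma}
    if y_star < 0:
        return abs(residual + sigma) <= TOL      # subdifferential is {-sigma}
    return abs(residual) <= sigma + TOL          # subdifferential is [-sigma, sigma]

def is_recovered_solution(x):
    """Both parts of condition (b) for a primal feasible x."""
    return alignment_holds(x) and residual_in_subdifferential(x)

for candidate in [(1.5, 0.0), (0.75, 0.75), (2.0, 0.0), (2.0, -0.5)]:
    print(candidate, is_recovered_solution(candidate))
```

The candidate $(2,0)$ satisfies the alignment condition but not the subdifferential condition, and $(2,-0.5)$ does the reverse, so on this instance neither part of (b) suffices alone.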

3.5 The relationship between Lagrange and gauge multipliers

We now use the perturbation framework for duality to establish a relationship between gauge dual and Lagrange dual variables. We begin with an auxiliary result that characterizes the subdifferential of the perspective function (2.1). Combettes [10, Prop. 2.3(v)] also describes an equivalent formula for the subdifferential, though the derivation and subsequent form of the expression are very different. The formula in Lemma 3.13 is more suitable for our purposes.

Lemma 3.13 (Subdifferential of perspective function).

Let $g:{\mathbb{R}}^{n}\to\overline{\mathbb{R}}$ be a closed proper convex function. Then for $(x,\mu)\in\operatorname{\mathrm{dom}}g^{\pi}$ , equality holds:

[TABLE]

Proof 3.14.

Recall that the subdifferential of the support function to any nonempty closed convex set $\mathcal{C}$ is given by $\partial\delta_{\mathcal{C}}^{\star}(x)=\operatorname*{\mathrm{argmax}}\{\langle z,x\rangle\mid z\in\mathcal{C}\}$ [20, Theorem 23.5 and Corollary 23.5.3]. By [20, Corollary 13.5.1], $g^{\pi}=\delta_{\mathcal{C}}^{\star}$ , where $\mathcal{C}=\{(z,\gamma)\mid g^{\star}(z)\leq-\gamma\}$ is a closed convex set. If $(x,\mu)\in\operatorname{\mathrm{dom}}g^{\pi}$ , then $\mathcal{C}$ is nonempty and

[TABLE]

Suppose now that $\mu>0$ . Then

[TABLE]

Using the expression for the subdifferential of a support function, $(z,\gamma)$ achieves the supremum of (3.10) if $z\in\partial g(x/\mu)$ and $-\gamma=g^{\star}(z)$ . On the other hand, if $\mu=0$ then

[TABLE]

*Again using the expression for the subdifferential of a support function, $(z,\gamma)$ achieves the supremum of (3.10) if and only if $z\in\partial g^{\infty}(x)$ and $(z,-\gamma)\in\operatorname{\mathrm{epi}}{g^{\star}}$ . *

We now state the main result relating the optimal solutions of (Gp) to the optimal solutions of the Lagrange dual of (Gd).

Theorem 3.15.

Suppose that the gauge dual (Gd) is relatively strictly feasible and the primal (Gp) is feasible. Let $(L_{p})$ denote the Lagrange dual of (Gd), and let $\nu_{\scriptscriptstyle L}$ denote its optimal value. Then

[TABLE]

Proof 3.16.

We first note that $(L_{p})$ can be derived via the framework of Theorem 3.1 through the Lagrangian value function

[TABLE]

Here $h$ plays the role of $p$ in Theorem 3.1; cf. [23, Example 11.41]. Strong duality in Theorem 3.5 guarantees that $h(0)$ is nonzero and finite, and by Lemma 3.4,

[TABLE]

Thus, it follows from Theorem 3.1 that the optimal points $z^{*}$ for $(L_{p})$ are characterized by $z^{*}\in\partial h(0)$ . Note also that $h(0)=\nu_{L}$ .

On the other hand, by Theorem 3.7(b) the solutions to (Gp) are precisely the points $w^{*}/\lambda^{*}$ such that $(w^{*},\lambda^{*})\in\partial v_{d}(0,0)$ . Thus to relate the solution sets of $(L_{p})$ and (Gp), we must relate $\partial h(0)$ and $\partial v_{d}(0,0)$ .

For $\theta$ in a neighborhood of zero and all $t$ , by positive homogeneity of $\kappa^{\circ}$ and $\rho^{\circ}$ we have

[TABLE]

Thus by Lemma 3.13, $\partial v_{d}(0,0)=\{(z,-h^{\star}(z))\mid z\in\partial h(0)\}.$ However, for $z\in\partial h(0)$ the Fenchel-Young equality gives us

[TABLE]

Thus we obtain the convenient description

[TABLE]

*and the set of optimal solutions for (Gp) is precisely $\frac{1}{\nu_{L}}\partial h(0)$ . *

4 Perspective duality

We now move on to an extension of the gauge duality framework, which allows us to consider functions that are not necessarily positively homogeneous, but continue to be nonnegative and convex. (The same framework applies to functions that are bounded below because these can be made nonnegative by translation.) For the remainder of the paper, consider functions $f:{\mathbb{R}}^{n}\to\overline{\mathbb{R}}_{+}$ and $g:\mathbb{R}^{m}\to\overline{\mathbb{R}}_{+}$ , that are closed, convex and nonnegative over their domains. In this section we derive and analyze the perspective-dual pair

[TABLE]

The functions $f^{\sharp}$ and $g^{\sharp}$ are the polars of the perspective transforms of $f$ and $g$ . This transform is a key operation needed to derive perspective duality. In the next section we describe properties of that transform and its application to the derivation of the perspective-dual pair. Throughout this section, we assume that $\sigma>\inf_{u}g(u)\geq 0$ .

4.1 Perspective-polar transform

Given a closed proper convex function $f:{\mathbb{R}}^{n}\to\overline{\mathbb{R}}_{+}$ , define the perspective-polar transform by $f^{\sharp}:=(f^{\pi})^{\circ}$ .

An explicit characterization of the perspective-polar transform is given by

[TABLE]

This representation can be obtained by applying the definition of the gauge polar (2.2) to the perspective transform as follows:

[TABLE]

which yields (4.1) after dividing through by $\lambda$ . Rockafellar’s extension [20, p.136] of the polar gauge transform to nonnegative convex functions that vanish at the origin coincides with $f^{\sharp}(z,-1)$ .
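To make the characterization concrete, the sketch below approximates the perspective-polar by brute force. It assumes that (4.1) reads $f^{\sharp}(z,\xi)=\inf\{\mu>0\mid\langle z,x\rangle+\xi\leq\mu f(x)\ \text{for all } x\}$ , which is the form the derivation above yields after dividing by $\lambda$ . The choice $f(x)=x^{2}/2$ , the finite grid that stands in for "for all $x$", the comparison value $z^{2}/(-2\xi)$ for $\xi<0$ (derived by hand for this $f$), and the function names are all illustrative assumptions.

```python
def f(x):
    """Illustrative nonnegative convex function: f(x) = x^2 / 2."""
    return 0.5 * x * x

# Finite grid standing in for "for all x" (a numerical approximation).
X_GRID = [i / 100.0 for i in range(-1000, 1001)]

def is_feasible(mu, z, xi, tol=1e-12):
    """Check z*x + xi <= mu * f(x) at every grid point."""
    return all(z * x + xi <= mu * f(x) + tol for x in X_GRID)

def perspective_polar(z, xi, mu_max=1e3, iters=60):
    """Approximate inf{mu > 0 : z*x + xi <= mu*f(x) for all x} by bisection.

    Feasibility is monotone in mu because f is nonnegative.
    """
    if not is_feasible(mu_max, z, xi):
        return float("inf")
    lo, hi = 0.0, mu_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if is_feasible(mid, z, xi):
            hi = mid
        else:
            lo = mid
    return hi

def closed_form(z, xi):
    """Hand-derived value for f(x) = x^2/2 and xi < 0: z^2 / (-2*xi)."""
    return z * z / (-2.0 * xi)

print(perspective_polar(1.0, -0.5), closed_form(1.0, -0.5))
print(perspective_polar(1.0, 0.5))
```

For $\xi>0$ the constraint already fails at $x=0$ because $f(0)=0$ , so the routine returns infinity.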

The following theorem provides an alternative characterization of the perspective-polar transform in terms of the more familiar Fenchel conjugate $f^{\star}$ . It also provides an expression for the perspective-polar of $f$ in terms of the Minkowski function generated by the epigraph of the conjugate of $f$ , i.e.,

[TABLE]

which is a gauge. Nonnegativity of $f$ is not required for the first part of this result.

Theorem 4.1.

*For any closed proper convex function $f$ with $0\in\operatorname{\mathrm{dom}}f$ , we have $f^{\pi\star}(z,-\xi)=\delta_{\operatorname{\mathrm{epi}}f^{\star}}(z,\xi).$ If, in addition, $f$ is nonnegative, $f^{\sharp}(z,-\xi)=\gamma_{\mbox{\scriptsize$ \operatorname{\mathrm{epi}}f^{\star} $}}(z,\xi)$ . *

Proof 4.2.

Because of the assumptions on $f$ , we have $f^{\pi}(x,0)=\liminf_{\lambda\to 0^{+}}f^{\pi}(x,\lambda)$ for each $x\in{\mathbb{R}}^{n}$ [20, Corollary 8.5.2]. Thus we obtain the following chain of equalities:

[TABLE]

This proves the first statement. Now additionally suppose that $f$ is nonnegative. Because $f^{\pi}$ is closed, it is identical to its biconjugate, and so $f^{\pi}(x,\lambda)=\delta_{\operatorname{\mathrm{epi}}f^{\star}}^{\star}(x,-\lambda)$ . Also, $\operatorname{\mathrm{epi}}f^{\star}$ is closed and convex, and contains the origin because $f$ is nonnegative. Therefore, it follows from [20, Corollary 15.1.2] that

[TABLE]

The following result relates the level sets of the perspective-polar transform to the level sets of the conjugate perspective. This result is useful in deriving the constraint sets for certain perspective-dual problems for which there is no closed form for the perspective polar; cf. Example 5.5.

Theorem 4.3 (Level-set equivalence).

Let $f:{\mathbb{R}}^{n}\to\overline{\mathbb{R}}_{+}$ be a nonnegative, closed proper convex function with $0\in\operatorname{\mathrm{dom}}f$ . Then, for any $(z,\xi,\mu)\in\mathbb{R}^{n}\times\mathbb{R}\times\mathbb{R}$ ,

[TABLE]

Proof 4.4.

The following chain of equivalences follows from Theorem 4.1:

[TABLE]

Define $\alpha=\inf\{\lambda>0\mid f^{\star\pi}(z,\lambda)\leq-\xi\}$ .

We first show that $f^{\sharp}(z,\xi)\leq\mu$ implies $0\leq\mu$ and $f^{\star\pi}(z,\mu)\leq-\xi$ . By (4.2), $0\leq\alpha\leq\mu$ . If $\alpha<\mu$ , there exists $\lambda$ with $0<\lambda<\mu$ such that $f^{\star\pi}(z,\lambda)\leq-\xi.$ Because $f$ is nonnegative, $\mu f\geq\lambda f$ , and thus $(\mu f)^{\star}\leq(\lambda f)^{\star}$ . In particular,

[TABLE]

On the other hand, if $\alpha=\mu$ , there exists a sequence $\lambda_{k}\to\mu$ such that $f^{\star\pi}(z,\lambda_{k})\leq-\xi$ for each $k$ . Now by the lower semi-continuity of $f^{\star\pi}$ , we obtain

[TABLE]

This establishes the forward implication of the theorem.

For the reverse implication, suppose $0\leq\mu$ and $f^{\star\pi}(z,\mu)\leq-\xi$ . If $0<\mu$ , it follows from (4.2) that $f^{\sharp}(z,\xi)\leq\mu$ . Now suppose otherwise that $\mu=0.$ We want to show $f^{\sharp}(z,\xi)\leq 0$ . By hypothesis, $(z,0,-\xi)\in\operatorname{\mathrm{epi}}f^{\star\pi}$ . Thus there exists a sequence $(z_{k},\mu_{k},r_{k})$ with $\lim_{k\to\infty}(z_{k},\mu_{k},r_{k})=(z,0,-\xi)$ and $f^{\star\pi}(z_{k},\mu_{k})\leq r_{k}$ for all $k$ . With no loss in generality, we can assume that $\mu_{k}>0$ for all $k$ . Then for each $k$ , we have $\mu_{k}f^{\star}(z_{k}/\mu_{k})\leq r_{k}$ for which we have the following equivalences:

[TABLE]

*which gives $f^{\sharp}(z,\xi)\leq 0=\mu$ in the limit, since $f^{\sharp}$ is closed. *

4.1.1 Calculus rules

Two useful calculus rules are now developed that govern the perspective-polar transform when applied to gauge functions and separable sums.

Example 4.5 (Gauge functions).

Suppose that $f$ is a closed proper gauge. Then

[TABLE]

Use expression (4.1) for this derivation. When $\xi>0$ , take $x=0$ in the infimum in (4.1) to deduce that $f^{\sharp}(z,\xi)=+\infty.$ On the other hand, when $\xi\leq 0$ , the positive homogeneity of $f$ implies that $f^{\sharp}(z,\xi)=f^{\circ}(z)$ . We leave the details to the reader. More generally, if $f$ vanishes at the origin, then $f^{\sharp}(z,\xi)=+\infty$ for all $\xi>0.$

Example 4.6 (Separable sums).

Suppose that $f(x):=\sum_{i=1}^{n}f_{i}(x_{i}),$ where each convex function $f_{i}:\mathbb{R}^{n_{i}}\to\overline{\mathbb{R}}_{+}$ is nonnegative. Then a straightforward computation shows that $f^{\pi}(x,\lambda)=\sum_{i=1}^{n}f_{i}^{\pi}(x_{i},\lambda)$ . Furthermore, taking into account [15, Proposition 2.4], which expresses the polar of a separable sum of gauges, we deduce

[TABLE]

4.2 Derivation of the perspective dual via lifting

We now derive the relationship between the primal and dual problems (Np) and (Nd) by lifting (Np) to an equivalent gauge optimization problem, and then recognizing (Nd) as its gauge dual.

Theorem 4.7 (Gauge lifting of the primal).

A point $x^{*}$ is optimal for (Np) if and only if $(x^{*},1)$ is optimal for the gauge problem

[TABLE]

*where $\rho(z,\mu,\tau):=g^{\pi}(z,\tau)+\delta_{\{0\}}(\mu)$ is a gauge function. *

Proof 4.8.

By definition of $f^{\pi}$ , $x^{*}$ is optimal for (Np) if and only if the pair $(x^{*},1)$ is optimal for

[TABLE]

The following equivalence follows from the definition of $\rho$ :

[TABLE]

*Thus we arrive at the constraint expressed in (4.3). *

Corollary 4.9 (Gauge dual).

*Problem (Nd) is the gauge dual of (4.3). *

Proof 4.10.

It follows from the canonical dual pairing (Gp) and (Gd) that the gauge dual of (4.3) is

[TABLE]

Because $\rho$ is separable in $(z,\tau)$ and $\mu$ , it follows from [15, Proposition 2.4] that

[TABLE]

*Since $\delta_{\{0\}}^{\circ}(\alpha)$ is identically zero, the result follows. *

The next result generalizes the gauge duality result of Theorem 3.5 to the case where $f$ and $g$ are convex and nonnegative but not necessarily gauges. We parallel the construction in (2.7), and for this section only redefine the feasible sets by

[TABLE]

Thus, (Np) is relatively strictly feasible if

[TABLE]

Similarly, (Nd) is relatively strictly feasible if there exists a triple $(y,\alpha,\mu)$ such that

[TABLE]

Strict feasibility follows the same definitions, where the operation $\operatorname{\mathrm{ri}}$ is replaced by $\operatorname{\mathrm{int}}$ .

Theorem 4.11 (Perspective duality).

Let $\nu_{p}$ and $\nu_{d}$ , respectively, denote the optimal values of the pair (Np) and (Nd). Then the following relationships hold for the perspective dual pair (Np) and (Nd).

(a)

(Basic Inequalities) It is always the case that

[TABLE]

Thus, $\nu_{p}=0$ and $\nu_{d}=0$ , respectively, imply that (Nd) and (Np) are infeasible.

(b)

(Weak duality) If $x$ and $(y,\alpha,\mu)$ are primal and dual feasible, then

[TABLE]

(c)

(Strong duality) If the dual (resp. primal) is feasible and the primal (resp. dual) is relatively strictly feasible, then $\nu_{p}\nu_{d}=1$ and the perspective dual (resp. primal) attains its optimal value.

Proof 4.12.

Parts (a) and (b) follow immediately from the analogous result in Theorem 3.5, together with Theorem 4.7 and Corollary 4.9.

Next we demonstrate that (Np) is relatively strictly feasible if and only if (4.3) is relatively strictly feasible. By the description of relative interiors of sublevel sets given in [20, Theorem 7.6], (4.3) is relatively strictly feasible if and only if there exists a point $(x,1)\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}f^{\pi}$ such that

[TABLE]

We now seek a description of $\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}f^{\pi}$ . We have

[TABLE]

By [20, Corollary 6.8.1], the above description yields

[TABLE]

Thus $(x,1)\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}f^{\pi}$ if and only if $x\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}f$ . Similarly,

[TABLE]

and so

[TABLE]

In particular, the condition $(b-Ax,0,1)\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}\rho$ is equivalent to $b-Ax\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}g$ . Thus the conditions for relative strict feasibility of (4.3) and (Np) are identical.

*A similar argument verifies that (Nd) is relatively strictly feasible if and only if (4.4) is relatively strictly feasible. Strong duality then follows from relative interiority, Corollary 4.9, Theorem 4.7, and the analogous strong-duality result in Theorem 3.5. *

4.3 Optimality conditions

The following result generalizes Theorem 3.7 to include the perspective-dual pair.

Theorem 4.13 (Perspective optimality).

Suppose (Np) is strictly feasible. Then the tuple $(x^{*},y^{*},\alpha^{*},\mu^{*})$ is perspective primal-dual optimal if and only if

[TABLE]

Proof 4.14.

*By construction, $x^{*}$ is optimal for (Np) if and only if $(x^{*},1)$ is optimal for its gauge reformulation (4.3). Apply Theorem 3.7 to (4.3) and the corresponding gauge dual (Nd) to obtain the required conditions. *

The following result mirrors Corollary 3.11 for the perspective-duality case.

Corollary 4.15 (Perspective primal-dual recovery).

Suppose that the primal (Np) is strictly feasible. If $(y^{*},\alpha^{*},\mu^{*})$ is optimal for (Nd), then for any primal feasible $x\in{\mathbb{R}}^{n}$ , the following conditions are equivalent:

(a)

$x$ is optimal for (Np);

(b)

$\langle x,A^{T}\!y^{*}\rangle+\alpha^{*}=f(x)\cdot f^{\sharp}(A^{T}\!y^{*},\alpha^{*})$ and $(b-Ax,1)\in\sigma\partial g^{\sharp}(y^{*},\mu^{*});$

(c)

$A^{T}\!y^{*}\in f^{\sharp}(A^{T}\!y^{*},\alpha^{*})\cdot\partial f(x)$ and $(b-Ax,1)\in\sigma\partial g^{\sharp}(y^{*},\mu^{*}).$

Proof 4.16.

By construction, $x$ is optimal for (Np) if and only if $(x,1)$ is optimal for its gauge reformulation (4.3). Apply Corollary 3.11 to (4.3) and its gauge dual (Nd) to obtain the equivalence of (a) and (b). To show the equivalence of (b) and (c), note that by the polar-gauge inequality, $\langle(x,1),(A^{T}\!y^{*},\alpha^{*})\rangle\leq f^{\pi}(x,1)\cdot f^{\sharp}(A^{T}\!y^{*},\alpha^{*})$ for all $x$ , or equivalently,

[TABLE]

The inequality is tight for a fixed $x$ if and only if $x$ minimizes the function $h:=f^{\sharp}(A^{T}\!y^{*},\alpha^{*})f(\cdot)-\langle\cdot,A^{T}\!y^{*}\rangle-\alpha^{*}.$ This in turn is equivalent to $0\in\partial h(x)$ , or

[TABLE]

*This shows the equivalence of (b) and (c) and completes the proof. *

Section 6 illustrates an application of Corollary 4.15 for recovering primal optimal solutions from perspective-dual optimal solutions.

4.4 Reformulations of the perspective dual

Two reformulations of the perspective dual (Nd) may be useful depending on the functions $f$ and $g$ involved in (Np). First, an important simplification of the perspective dual occurs when one or both of these functions are gauges.

Corollary 4.17 (Simplification for gauges).

If $f$ is a gauge, then a triple $(y^{*},\alpha^{*},\mu^{*})$ is optimal for (Nd) if and only if $\alpha^{*}\leq 0$ and $(y^{*},\mu^{*})$ is optimal for

[TABLE]

*If, in addition, $g$ is a gauge, then a triple $(y^{*},\alpha^{*},\mu^{*})$ is optimal for (Nd) if and only if $\alpha^{*}\leq 0$ , $\mu^{*}\leq 0$ , and $y^{*}$ solves (Gd). *

Proof 4.18.

*Follows from the formulas for $f^{\sharp}$ and $g^{\sharp}$ established in Section 4.1.1. *

Theorem 4.3 also allows us to express the level sets of $g^{\sharp}$ in terms of its conjugate polar as in the following corollary.

Corollary 4.19.

The point $(y^{*},\alpha^{*},\mu^{*})$ is optimal for (Nd) if and only if there exists a scalar $\xi^{*}$ such that $(y^{*},\alpha^{*},\mu^{*},\xi^{*})$ is optimal for the problem

[TABLE]

Proof 4.20.

*By introducing the variable $\xi:=(\langle b,y\rangle+\alpha+\mu-1)/\sigma$ in (Nd), the result follows from Theorem 4.3. *

5 Examples: piecewise linear-quadratic and GLM constraints

From a computational standpoint, the perspective-dual formulation may be an attractive alternative to the original primal problem. The efficiency of this approach requires that the dual constraints are in some sense more tractable than those of the primal. For example, we may consider the dual feasible set “easy” if it admits an efficient procedure for projecting onto that set. In this section, we examine two special cases that admit tractable dual problems in this sense. The first case is the family of piecewise linear quadratic (PLQ) functions, introduced by Rockafellar [22] and subsequently examined by Rockafellar and Wets [23, p.440], and Aravkin, Burke, and Pillonetto [3]. The second case is when $g$ is a Bregman divergence arising from a maximum likelihood estimation problem over a family of exponentially distributed random variables.

For this section only, we assume for simplicity that the objective $f$ is a gauge, so that the perspective dual in each of these cases simplifies as in Corollary 4.17. The more general case still applies.

5.1 PLQ constraints

The family of PLQ functions is a large class of convex functions that includes such commonly used penalties as the Huber function, the Vapnik $\epsilon$ -loss, and the hinge loss. The last two are used in support-vector regression and classification [3]. PLQ functions take the form

[TABLE]

where $g$ is defined by linear operators $L\in\mathbb{R}^{\ell\times\ell}$ and $W\in\mathbb{R}^{k\times\ell}$ , a vector $w\in\mathbb{R}^{k}$ , and an injective affine transformation $B(\cdot)+b$ from $\mathbb{R}^{k}$ to $\mathbb{R}^{\ell}$ . We may assume without loss of generality that $B(\cdot)+b$ is the identity transformation, since the primal problem (Np) already allows for composition of the constraint function $g$ with an affine transformation. We also assume that $\mathcal{U}$ contains the origin, which implies that $g$ is nonnegative and thus can be interpreted as a penalty function. Aravkin, Burke, and Pillonetto [3] describe a range of PLQ functions that often appear in applications.

The conjugate representation of $g$ , given by

[TABLE]

is useful for deriving its polar perspective $g^{\sharp}$ . In the following discussion, it is convenient to interpret the quadratic function $-(1/2\mu)\|Ly\|^{2}$ as a closed convex function of $\mu\in\mathbb{R}_{-}$ , and thus when $\mu=0$ , we make the definition $-(1/2\mu)\|Ly\|^{2}=\delta_{\{0\}}(y)$ .

Theorem 5.1.

If $g$ is a PLQ function, then

[TABLE]

*where $W_{1}^{T},\ldots,W_{k}^{T}$ are the rows of $W$ that define $\mathcal{U}$ in (5.1). *

Proof 5.2.

First observe that when $g$ is PLQ, $\operatorname{\mathrm{epi}}g^{\star}=\{(y,\tau)\mid y\in\mathcal{U},\,\frac{1}{2}\|Ly\|^{2}\leq\tau\}$ . Apply Theorem 4.1 and simplify to obtain the chain of equalities

[TABLE]

Because $\mathcal{U}$ is polyhedral, we can make the explicit description

[TABLE]

*This follows from considering cases on the signs of the $W_{i}^{T}\!y$ , and noting that $w\geq 0$ because $\mathcal{U}$ contains the origin. Combining the above results, the theorem is proved. *

The next example illustrates how Theorem 5.1 can be applied to compute the perspective-polar transform of the Huber function.

Example 5.3 (Huber function).

The Huber function [17], which is a smooth approximation to the absolute value function, is also its Moreau envelope of order $\eta$ . Thus it can be stated in conjugate form as

$$h_{\eta}(x)=\sup_{u\in[-\eta,\eta]}\left\{ux-\tfrac{\eta}{2}u^{2}\right\},$$

which reveals $h_{\eta}^{\star}(y)=\delta_{[-\eta,\eta]}(y)+(\eta/2)y^{2}$ . We then apply Theorem 4.1 to obtain

[TABLE]

Note that this can easily be extended beyond the univariate case to a separable sum by applying the result of Example 4.6.
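As a rough numerical check of the conjugate form, the sketch below evaluates $\sup_{u\in[-\eta,\eta]}\{ux-(\eta/2)u^{2}\}$ in two ways: in closed form, by clipping the unconstrained maximizer $x/\eta$ to the interval, and by brute force over a grid. It assumes the conjugate $h_{\eta}^{\star}(y)=\delta_{[-\eta,\eta]}(y)+(\eta/2)y^{2}$ stated in this example; the function names and the grid size are illustrative choices, not part of the paper.

```python
def huber_from_conjugate(x, eta):
    """Evaluate sup over u in [-eta, eta] of u*x - (eta/2)*u**2 in closed form.

    The objective is concave in u, so the maximizer is the stationary
    point x/eta clipped to the interval [-eta, eta].
    """
    u = max(-eta, min(eta, x / eta))
    return u * x - 0.5 * eta * u * u


def huber_by_grid(x, eta, n=20001):
    """Brute-force the same supremum over a uniform grid (illustration only)."""
    best = float("-inf")
    for k in range(n):
        u = -eta + 2.0 * eta * k / (n - 1)
        best = max(best, u * x - 0.5 * eta * u * u)
    return best


if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(x, huber_from_conjugate(x, 1.0), huber_by_grid(x, 1.0))
```

With $\eta=1$ the closed form reduces to the familiar quadratic $x^{2}/2$ near the origin and to $|x|-1/2$ in the tails.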

We can now write down an explicit formulation of the perspective dual problem (Nd) when the primal problem (Np) has a PLQ-constrained feasible region (i.e., $g$ is PLQ) and a gauge objective (i.e., $f$ is a closed gauge). The constraint set of (Nd) simplifies significantly so that, for example, a first-order projection method might be applied to solve the problem. Apply Theorem 5.1 and introduce a scalar variable $\xi$ to rephrase the dual problem (Nd) as

[TABLE]

We can further simplify the constraint set using the fact that

[TABLE]

Thus, projecting a point $\overline{y}$ onto the feasible set of (5.2) is equivalent to solving a second-order cone program (SOCP). In many important cases, the operator $L$ is extremely sparse. For example, when $g$ is a sum of separable Huber functions, we have $L=\sqrt{\eta}I$ . Hence in many practical cases, particularly when $m\ll n$ and the dual variables are low-dimensional, this projection problem could be solved efficiently using SOCP solvers that take advantage of sparsity, e.g., Gurobi [16].

5.2 Generalized linear models and the Bregman divergence

Suppose we are given a data set $\{(a_{i},b_{i})\}_{i=1}^{m}\subseteq\mathbb{R}^{n+1}$ , where each vector $a_{i}$ describes features associated with observations $b_{i}$ . Assume that the vector $b$ of observations is distributed according to an exponential density $p(y\mid\theta)=\exp[\langle\theta,y\rangle-\phi^{\star}(\theta)-p_{0}(y)],$ where the conjugate of $\phi:{\mathbb{R}}^{n}\to\mathbb{R}$ is the cumulant generating function of the distribution and $p_{0}:{\mathbb{R}}^{n}\to\mathbb{R}$ serves to normalize the distribution. We assume that $\phi$ is a closed convex function of the Legendre type [20, p.258]. The maximum likelihood estimate (MLE) can be obtained as the maximizer of the log-likelihood function $\log p(y\mid\theta)$ .

In applications that impose an a priori distribution on the parameters, the goal is to find an approximation to the MLE estimate that penalizes a regularization function $f$ (a surrogate for the prior). We assume a linear dependence between the parameters and feature vectors, and thus set $\theta=Ax$ , where the matrix $A$ has rows $a_{i}$ . A regularized MLE estimate could be obtained by solving the constrained problem

[TABLE]

where $d_{\phi}(v;w):=\phi(v)-\phi(w)-\langle\nabla\phi(w),v-w\rangle$ is the Bregman divergence function, and $\sigma$ is a positive parameter that controls the divergence between the linear model $Ax$ and the first-moment $\nabla\phi(b)$ relative to the density defined by $\phi$ [4].
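To make the definition of $d_{\phi}$ concrete, here is a minimal sketch that evaluates the Bregman divergence directly from the formula above for two choices of $\phi$: the quadratic $\frac{1}{2}\|\cdot\|^{2}$ and the separable function $\theta\log\theta-\theta$ that appear in the Gaussian and Poisson examples of this section. The helper names are hypothetical, and vectors are plain Python lists.

```python
import math


def bregman(phi, grad_phi, v, w):
    """d_phi(v; w) = phi(v) - phi(w) - <grad phi(w), v - w>."""
    inner = sum(g * (vi - wi) for g, vi, wi in zip(grad_phi(w), v, w))
    return phi(v) - phi(w) - inner


# phi = (1/2)||.||^2, for which d_phi(v; w) = (1/2)||v - w||^2.
def quad(v):
    return 0.5 * sum(t * t for t in v)


def quad_grad(v):
    return list(v)


# phi(theta) = sum_i theta_i*log(theta_i) - theta_i on positive vectors,
# for which d_phi(v; w) = sum_i v_i*log(v_i/w_i) - v_i + w_i.
def entropy(v):
    return sum(t * math.log(t) - t for t in v)


def entropy_grad(v):
    return [math.log(t) for t in v]


if __name__ == "__main__":
    print(bregman(quad, quad_grad, [1.0, 2.0], [0.0, 0.0]))
    print(bregman(entropy, entropy_grad, [2.0], [1.0]))
```

In the quadratic case the divergence is symmetric in its arguments; in the entropy case it is not, which is why the order of the arguments in the constraint matters.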

We use Corollary 4.19 to derive the perspective dual, which requires the computation of the conjugate of $g(z):=d_{\phi^{\star}}(z;\nabla\phi(b))$ :

[TABLE]

where we simplify the expression using the inverse relationship between the gradients of $\phi$ and its conjugate. Assume for simplicity that $f$ is a gauge, which is typical when it serves as a regularization function. In that case, the perspective dual reduces to

[TABLE]

cf. Corollaries 4.17 and 4.19.

Example 5.4 (Gaussian distribution).

As a first example, consider the case where the $b_{i}$ are distributed as independent Gaussian variables with unit variance. In this case, $\phi:={\textstyle{\frac{1}{2}}}\|\cdot\|^{2}$ and the above constraints specialize to

[TABLE]

This is an example of a PLQ constraint, which falls into the category of problems described in Section 5.1.

Example 5.5 (Poisson distribution).

Consider the case where the observations $b_{i}$ are independent Poisson observations, which corresponds to $\phi(\theta)=\theta\log\theta-\theta$ and $\phi^{\star}(y)=e^{y}$ . Straightforward calculations show that the perspective dual constraints for the Poisson case reduce to

[TABLE]

where $\beta=\sum_{i=1}^{m}(b_{i}+b_{i}\log b_{i})$ is a constant. By introducing new variables, this can be further simplified to require only affine constraints and $m$ relative-entropy constraints. To solve projection subproblems onto a constraint set of this form, we note that

[TABLE]

is a self-concordant barrier for the set $\set{(x,y,r)}{y>0,\ y\log(y/x)\leq r},$ which is the epigraph of the relative entropy function; see Nesterov and Nemirovski [19, Proposition 5.1.4] and Boyd and Vandenberghe [5, Example 9.8]. Standard interior methods can therefore be used to project onto the constraint set.
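The conjugate pair quoted at the start of this example can be checked by a short calculation. The derivation below is a sketch for the scalar case, assuming the domain $\theta>0$; the objective is concave in $\theta$, so its stationary point gives the supremum.

```latex
\begin{align*}
\phi^{\star}(y) &= \sup_{\theta>0}\,\bigl\{\theta y-\theta\log\theta+\theta\bigr\}, \\
0 &= \frac{d}{d\theta}\bigl(\theta y-\theta\log\theta+\theta\bigr)
   = y-\log\theta
   \quad\Longrightarrow\quad \theta=e^{y}, \\
\phi^{\star}(y) &= e^{y}y-e^{y}y+e^{y}=e^{y}.
\end{align*}
```

The same calculation shows that $\nabla\phi(\theta)=\log\theta$ and $\nabla\phi^{\star}(y)=e^{y}$ are inverse maps, which is the relationship between the gradients of $\phi$ and its conjugate used when simplifying the conjugate of $g$.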

Example 5.6 (Bernoulli distribution).

When the observations $b_{i}$ are independent Bernoulli observations, which corresponds to $\phi(\theta)=\theta\log\theta+(1-\theta)\log(1-\theta)$ and $\phi^{\star}(y)=\log(1+e^{y})$ , the perspective dual constraints in (5.4) reduce to

[TABLE]

where $\beta=\sum_{i=1}^{m}(b_{i}\log b_{i}+(1-b_{i})\log(1-b_{i}))$ is a constant. By introducing new variables, this can be rewritten with only affine constraints and $2m$ relative-entropy constraints. Thus the projection subproblems can be solved as in the Poisson case.

6 Examples: recovering primal solutions

Once we have solved the gauge or perspective dual problems, we have two available approaches for recovering a corresponding primal optimal solution. If we applied a (Lagrange) primal-dual algorithm (e.g., the algorithm of Chambolle and Pock [8]) to solve the dual, then Theorem 3.15 gives a direct recipe for constructing a primal solution from the algorithm’s output. On the other hand, if we applied a primal-only algorithm to solve the dual, we must instead rely on Corollary 3.11 or Corollary 4.15 to recover a primal solution. Interestingly, the alignment conditions in these theorems can provide insight into the structure of the primal optimal solution, as illustrated by the following examples.

6.1 Recovery for basis pursuit denoising

Our first example illustrates how Corollary 3.11 can be used to recover primal optimal solutions from dual optimal solutions for a simple gauge problem. Consider the gauge dual pair

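$$\mathop{\mathrm{minimize}}_{x}\ \ \|x\|_{1}\quad\text{subject to}\quad\|b-Ax\|_{2}\leq\sigma,\tag{6.1a}$$

$$\mathop{\mathrm{minimize}}_{y}\ \ \|A^{T}\!y\|_{\infty}\quad\text{subject to}\quad\langle b,y\rangle-\sigma\|y\|_{2}\geq 1,\tag{6.1b}$$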

which corresponds to the basis pursuit denoising problem. The 1-norm in the primal objective encourages sparsity in $x$ , while the constraint enforces a maximum deviation between a forward model $Ax$ and observations $b$ .

Let $y^{*}$ be optimal for the dual problem (6.1b), and set $z=A^{T}\!y^{*}$ . Define the active set

$$I(z):=\set{i}{|z_{i}|=\|z\|_{\infty}}$$

as the set of indices of $z$ that achieve the optimal objective value of the gauge dual. We use Corollary 3.11 to determine properties of a primal solution $x^{*}$ . In particular, the first part of Corollary 3.11(b) holds if and only if $x_{i}^{*}=0$ for all $i\notin I(z)$ , and $\mathrm{sign}(x_{i}^{*})=\mathrm{sign}(z_{i})$ for all $i\in I(z).$ Thus, the maximal-in-modulus elements of $A^{T}\!y^{*}$ determine the support for any primal optimal solution $x^{*}$ . The second condition in Corollary 3.11(b) holds if and only if $b-Ax^{*}=\sigma y^{*}/\|y^{*}\|_{2}$ . In order to satisfy this last condition, we solve the least-squares problem restricted to the support of the solution:

[TABLE]

(Note that $y^{*}\neq 0$ , otherwise the primal problem is infeasible.) The efficiency of this least-squares solve depends on the number of elements in $I(z)$ . For many applications of basis pursuit denoising, for example, we expect the support to be small relative to the length of $x$ , and in that case, the least-squares recovery problem is expected to be a relatively inexpensive subproblem. We may interpret the role of the dual problem as that of determining the optimal support of the primal, and the role of the above least-squares problem as recovering the actual values of the support.
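The two-stage recovery described above (support from the dual, values from a restricted least-squares solve) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it takes the active set to be the indices where $|z_{i}|$ attains $\|z\|_{\infty}$ up to a tolerance, solves the normal equations on those columns by Gaussian elimination, and assumes those columns of $A$ are linearly independent. The function names, the tolerance, and the dense list-of-lists storage are all assumptions made for the example, and the sign condition on $x^{*}$ is not enforced.

```python
import math


def active_set(z, tol=1e-9):
    """Indices i where |z_i| equals the infinity norm of z (up to tol)."""
    zmax = max(abs(v) for v in z)
    return [i for i, v in enumerate(z) if abs(v) >= zmax - tol]


def solve_linear(M, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(M)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def recover_primal(A, b, sigma, y):
    """Recover x from a dual solution y: support from z = A^T y, values by least squares."""
    m, n = len(A), len(A[0])
    z = [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
    I = active_set(z)
    ynorm = math.sqrt(sum(v * v for v in y))
    # The alignment condition asks for b - A x = sigma * y / ||y||_2.
    target = [b[i] - sigma * y[i] / ynorm for i in range(m)]
    # Normal equations restricted to the columns indexed by I.
    G = [[sum(A[i][p] * A[i][q] for i in range(m)) for q in I] for p in I]
    h = [sum(A[i][p] * target[i] for i in range(m)) for p in I]
    xI = solve_linear(G, h)
    x = [0.0] * n
    for idx, j in enumerate(I):
        x[j] = xI[idx]
    return x, I
```

For example, with $A=I$, $b=(3,\,0.5)$, and $\sigma=1$, the dual solution is a positive multiple of $(\sqrt{0.75},\,0.5)$; the active set is the first index only, and the recovered primal solution is $(3-\sqrt{0.75},\,0)$.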

6.2 Sparse recovery with Huber misfit

For an example where the constraint is not a gauge function, consider the variant of (6.1a)

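$$\mathop{\mathrm{minimize}}_{x}\ \ \|x\|_{1}\quad\text{subject to}\quad h_{\eta}(b-Ax)\leq\sigma,\tag{6.2}$$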

where $h_{\eta}$ is the Huber function; cf. Example 5.3. This problem corresponds to (Np) with $f(x)=\|x\|_{1}$ and $g=h$ . Suppose that the tuple $(y,\alpha,\mu)$ , with $\mu<0$ , is optimal for the perspective dual, and that (Np) attains its optimal value. Because $f$ is a gauge, Corollary 4.17 asserts that $\alpha=0$ , and thus Corollary 4.15(b) reduces to the conditions

[TABLE]

As we did for the related example in Section 6.1, we use (6.3a) to deduce the support of the optimal primal solution. It follows from Theorem 5.1 that because $g$ is PLQ,

[TABLE]

In particular, because $h$ is a separable sum of Huber functions, $W=[I\ {-}I]^{T}$ , $w$ is the constant vector of all ones, and $L=\sqrt{\eta}I.$ Since $\mu<0$ , it follows that

[TABLE]

For the set $\set{v_{1},\ldots,\,v_{2m+1}}:=\left\{y_{1},\ldots,y_{m},-y_{1},\ldots,-y_{m},-\frac{\eta}{2\mu}\|y\|^{2}\right\},$ let $J(y,\mu):=\set{j}{|v_{j}|=\max_{i=1,\ldots,2m+1}|v_{i}|}$ be the set of maximizing indices. Then

[TABLE]

where conv denotes the convex hull operation. More concretely, precisely the following terms are contained in the convex hull above:

•

$\left({-}\frac{\eta}{\mu}y,\,\frac{\eta}{2\mu^{2}}\|y\|_{2}^{2}\right)$ if ${-}\frac{\eta}{2\mu}\|y\|^{2}\geq\|y\|_{\infty}$ ;

•

$\left(\mathrm{sign}\,(y_{i})\cdot e_{i},\,0\right)$ if $i\in[m]$ and $|y_{i}|=\|y\|_{\infty}\geq-\frac{\eta}{2\mu}\|y\|^{2}$ ,

where $e_{i}$ is the $i$ th standard basis vector. Note that if an optimal solution to (Np) exists, then Corollary 4.15 tells us that $({-}(\eta/\mu)y,\,(\eta/2\mu^{2})\|y\|^{2})$ must be included in this convex hull, otherwise it is impossible to have $(b-Ax,1)\in\partial h^{\sharp}(y,\mu).$

In summary, Corollary 4.15 tells us that to find an optimal solution $x$ for (Np), we need to solve a linear program to ensure that $(b-Ax,1)\in{\mbox{conv}\,}\set{\nabla v_{j}}{j\in J(y,\mu)}$ subject to the optimal support of $x$ , as determined by (6.3a). In cases where the size of the support is expected to be small (as might be expected with a 1-norm objective), this required linear program can be solved efficiently.

7 Numerical experiment: sparse robust regression

To illustrate the usefulness of the primal-from-dual recovery procedure implied by Theorem 3.15, we continue to examine the sparse robust regression problem (6.2), considered by Aravkin et al. [2]. The aim is to find a sparse signal (e.g., a spike train) from measurements contaminated by outliers. The experiments use the following data: $m=120,$ $n=512,$ $\sigma=0.2$ , $\eta=1$ , and $A$ is a Gaussian matrix. The true solution $\overline{x}\in\{-1,0,1\}^{n}$ is a spike train constructed to have 20 nonzero entries, and the true noise $b-A\overline{x}$ is constructed to have 5 outliers.

We compare two approaches for solving problem (6.2). In both, we use Chambolle and Pock’s (CP) algorithm [8], which is primal-dual (in the sense of Lagrange duality) and can be adapted to solve both the primal problem (6.2) and its perspective dual (5.2). Other numerical methods could certainly be applied to either of these problems, such as Shefi and Teboulle’s dual moving-ball method [25]. We note that a primal-only method, for example, applied to (5.2), would require us to use the methods of Section 6 rather than Theorem 3.15 for the recovery of a primal solution.

The CP method applied to problem (3.1) at each iteration $k$ computes

[TABLE]

where $\operatorname{\mathrm{prox}}_{\alpha f}(x):=\operatorname*{\mathrm{argmin}}_{y}\{f(y)+\frac{1}{2\alpha}\|x-y\|_{2}^{2}\}$ . The positive scalars $\alpha_{x}$ and $\alpha_{y}$ are chosen to satisfy $\alpha_{x}\alpha_{y}\|A\|^{2}<1$ . Setting $f=\delta_{h(b-\cdot)\leq\sigma}$ , and $g=\|\cdot\|_{1}$ yields the primal problem (6.2). In this case, the proximal operators $\operatorname{\mathrm{prox}}_{\alpha f^{\star}}$ and $\operatorname{\mathrm{prox}}_{\alpha g}$ can be computed using the Moreau identity, i.e.,

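$$\operatorname{\mathrm{prox}}_{\alpha f^{\star}}(y)=y-\alpha\,\Pi_{f}(y/\alpha)\quad\text{and}\quad\operatorname{\mathrm{prox}}_{\alpha g}(x)=x-\Pi_{\alpha\mathbb{B}_{\infty}}(x),$$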

where $\Pi_{f}$ is the projection onto the sublevel set in the definition of $f$ and $\Pi_{\alpha\mathbb{B}_{\infty}}$ is the projection onto the infinity-norm ball of radius $\alpha$ . We implement $\Pi_{f}$ using the Convex.jl [27] and Gurobi [16] software packages.
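The structure of this computation can be sketched on a toy problem. The code below is an illustration under stated assumptions, not the implementation used in the experiments: it uses the standard Chambolle-Pock updates for minimizing $f(Ax)+g(x)$ (a dual prox step, a primal prox step, then extrapolation), and it replaces the Huber sublevel set by the 2-norm ball $\{v:\|b-v\|_{2}\leq\sigma\}$ so that $\Pi_{f}$ has a closed form and no external solver is needed. Both proximal maps are computed through the Moreau identity, as in the text. All function names are hypothetical.

```python
import math


def soft_threshold(x, alpha):
    """prox of alpha*||.||_1: x minus its projection onto the inf-norm ball of radius alpha."""
    return [v - max(-alpha, min(alpha, v)) for v in x]


def project_ball(v, center, radius):
    """Euclidean projection onto the 2-norm ball {u : ||u - center||_2 <= radius}."""
    d = [vi - ci for vi, ci in zip(v, center)]
    nrm = math.sqrt(sum(t * t for t in d))
    if nrm <= radius:
        return list(v)
    return [ci + radius * t / nrm for ci, t in zip(center, d)]


def prox_conj_indicator(y, alpha, center, radius):
    """Moreau identity for an indicator f: prox_{alpha f*}(y) = y - alpha * Pi_f(y / alpha)."""
    p = project_ball([t / alpha for t in y], center, radius)
    return [yi - alpha * pi for yi, pi in zip(y, p)]


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, y):
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def chambolle_pock(A, b, sigma, alpha_x, alpha_y, iters):
    """Minimize ||x||_1 subject to ||b - A x||_2 <= sigma.

    The step sizes are assumed to satisfy alpha_x * alpha_y * ||A||^2 < 1.
    """
    m, n = len(A), len(A[0])
    x, xbar, y = [0.0] * n, [0.0] * n, [0.0] * m
    for _ in range(iters):
        # Dual step on the extrapolated primal point.
        Axbar = matvec(A, xbar)
        y = prox_conj_indicator(
            [yi + alpha_y * t for yi, t in zip(y, Axbar)], alpha_y, b, sigma
        )
        # Primal step, followed by extrapolation.
        ATy = rmatvec(A, y)
        x_new = soft_threshold([xi - alpha_x * t for xi, t in zip(x, ATy)], alpha_x)
        xbar = [2.0 * xn - xo for xn, xo in zip(x_new, x)]
        x = x_new
    return x, y
```

On the two-dimensional instance $A=I$, $b=(3,\,0.5)$, $\sigma=1$ with $\alpha_{x}=\alpha_{y}=0.9$, the iterates approach $(3-\sqrt{0.75},\,0)$. Swapping `project_ball` for a projection onto the Huber sublevel set would give the setting of (6.2), at the cost of solving a projection subproblem at every iteration.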

On the other hand, to apply CP to the perspective dual problem (5.2), one instead takes $f=(\|\cdot\|_{1})^{\circ}=\|\cdot\|_{\infty}$ and $g=\delta_{\mathcal{Q}}$ , where $\mathcal{Q}$ is the constraint set for (5.2), and takes $A$ to be the corresponding adjoint to the operator in (6.2). To compute $\operatorname{\mathrm{prox}}_{\alpha_{y}g}$ , which is the projection onto $\mathcal{Q}$ , we solve the SOCP (5.3) using Gurobi. To evaluate $\operatorname{\mathrm{prox}}_{\alpha f^{\star}}$ , we again use the Moreau identity and project onto level sets of $\|\cdot\|_{1}$ .
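Projection onto a level set of $\|\cdot\|_{1}$ has a well-known sort-based solution, sketched below. This is one standard way to compute it, not necessarily the routine used in the experiments. The second function shows how the Moreau identity turns that projection into the proximal map of $\alpha\|\cdot\|_{\infty}$. Function names are hypothetical, and a positive radius is assumed.

```python
import math


def project_l1_ball(v, radius):
    """Euclidean projection of v onto {u : ||u||_1 <= radius}, with radius > 0."""
    if sum(abs(t) for t in v) <= radius:
        return list(v)
    # Sort magnitudes in decreasing order and find the shrinkage threshold.
    u = sorted((abs(t) for t in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - radius) / j
        if uj - t > 0:
            theta = t  # the last index passing this test defines the threshold
    # Shrink every magnitude by theta and restore the signs.
    return [math.copysign(max(abs(t) - theta, 0.0), t) for t in v]


def prox_linf(y, alpha):
    """prox of alpha*||.||_inf via Moreau: y minus its projection onto the 1-norm ball of radius alpha."""
    p = project_l1_ball(y, alpha)
    return [yi - pi for yi, pi in zip(y, p)]


if __name__ == "__main__":
    print(project_l1_ball([3.0, 1.0], 2.0))
    print(prox_linf([3.0, 1.0], 2.0))
```

The sort dominates the cost, so each projection takes $O(n\log n)$ operations for a vector of length $n$.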

Fig. 7.1 compares the outcomes of running CP on the primal and perspective dual problems. This experiment exhibited similar behavior when run 500 times with different realizations of the random data, and so here we report on a single problem instance. Note that performing an iteration of CP on the perspective dual is significantly faster than performing an iteration of CP on the primal because $\Pi_{\mathcal{Q}}$ can be computed much more efficiently than $\Pi_{f}$ (see the discussion in Section 5.1). This also appears to make convergence of CP on the perspective dual more stable, as seen in Fig. 7.1(a). Fig. 7.1(c)-(d) illustrate the sparsity patterns of the iterates $x_{k}$ relative to that of $\overline{x}$ . Notably, we recover the correct sparsity patterns using Theorem 3.15. The recovery procedure outlined in Section 6.2 also recovers the correct sparsity pattern when applied to the final perspective dual iterate.

8 Discussion

Gauge duality is fascinating in part because it shares many symmetric properties with Lagrange duality, and yet Freund’s 1987 development of the concept flows from an entirely different principle based on polarity of the sets that define the gauge functions. On the other hand, Lagrange duality proceeds from a perturbation argument, which yields as one of its hallmarks a sensitivity interpretation of the dual variables. The discussion in Section 3 reveals that both duality notions can be derived from the same Fenchel-Rockafellar perturbation framework. The derivation of gauge duality using this framework appears to be its first application to a perturbation that does not lead to Lagrange duality. This new link between gauge duality and the perturbation framework establishes a sensitivity interpretation for gauge dual variables, which has not been available until now.

One motivation for this work is to explore alternative formulations of optimization problems that might be computationally advantageous for certain problem classes. The phase-retrieval problem, based on an SDP formulation, was a first application of ideas from gauge duality for developing large-scale solvers [14]. That approach, however, was limited in its flexibility because it required gauge functions. The discussions of Section 4 pave the way to new extensions, such as different models of the measurement process, as described in Section 5.2.

Another implication of this work is that it establishes the foundation for exploring a new breed of primal-dual algorithms based on perspective duality. Our own application of Chambolle and Pock’s primal-dual algorithm [8] to the perspective-dual problem, together with a procedure for extracting a primal estimate, is a first exploratory step towards developing variations of such methods. Future directions of research include the development of such algorithms, along with their attendant convergence properties and an understanding of the classes of problems for which they are practicable.

Acknowledgments

We are grateful to Patrick Combettes for pointing us to recent comprehensive work on properties of the perspective function and its applications [9, 10]. Our sincere thanks to two anonymous referees who provided an extensive list of corrections and suggestions that helped us to arrive at several strengthened results and to streamline our presentation.

Appendix A Proof of (2.6)

We prove each fact in succession.

1. ( $\mathcal{U}_{\kappa}^{\circ}=\mathcal{U}_{\kappa^{\circ}}$ ). By definition of the polar gauge and the polar cone, we have $y\in\mathcal{U}_{\kappa^{\circ}}$ if and only if

[TABLE]

2. ( $\mathcal{U}_{\kappa}^{\infty}=\mathcal{H}_{\kappa}$ ). Suppose $x\in\mathcal{H}_{\kappa}$ . Then for any $u\in\mathcal{U}_{\kappa}$ and $\lambda>0$ , by sublinearity of $\kappa$ we have $\kappa(u+\lambda x)\leq\kappa(u)+\lambda\kappa(x)\leq 1+\lambda\cdot 0=1.$ Thus $x\in\mathcal{U}_{\kappa}^{\infty}$ , and $\mathcal{H}_{\kappa}\subseteq\mathcal{U}_{\kappa}^{\infty}$ . Suppose now that $y\in\mathcal{U}_{\kappa}^{\infty}\setminus\mathcal{H}_{\kappa}$ . Then in particular, $\kappa\left(y/\kappa(y)+\lambda y\right)\leq 1$ for all $\lambda>0$ . But then by positive homogeneity, $\left(1/\kappa(y)+\lambda\right)\kappa(y)\leq 1$ , for all $\lambda>0$ . This is a contradiction since $\kappa(y)>0$ , so we conclude that $\mathcal{H}_{\kappa}=\mathcal{U}_{\kappa}^{\infty}$ .

3. ( $(\operatorname{\mathrm{dom}}\kappa)^{\circ}=\mathcal{H}_{\kappa^{\circ}}$ ). By positive homogeneity of $\kappa$ and the definition of the polar gauge, $y\in\mathcal{H}_{\kappa^{\circ}}$ if and only if

[TABLE]

4. ( $\mathcal{H}_{\kappa}^{\circ}=\operatorname{\mathrm{cl}}\operatorname{\mathrm{dom}}\kappa^{\circ}$ ). Apply the third equality, replacing $\kappa$ by $\kappa^{\circ}$ , and then take polars on both sides. This concludes the proof.

Appendix B Proof of Lemma 3.4

With no loss in generality, we can assume that $\sigma>0$ , because if $\sigma=0$ , we use the convention (2.8) and its implication (2.9).

First suppose that the primal (Gp) is relatively strictly feasible. A point $u$ lies in the domain of $p$ if and only if the system

[TABLE]

is solvable for $(w,\lambda)$ . Thus the set $(\operatorname{\mathrm{dom}}p)\times\{0\}\times\{0\}$ coincides with

[TABLE]

where $L:=\set{(a,b,c)}{b=0,\,c=0}$ is a linear subspace. We aim to show $(0,0,0)$ is in the relative interior of (B.1), which will show $0\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}p$ . Use [20, Lemma 7.3] and [20, Theorem 7.6] to obtain

[TABLE]

From relative strict feasibility of (Gp), the fact that $\sigma>0$ , and again [20, Theorem 7.6], we deduce existence of an $x\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}\kappa$ with $b-Ax\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}\rho$ and $\rho(b-Ax)<\sigma.$ Fix a constant $r>\kappa(x)$ and define the pair $(w,\lambda):=(x/r,1/r)$ . Then we immediately have $(b\lambda-Aw,\sigma\lambda)\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{epi}}\rho$ and $\kappa(w)<1$ . It follows that the vector $-M\begin{bmatrix}w\\ \lambda\end{bmatrix}$ lies in $(\operatorname{\mathrm{ri}}\operatorname{\mathrm{epi}}\rho)\times\operatorname{\mathrm{ri}}\mathcal{U}_{\kappa}$ . Thus $(0,0,0)$ lies in the intersection

[TABLE]

Use [20, Theorem 6.5, Corollary 6.6.2] to deduce that (B.2) is the relative interior of the intersection (B.1). Thus $y=0$ lies in the relative interior of $\operatorname{\mathrm{dom}}p$ as claimed.

Next, suppose that the gauge dual (Gd) is relatively strictly feasible. By definition of $F^{\star},$ the tuple $(w,\lambda)$ lies in the domain of $v_{d}$ if and only if

[TABLE]

Thus $\operatorname{\mathrm{dom}}v_{d}$ is linearly isomorphic to the intersection

[TABLE]

where $L^{\prime}$ is the linear subspace $L^{\prime}:=\{(a,b,c)\mid b=0\}$ . However, by [20, Lemma 7.3], relative strict feasibility of the dual (Gd) amounts to the inclusion

[TABLE]

Relative strict feasibility of (Gd) therefore implies, via [20, Corollary 6.5.1, Corollary 6.6.2], that $(0,0,0)$ is in the relative interior of the intersection (B.3), and thus $0\in\operatorname{\mathrm{ri}}\operatorname{\mathrm{dom}}v_{d}$ , as claimed.

Finally, the exact same arguments, but with relative interiors replaced by interiors, will prove the claims relating strict feasibility and interiority. This concludes the proof.

Bibliography

[1] A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, and S. Roy. Level-set methods for convex optimization. arXiv:1602.01506, 2016.
[2] A. Y. Aravkin, J. V. Burke, and M. P. Friedlander. Variational properties of value functions. SIAM J. Optim., 23(3):1689–1717, 2013.
[3] A. Y. Aravkin, J. V. Burke, and G. Pillonetto. Linear system identification using stable spline kernels and PLQ penalties. In 52nd IEEE Decis. Contr. P., pages 5168–5173, Dec 2013.
[4] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. J. Mach. Learn. Res., 6(Oct):1705–1749, 2005.
[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[6] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, Feb 2006.
[7] E. J. Candès, J. K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207–1223, 2006.
[8] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis., 40(1):120–145, 2011.


Foundations of gauge and perspective duality. June 18, 2018.
