Regularizing with Bregman–Moreau envelopes
Heinz H. Bauschke,
Minh N. Dao and
Scott B. Lindstrom
Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada.
E-mail: [email protected].
CARMA, University of Newcastle, Callaghan, NSW 2308, Australia.
E-mail: [email protected].
CARMA, University of Newcastle, Callaghan, NSW 2308, Australia.
E-mail: [email protected].
(November 16, 2018)
Abstract
Moreau’s seminal paper,
introducing what is now called the Moreau envelope and the
proximity operator (also known as the proximal mapping), appeared in 1962.
The Moreau envelope of a given convex function provides a regularized
version which has additional desirable properties such as differentiability
and full domain. Forty years ago, Attouch proposed using
the Moreau envelope for regularization. Since then, this branch
of convex analysis has developed in many fruitful directions.
In 1967,
Bregman introduced what is nowadays known as the Bregman distance
as a measure of discrepancy between two points generalizing
the square of the Euclidean distance.
Proximity operators based on the Bregman distance have become a
topic of significant research as they are useful in the algorithmic
solution of optimization problems.
More recently,
in 2012, Kan and Song studied regularization aspects of the
left Bregman–Moreau envelope even for nonconvex functions.
In this paper, we complement previous works by analyzing
the left and right Bregman–Moreau envelopes
and by providing additional asymptotic results.
Several examples are provided.
1 Introduction
We assume throughout that $X$ is the Euclidean space $\mathbb{R}^J$,
which we equip with
the standard inner product ⟨⋅,⋅⟩ and
the induced Euclidean norm ∥⋅∥.
Let $\theta \colon X \to \left]-\infty,+\infty\right]$ be convex, lower semicontinuous, and proper. (Footnote 1: See [42], [9], [37], and [43] for background material in convex analysis, from which we adopt our notation, which is standard. We also set $\mathbb{R}_{++} := \{x \in \mathbb{R} \mid x > 0\}$.)
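The Moreau envelope with parameter $\gamma \in \mathbb{R}_{++}$ is the
function
$$\operatorname{env}_{\theta}^{\gamma} \colon X \to \mathbb{R} \colon y \mapsto \inf_{x \in X} \Big( \theta(x) + \frac{1}{2\gamma}\|x-y\|^2 \Big). \tag{2}$$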
Moreau only considered the case in which γ=1;
the systematic study involving the parameter γ originated
with
Attouch (see [3] and [4]).
If $\theta = \iota_C$, the indicator function of a
nonempty closed convex subset $C$ of $X$,
then the corresponding Moreau envelope with parameter $\gamma$ is
$\frac{1}{2\gamma} d_C^2$, where $d_C$ is the distance function
of the set $C$.
While the indicator function has (effective) domain $C$ and
is differentiable
only on $\operatorname{int} C$, the interior of $C$, the Moreau envelope
is much better behaved:
for instance, it has full domain and is
differentiable everywhere.
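The identity between the Moreau envelope of $\iota_C$ and $\frac{1}{2\gamma} d_C^2$ can be checked numerically. The following is a minimal sketch, assuming $X = \mathbb{R}$ and $C = [0,1]$; the function names, the grid size, and the test points are illustrative choices, not taken from the paper.

```python
import numpy as np

def moreau_envelope_indicator(y, lo, hi, gamma, grid_size=200001):
    """Brute-force Moreau envelope of the indicator of C = [lo, hi] at y.

    The indicator is 0 on C and +inf elsewhere, so the infimum over X
    reduces to an infimum over C, approximated here on a uniform grid.
    """
    x = np.linspace(lo, hi, grid_size)
    return np.min((x - y) ** 2) / (2.0 * gamma)

def squared_distance_formula(y, lo, hi, gamma):
    """Closed form d_C(y)^2 / (2 * gamma) for the interval C = [lo, hi]."""
    dist = max(lo - y, 0.0, y - hi)
    return dist ** 2 / (2.0 * gamma)

gamma = 0.5
for y in (-1.0, 0.25, 3.0):
    brute = moreau_envelope_indicator(y, 0.0, 1.0, gamma)
    closed = squared_distance_formula(y, 0.0, 1.0, gamma)
    print(f"y={y:5.2f}  brute-force={brute:.6f}  closed form={closed:.6f}")
```

Both columns agree up to the grid resolution, including at points inside $C$, where the envelope vanishes.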
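Now assume that
$$f \colon X \to \left]-\infty,+\infty\right] \text{ is convex and differentiable on } U := \operatorname{int}\operatorname{dom} f \neq \varnothing.$$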
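The Bregman distance associated with $f$ (footnote 2: note that $D_f$ is
not a distance in the sense of metric topology; however,
this naming convention is now ubiquitous),
first explored by Bregman in [19] (see also
[26]), is
$$D_f \colon X \times X \to [0,+\infty] \colon (x,y) \mapsto \begin{cases} f(x) - f(y) - \langle \nabla f(y), x-y \rangle, & \text{if } y \in U; \\ +\infty, & \text{otherwise.} \end{cases}$$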
It serves as a measure of discrepancy between two points and thus
gives rise to associated projectors (nearest-point mappings)
and proximal mappings,
which have been employed to solve convex feasibility and
optimization
problems algorithmically; see, e.g.,
[2], [5], [7], [8], [10], [11], [12], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [31], [33], [34], [35], [40], [41], and [44].
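The classical case arises when $f = \frac{1}{2}\|\cdot\|^2$, in
which case $D_f(x,y) = \frac{1}{2}\|x-y\|^2 = D_f(y,x)$.
This clearly suggests replacing the quadratic term in
(2) by the Bregman distance.
However, because different choices of $f$ may allow for
cases in which
$D_f(x,y) \neq D_f(y,x)$, we are actually led to consider
two envelopes:
the left and right Bregman–Moreau envelopes are defined by
$$\overleftarrow{\operatorname{env}}_{\theta}^{\gamma} \colon X \to [-\infty,+\infty] \colon y \mapsto \inf_{x \in X} \Big( \theta(x) + \frac{1}{\gamma} D_f(x,y) \Big)$$
and
$$\overrightarrow{\operatorname{env}}_{\theta}^{\gamma} \colon X \to [-\infty,+\infty] \colon x \mapsto \inf_{y \in X} \Big( \theta(y) + \frac{1}{\gamma} D_f(x,y) \Big),$$
respectively.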
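To see that the two envelopes can genuinely differ, one can evaluate both by brute force. The sketch below assumes $X = \mathbb{R}$, takes $D_f$ to be the one-dimensional Kullback–Leibler divergence $D_f(x,y) = x\ln(x/y) - x + y$ on $\left]0,+\infty\right[$, and uses $\theta = |\cdot - \tfrac{1}{2}|$; the grid, the evaluation point, and all function names are illustrative assumptions.

```python
import numpy as np

def kl(x, y):
    """1-D Kullback-Leibler Bregman distance for positive x and y."""
    return x * np.log(x / y) - x + y

def left_envelope(theta, y, gamma, grid):
    """Approximate inf over x of theta(x) + D_f(x, y) / gamma on `grid`."""
    return np.min(theta(grid) + kl(grid, y) / gamma)

def right_envelope(theta, x, gamma, grid):
    """Approximate inf over y of theta(y) + D_f(x, y) / gamma on `grid`."""
    return np.min(theta(grid) + kl(x, grid) / gamma)

theta = lambda t: np.abs(t - 0.5)
grid = np.linspace(1e-4, 5.0, 500001)   # discretization of a part of ]0, +inf[
point, gamma = 2.0, 1.0

left = left_envelope(theta, point, gamma, grid)
right = right_envelope(theta, point, gamma, grid)
print(f"left envelope:  {left:.6f}")
print(f"right envelope: {right:.6f}")
print(f"theta(point):   {theta(point):.6f}")
```

For this choice, setting the derivative of each objective to zero gives the minimizers $2/e$ (left) and $1$ (right), hence the values $1.5 - 2/e$ and $2\ln 2 - 0.5$; both lie below $\theta(2) = 1.5$, and they are not equal.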
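It follows from the definition (see also Example 2.3 below)
that if $f = \frac{1}{2}\|\cdot\|^2$, then $D_f \colon (x,y) \mapsto \frac{1}{2}\|x-y\|^2$, and
$\overleftarrow{\operatorname{env}}_{\theta}^{\gamma} = \overrightarrow{\operatorname{env}}_{\theta}^{\gamma} = \theta \,\square\, \big(\frac{1}{2\gamma}\|\cdot\|^2\big)$
is the classical Moreau envelope of $\theta$ with parameter $\gamma$;
see [38], [39], and also [9, Section 12.4] and [43, Section 1.G].
When $\gamma = 1$,
we simply write $\overleftarrow{\operatorname{env}}_{\theta}$ for $\overleftarrow{\operatorname{env}}_{\theta}^{1}$,
and $\overrightarrow{\operatorname{env}}_{\theta}$ for $\overrightarrow{\operatorname{env}}_{\theta}^{1}$, which were introduced in [10].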
Bregman–Moreau envelopes with $\gamma \neq 1$ were
previously explored
in [28] and [33] for the left variant; the authors
provided asymptotic results when $\gamma \downarrow 0$.
The goal of this paper is to present a systematic study of
regularization aspects of the Bregman–Moreau envelope.
Our results extend and complement several classical results
and provide a novel
way to approximate θ.
We also obtain new results on the asymptotic behaviour
when γ↑+∞
and on the right Bregman–Moreau envelope.
This opens the door to regularization and smoothing
of functions by employing the right Bregman–Moreau envelope. We
also provide visualizations and examples.
The remainder of this paper is organized as follows.
In Section 2, we collect various useful properties and
characterizations of Bregman–Moreau envelopes.
In particular,
the minimizers of the envelopes are
also minimizers of the original function
(see Theorem 2.20).
Section 3 is devoted to the asymptotic behaviour of
the Bregman–Moreau envelopes when
γ↓0 (Theorem 3.3) and when
γ↑+∞ (Theorem 3.5).
Finally, Section 4 provides examples and comments on
future work.
2 Basic properties
In this section, we collect various useful properties of the
Bregman–Moreau envelopes.
We start by describing the effect of scaling the function.
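Proposition 2.1.
Let $\theta \colon X \to \left]-\infty,+\infty\right]$,
let $\gamma \in \mathbb{R}_{++}$, and let $\mu \in \mathbb{R}_{++}$.
Then $\overleftarrow{\operatorname{env}}_{\gamma\theta}^{\mu} = \gamma\, \overleftarrow{\operatorname{env}}_{\theta}^{\gamma\mu}$
and $\overrightarrow{\operatorname{env}}_{\gamma\theta}^{\mu} = \gamma\, \overrightarrow{\operatorname{env}}_{\theta}^{\gamma\mu}$.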
Proof.
This is analogous to the proof of [9, Proposition 12.22(i)].
∎
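The scaling rule can also be read off directly from the definition of the left envelope. The short derivation below is a sketch that only uses $\gamma > 0$ to pull the factor out of the infimum; the right variant follows in the same way with the roles of the two arguments of $D_f$ exchanged.

```latex
\begin{aligned}
\overleftarrow{\operatorname{env}}_{\gamma\theta}^{\mu}(y)
  &= \inf_{x \in X} \Big( \gamma\theta(x) + \tfrac{1}{\mu} D_f(x,y) \Big) \\
  &= \gamma \inf_{x \in X} \Big( \theta(x) + \tfrac{1}{\gamma\mu} D_f(x,y) \Big) \\
  &= \gamma\, \overleftarrow{\operatorname{env}}_{\theta}^{\gamma\mu}(y).
\end{aligned}
```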
We now turn to regularization properties.
(For a variant of Proposition 2.2(i),
see [33, Theorem 2.2 and Proposition 2.1(i)].)
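Proposition 2.2.
Let $\theta \colon X \to \left]-\infty,+\infty\right]$ be such that $U \cap \operatorname{dom}\theta \neq \varnothing$ and let $\gamma \in \mathbb{R}_{++}$.
Then the following hold:
(i)
$\operatorname{dom} \overleftarrow{\operatorname{env}}_{\theta}^{\gamma} = U$,
and $(\forall y \in U)(\forall \mu \in \left]\gamma,+\infty\right[)$ $\inf\theta(X) \le \overleftarrow{\operatorname{env}}_{\theta}^{\mu}(y) \le \overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y) \le \theta(y)$.
Consequently, $\inf\theta(X) \le \inf \overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(X) \le \inf\theta(U)$,
and $\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y) \downarrow \inf\theta(X)$ as $\gamma \uparrow +\infty$.
(ii)
$\operatorname{dom} \overrightarrow{\operatorname{env}}_{\theta}^{\gamma} = \operatorname{dom} f$,
and $(\forall x \in U)(\forall \mu \in \left]\gamma,+\infty\right[)$ $\inf\theta(X) \le \overrightarrow{\operatorname{env}}_{\theta}^{\mu}(x) \le \overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x) \le \theta(x)$.
Consequently, $\inf\theta(X) \le \inf \overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(X) \le \inf\theta(U)$,
and $\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x) \downarrow \inf\theta(X)$ as $\gamma \uparrow +\infty$.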
Proof.
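(i): We first show that $\operatorname{dom} \overleftarrow{\operatorname{env}}_{\theta}^{\gamma} = U$.
Let $y \in \operatorname{dom} \overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$.
Then $\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y) = \inf_{x \in X} \big( \theta(x) + \frac{1}{\gamma} D_f(x,y) \big) < +\infty$,
and hence there exists $x \in X$ such that $\theta(x) + \frac{1}{\gamma} D_f(x,y) < +\infty$.
Since $\theta(x) > -\infty$, we get $D_f(x,y) < +\infty$, and this yields $y \in U$.
From now on, let $y \in U$, and pick $u \in \operatorname{dom}\theta \cap U$.
Then $-f(y) < +\infty$, $\|\nabla f(y)\| < +\infty$, $f(u) < +\infty$, $\theta(u) < +\infty$, and
$$\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y) \le \theta(u) + \frac{1}{\gamma} D_f(u,y) = \theta(u) + \frac{1}{\gamma}\big( f(u) - f(y) - \langle \nabla f(y), u-y \rangle \big) < +\infty,$$
which gives $y \in \operatorname{dom} \overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$. Hence, $\operatorname{dom} \overleftarrow{\operatorname{env}}_{\theta}^{\gamma} = U$.
Next, let $\mu \in \left]\gamma,+\infty\right[$.
Then $\frac{1}{\mu} < \frac{1}{\gamma}$,
$\theta \le \theta + \frac{1}{\mu} D_f(\cdot,y) \le \theta + \frac{1}{\gamma} D_f(\cdot,y)$, and so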
[TABLE]
Therefore,
[TABLE]
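Taking now the infimum over $y \in U$ yields
$\inf\theta(X) \le \inf \overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(X) \le \inf\theta(U)$.
Consequently,
[TABLE]
On the other hand, $(\forall x \in X)$ $\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y) \le \theta(x) + \frac{1}{\gamma} D_f(x,y)$,
which implies that $(\forall x \in X)$ $\lim_{\gamma \to +\infty} \overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y) \le \theta(x)$ and thus
$\lim_{\gamma \to +\infty} \overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y) \le \inf\theta(X)$.
Altogether,
$\lim_{\gamma \to +\infty} \overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y) = \inf\theta(X)$ and the conclusion follows from
(9).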
Denote by Γ0(X) the set of all proper lower semicontinuous convex functions from X to ]−∞,+∞].
From now on, we strengthen our assumptions by requiring that
[TABLE]
This will allow us to obtain a quite satisfying theory in which
the envelopes are convex functions.
Note that f is essentially smooth and essentially strictly convex in the sense of [42, Section 26].
It is well known that
[TABLE]
We will also work with the following standard assumptions:
A1
$\nabla^2 f$ exists and is continuous on $U$;
A2
$D_f$ is jointly convex, i.e., convex on $X \times X$;
A3
$(\forall x \in U)$ $D_f(x,\cdot)$ is strictly convex on $U$;
A4
$(\forall x \in U)$ $D_f(x,\cdot)$ is coercive, i.e.,
$D_f(x,y) \to +\infty$ as $\|y\| \to +\infty$.
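Example 2.3.
Assumptions (11) and
A1–A4 hold in the following cases,
where $x = (\xi_j)_{1 \le j \le J}$ and $y = (\eta_j)_{1 \le j \le J}$ are
two generic points in $X = \mathbb{R}^J$.
(i)
Energy:
If $f \colon x \mapsto \frac{1}{2}\|x\|^2$, then $U = X$ and
$$D_f(x,y) = \frac{1}{2}\|x-y\|^2.$$
(ii)
Boltzmann–Shannon entropy (footnote 3: when dealing with the Boltzmann–Shannon entropy and the Fermi–Dirac
entropy, it is understood that
$0 \cdot \ln(0) := 0$.
For two vectors $x$ and $y$ in $X$,
expressions
such as $x \le y$, $x \cdot y$, and $x/y$
are interpreted coordinate-wise):
If $f \colon x \mapsto \sum_{j=1}^{J} \xi_j \ln(\xi_j) - \xi_j$,
then $U = \{x \in X \mid x > 0\}$
and one obtains the Kullback–Leibler divergence
$$D_f(x,y) = \begin{cases} \sum_{j=1}^{J} \xi_j \ln(\xi_j/\eta_j) - \xi_j + \eta_j, & \text{if } x \ge 0 \text{ and } y > 0; \\ +\infty, & \text{otherwise.} \end{cases}$$
(iii)
Fermi–Dirac entropy:
If $f \colon x \mapsto \sum_{j=1}^{J} \xi_j \ln(\xi_j) + (1-\xi_j)\ln(1-\xi_j)$, then $U = \{x \in X \mid 0 < x < 1\}$ and
$$D_f(x,y) = \begin{cases} \sum_{j=1}^{J} \xi_j \ln(\xi_j/\eta_j) + (1-\xi_j)\ln\big((1-\xi_j)/(1-\eta_j)\big), & \text{if } 0 \le x \le 1 \text{ and } 0 < y < 1; \\ +\infty, & \text{otherwise.} \end{cases}$$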
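The three Bregman distances of this example can be generated mechanically from the defining formula $D_f(x,y) = f(x) - f(y) - \langle \nabla f(y), x-y \rangle$. The sketch below assumes that both points lie in $U$ and compares the generic formula with the standard closed forms (squared Euclidean distance, Kullback–Leibler divergence, and the Fermi–Dirac analogue); the helper names and the two test points are illustrative choices.

```python
import numpy as np

def bregman(f, grad_f, x, y):
    """Generic Bregman distance D_f(x, y) for y in U = int dom f."""
    return f(x) - f(y) - np.dot(grad_f(y), x - y)

# Each entry: (f, gradient of f, closed form of D_f on U x U).
cases = {
    "energy": (
        lambda x: 0.5 * np.dot(x, x),
        lambda x: x,
        lambda x, y: 0.5 * np.sum((x - y) ** 2),
    ),
    "Boltzmann-Shannon": (
        lambda x: np.sum(x * np.log(x) - x),
        lambda x: np.log(x),
        lambda x, y: np.sum(x * np.log(x / y) - x + y),
    ),
    "Fermi-Dirac": (
        lambda x: np.sum(x * np.log(x) + (1 - x) * np.log(1 - x)),
        lambda x: np.log(x / (1 - x)),
        lambda x, y: np.sum(x * np.log(x / y)
                            + (1 - x) * np.log((1 - x) / (1 - y))),
    ),
}

# Two points in ]0, 1[^2, so that they belong to U in all three cases.
x = np.array([0.2, 0.7])
y = np.array([0.5, 0.4])

results = {}
for name, (f, grad_f, closed_form) in cases.items():
    results[name] = (bregman(f, grad_f, x, y), closed_form(x, y))
    print(f"{name:18s} generic={results[name][0]:.6f}  closed={results[name][1]:.6f}")

# D_f is in general not symmetric (the energy is the exception).
f_bs, grad_bs, _ = cases["Boltzmann-Shannon"]
print("KL(x,y) vs KL(y,x):", bregman(f_bs, grad_bs, x, y), bregman(f_bs, grad_bs, y, x))
```

The last line illustrates the asymmetry that motivates treating left and right envelopes separately.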
The following result relates the Bregman–Moreau envelopes to
Fenchel conjugates.
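Proposition 2.4.
Let $\theta \colon X \to \left]-\infty,+\infty\right]$
be such that $U \cap \operatorname{dom}\theta \neq \varnothing$
and let $\gamma \in \mathbb{R}_{++}$.
Then the following hold (footnote 4: indeed, the proof does not require any of
A1–A4):
(i)
$\gamma\, \overleftarrow{\operatorname{env}}_{\theta}^{\gamma} \circ \nabla f^* = f^* - (\gamma\theta + f)^*$.
(ii)
$\gamma\, \overrightarrow{\operatorname{env}}_{\theta}^{\gamma} = f - \big(f^* + (\gamma\theta \circ \nabla f^*)\big)^*$.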
Proof.
(i): This follows from [33, Theorem 2.4] and (12).
Note also that the case in which $\gamma = 1$ is related to [30, Theorem 1(i)] applied to $(f^*,\theta^*)$ instead of $(f,\theta)$.
(ii):
Let $x \in X$.
Using the fact that $f^*(\nabla f(y)) = \langle \nabla f(y), y \rangle - f(y)$ (see, e.g., [42, Theorem 23.5])
and that $(\nabla f)^{-1} = \nabla f^*$ (see (12)), we obtain
[TABLE]
This completes the proof.
∎
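Item (i) can also be seen by a direct computation. The following is a sketch rather than the argument of [33]: it assumes $u \in U^* := \operatorname{int}\operatorname{dom} f^*$ and sets $y := \nabla f^*(u) \in U$, so that $\nabla f(y) = u$ and $f^*(u) = \langle u, y \rangle - f(y)$.

```latex
\begin{aligned}
\gamma\, \overleftarrow{\operatorname{env}}_{\theta}^{\gamma}\big(\nabla f^*(u)\big)
  &= \inf_{x \in X} \Big( \gamma\theta(x) + f(x) - f(y) - \langle u, x - y \rangle \Big) \\
  &= \big( \langle u, y \rangle - f(y) \big)
     - \sup_{x \in X} \Big( \langle u, x \rangle - (\gamma\theta + f)(x) \Big) \\
  &= f^*(u) - (\gamma\theta + f)^*(u).
\end{aligned}
```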
In what follows, we shall require the following two facts.
Fact 2.5.
The following hold:
(i)
$(\forall x \in X)(\forall y \in U)$ $D_f(x,y) = 0 \Leftrightarrow x = y$.
(ii)
$(\forall y \in U)$ $D_f(\cdot,y)$ is coercive,
i.e., $D_f(x,y) \to +\infty$ as $\|x\| \to +\infty$.
Proof.
(i): See [6, Theorem 3.7.(iv)].
(ii): See [6, Theorem 3.7.(iii)].
∎
Fact 2.6.
Let $\theta \in \Gamma_0(X)$ be such that $\operatorname{dom}\theta \cap U \neq \varnothing$ and let $\gamma \in \mathbb{R}_{++}$.
Consider the following properties:
(a)
$U \cap \operatorname{dom}\theta$ is bounded.
(b)
$\inf\theta(U) > -\infty$.
(c)
$f$ is supercoercive, i.e.,
$f(x)/\|x\| \to +\infty$ as $\|x\| \to +\infty$.
(d)
$(\forall x \in U)$ $D_f(x,\cdot)$ is supercoercive.
Then the following hold:
(i)
If any of the conditions (a), (b), or (c) holds, then
[TABLE]
or, equivalently,
[TABLE]
(ii)
If any of the conditions (a), (b), or (d) holds, then
[TABLE]
Proof.
Since $\frac{1}{\gamma} D_f = D_{\frac{1}{\gamma} f}$,
the result follows from [10, Lemma 2.12]
applied to $\frac{1}{\gamma} f$.
∎
The definition of proximal mappings relies on the following
result.
(For variants of Proposition 2.7(i),
see [33, Theorems 2.2 and 4.3].)
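Proposition 2.7.
Let $\theta \colon X \to \left]-\infty,+\infty\right]$ be convex
and such that $U \cap \operatorname{dom}\theta \neq \varnothing$, and let $\gamma \in \mathbb{R}_{++}$.
Then the following hold:
(i)
$\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$ is convex and continuous on $U$, and
(a)
if (18) holds, i.e., $(\forall y \in U)$ $\theta(\cdot) + \frac{1}{\gamma} D_f(\cdot,y)$ is coercive,
then $\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$ is proper;
(b)
if $\theta \in \Gamma_0(X)$ and $\theta(\cdot) + \frac{1}{\gamma} D_f(\cdot,y)$ is coercive for a given $y \in U$,
then there exists a unique point $z \in U$ such that $\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y) = \theta(z) + \frac{1}{\gamma} D_f(z,y)$.
(ii)
$\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}$ is convex and continuous on $U$, and
(a)
if (20) holds, i.e., $(\forall x \in U)$ $\theta(\cdot) + \frac{1}{\gamma} D_f(x,\cdot)$ is coercive,
then $\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}$ is proper;
(b)
if $\theta \in \Gamma_0(X)$ and $\theta(\cdot) + \frac{1}{\gamma} D_f(x,\cdot)$ is coercive for a given $x \in U$,
then there exists a unique point $z \in U$ such that $\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x) = \theta(z) + \frac{1}{\gamma} D_f(x,z)$.
Proof.
Since $\frac{1}{\gamma} D_f = D_{\frac{1}{\gamma} f}$,
the result follows from
[10, Propositions 3.4 and 3.5]
applied to $\frac{1}{\gamma} f$.
∎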
In view of Proposition 2.7, we define the following operators on U;
see also [10, Definition 3.7].
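Definition 2.8 (Bregman proximity operators).
Let $\theta \in \Gamma_0(X)$ be such that
$U \cap \operatorname{dom}\theta \neq \varnothing$.
If (18) holds for $\gamma = 1$, then
the left proximity operator associated with $\theta$ is
$$\overleftarrow{P}_{\theta} \colon U \to U \colon y \mapsto \operatorname*{argmin}_{x \in X} \big( \theta(x) + D_f(x,y) \big).$$
If (20) holds for $\gamma = 1$, then
the right proximity operator associated with $\theta$ is
$$\overrightarrow{P}_{\theta} \colon U \to U \colon x \mapsto \operatorname*{argmin}_{y \in X} \big( \theta(y) + D_f(x,y) \big).$$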
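Remark 2.9.
Suppose that $f = \frac{1}{2}\|\cdot\|^2$ and let $\theta \in \Gamma_0(X)$.
Then $U = \operatorname{int}\operatorname{dom} f = X$ and hence
$U \cap \operatorname{dom}\theta = \operatorname{dom}\theta \neq \varnothing$.
Since $f(x)/\|x\| = \frac{1}{2}\|x\| \to +\infty$ as $\|x\| \to +\infty$, Fact 2.6 implies that
(18) and (20) hold for all $\gamma \in \mathbb{R}_{++}$.
In this case, $D_f \colon (x,y) \mapsto \frac{1}{2}\|x-y\|^2$
and $\overleftarrow{P}_{\theta} = \overrightarrow{P}_{\theta} = \operatorname{Prox}_{\theta}$ is the classical Moreau proximity operator of $\theta$ [38].
Given a closed convex subset $C$ of $X$ with $C \cap U \neq \varnothing$, we have that $\iota_C \in \Gamma_0(X)$, $\operatorname{dom}\iota_C = C$,
and hence $U \cap \operatorname{dom}\iota_C = U \cap C \neq \varnothing$ and also $\inf\iota_C(U) = 0 > -\infty$,
which together with Fact 2.6 imply that (18) and (20) hold for all $\gamma \in \mathbb{R}_{++}$.
This leads to the following definition.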
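Definition 2.10 (Bregman projectors).
Let $C$ be a closed convex subset of $X$ such that
$U \cap C \neq \varnothing$.
Then $\overleftarrow{P}_C := \overleftarrow{P}_{\iota_C}$ is the left Bregman projector onto $C$
and $\overrightarrow{P}_C := \overrightarrow{P}_{\iota_C}$ is the right Bregman projector onto $C$.
Remark 2.11.
In view of Remark 2.9, if $f = \frac{1}{2}\|\cdot\|^2$,
then $\overleftarrow{P}_C = \overrightarrow{P}_C = P_C$ is the orthogonal projector onto $C$.
Note that $\overleftarrow{P}_C$, $\overrightarrow{P}_C$, and $P_C$ are not, in general, the same when $f \neq \frac{1}{2}\|\cdot\|^2$.
Before we give a corresponding example, let us
show that these projectors are the same when $X = \mathbb{R}$.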
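Proposition 2.12.
Suppose that $X = \mathbb{R}$ and let $C$ be a closed convex subset of $\mathbb{R}$
such that $U \cap C \neq \varnothing$.
Then $\overleftarrow{P}_C = \overrightarrow{P}_C = P_C$ on $U$.
Proof.
Let $y \in U$.
Because $X = \mathbb{R}$, $(\forall z \in C)(\exists \lambda_z \in [0,1])$ $P_C y = \lambda_z z + (1-\lambda_z) y$.
Since $D_f(\cdot,y)$ is convex, nonnegative, and $D_f(y,y) = 0$,
it follows that
[TABLE]
This combined with Definition 2.10 yields
$\overleftarrow{P}_C(y) = P_C(y)$.
The proof that $\overrightarrow{P}_C = P_C$ is similar.
∎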
Example 2.13.
Here we illustrate how Bregman projectors may differ from the orthogonal
projector. We adapt [6, Example 6.15], which illustrates
the setting in which $f$ is an entropy function on $\mathbb{R}^J$ and $C$ is
the “probabilistic hyperplane” $\{x \in \mathbb{R}^J \mid \sum_j \xi_j = 1\}$.
For simplicity, we work in $X = \mathbb{R}^2$.
Suppose that $f_1$ is the energy from
Example 2.3(i) while
$f_2$ is the negative Boltzmann–Shannon entropy
from Example 2.3(ii).
Since we work in $\mathbb{R}^2$, the probabilistic hyperplane
is described by $\xi_2 = 1 - \xi_1$.
We compute $\overleftarrow{P}_C(1,2)$ by substituting
$\eta_1 = 1$, $\eta_2 = 2$, $\xi_2 = 1 - \xi_1$ and minimizing the resulting
Bregman distance over $\xi_1$. We obtain
(i)
$\overleftarrow{P}_C(1,2) = (0,1)$ for
$D_{f_1}$,
(ii)
$\overleftarrow{P}_C(1,2) = (1/3,2/3)$ for
$D_{f_2}$.
We illustrate this in Figure 1. For
$i \in \{1,2\}$,
we sketch the contour plot of $D_{f_i}(\cdot,(1,2))$ for
the level given by $D_{f_i}(\overleftarrow{P}_C(1,2),(1,2))$ together with the set $C$.
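The two projections of the point $(1,2)$ can be reproduced numerically by parametrizing the hyperplane as $(t, 1-t)$ and minimizing the Bregman distance in its first argument over $t$. The sketch below is a brute-force grid search; the function names and grids are illustrative assumptions, and the Kullback–Leibler search is restricted to $0 < t < 1$ so that the logarithms are defined.

```python
import numpy as np

def energy_dist(x, y):
    """D_f for the energy; x may be an array of candidate points (rows)."""
    return 0.5 * np.sum((x - y) ** 2, axis=-1)

def kl_dist(x, y):
    """D_f for the Boltzmann-Shannon entropy (Kullback-Leibler divergence)."""
    return np.sum(x * np.log(x / y) - x + y, axis=-1)

def left_projection_on_line(dist, y, t):
    """Minimize dist((t, 1 - t), y) over the grid t and return the minimizer."""
    candidates = np.stack([t, 1.0 - t], axis=1)   # points of C = {xi_1 + xi_2 = 1}
    return candidates[np.argmin(dist(candidates, y))]

y = np.array([1.0, 2.0])
p_energy = left_projection_on_line(energy_dist, y, np.linspace(-2.0, 2.0, 400001))
p_kl = left_projection_on_line(kl_dist, y, np.linspace(1e-6, 1.0 - 1e-6, 400001))

print("energy projection:", p_energy)   # close to (0, 1)
print("KL projection:    ", p_kl)       # close to (1/3, 2/3)
```

For the Kullback–Leibler case the minimizer is simply the normalization $y / (\eta_1 + \eta_2)$, which differs from the orthogonal projection.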
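Remark 2.14.
Let $\theta \in \Gamma_0(X)$ be such that
$U \cap \operatorname{dom}\theta \neq \varnothing$ and let $\gamma \in \mathbb{R}_{++}$.
Proposition 2.1 implies that
$\overleftarrow{\operatorname{env}}_{\gamma\theta} = \gamma\, \overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$
and $\overrightarrow{\operatorname{env}}_{\gamma\theta} = \gamma\, \overrightarrow{\operatorname{env}}_{\theta}^{\gamma}$.
We thus derive from the definition that if (18) holds, then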
The next result provides information on the proximal mapping when
the parameter is varied.
For a variant of the last inequality in (26),
see [33, Proposition 2.1(ii)].
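Proposition 2.15.
Let $\theta \in \Gamma_0(X)$ be such that
$U \cap \operatorname{dom}\theta \neq \varnothing$ and let $\gamma \in \mathbb{R}_{++}$.
(i)
If (18) holds, then
$(\forall y \in U)(\forall \mu \in \left]\gamma,+\infty\right[)$
[TABLE]
(ii)
If (20) holds, then $(\forall x \in U)(\forall \mu \in \left]\gamma,+\infty\right[)$
[TABLE]
Proof.
This follows from Remark 2.14 and [36, Proposition 7.6.1].
∎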
The left and right proximal mappings can be characterized in
various ways:
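Proposition 2.16.
Let $\theta \in \Gamma_0(X)$ be such that $\operatorname{dom}\theta \cap U \neq \varnothing$ and let $\gamma \in \mathbb{R}_{++}$.
(i)
Suppose that (18) holds.
Then for every $(x,y) \in U \times U$,
the following conditions are equivalent:
(a)
$x = \overleftarrow{P}_{\gamma\theta}(y)$,
(b)
$0 \in \gamma\partial\theta(x) + \nabla f(x) - \nabla f(y)$,
(c)
$(\forall z \in X)$ $\langle \nabla f(y) - \nabla f(x), z - x \rangle + \gamma\theta(x) \le \gamma\theta(z)$.
Moreover,
[TABLE]
is continuous on $U$.
(ii)
Suppose that (20) holds.
Then for every $(x,y) \in U \times U$,
the following conditions are equivalent: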
In the case of item (i) and when $U^* = X$, we note that,
by (12),
[TABLE]
see also [33, Theorem 4.2] for a more general result.
(ii)
In the case of item (ii), let us prove
the variant of [33, Theorem 4.1]
stating that
[TABLE]
Indeed, let $x_1$ and $x_2$ be in $U$,
and set $y_i = \overrightarrow{P}_{\gamma\theta}(x_i)$ for $i \in \{1,2\}$.
Then
$\theta(y_1) + \frac{1}{\gamma} D_f(x_1,y_1) \le \theta(y_2) + \frac{1}{\gamma} D_f(x_1,y_2)$ and
$\theta(y_2) + \frac{1}{\gamma} D_f(x_2,y_2) \le \theta(y_1) + \frac{1}{\gamma} D_f(x_2,y_1)$.
Adding and simplifying yields
[TABLE]
A direct expansion (or the four-point identity from
[11, Remark 2.5]) shows that
(31) is the same as
[TABLE]
therefore, (30) follows.
We do not know whether or not in general the operator in (30)
is the gradient of a convex function.
Corollary 2.18.
Let $C$ be a closed convex subset of $X$ such that
$U \cap C \neq \varnothing$,
let $(x,y) \in U \times U$,
and let $p \in U \cap C$.
Then the following hold:
The results in this section, almost all of which are new,
extend or complement results for the classical energy case and
for left variants studied in [28] and [33].
We will require the following lemma.
Lemma 3.1.
Let $C$ be a compact subset of a Hausdorff space $X$,
let $\phi \colon X \to [-\infty,+\infty]$ be lower
semicontinuous,
let $(x_a)_{a \in A}$ be a net in $C$,
and suppose that $\phi(x_a) \to \inf\phi(X)$.
Then $\operatorname{argmin}\phi \neq \varnothing$
and all cluster points of $(x_a)_{a \in A}$ lie in $\operatorname{argmin}\phi$.
Consequently, if $\phi$ attains its minimum at a unique point $u$, then $x_a \to u$.
Proof.
This follows from the lower semicontinuity of $\phi$ and [9, Lemma 1.14].
∎
What is the behaviour of Bregman–Moreau envelopes and proximity operators when γ↓0?
The next two results provide answers.
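Proposition 3.2.
Let $\theta \in \Gamma_0(X)$ be such that
$U \cap \operatorname{dom}\theta \neq \varnothing$ and let $x, y \in U$.
Then the following hold:
(i)
If (18) holds for some $\mu \in \mathbb{R}_{++}$ instead of $\gamma$,
then $\overleftarrow{P}_{\gamma\theta}(y) \to y$ as $\gamma \downarrow 0$.
(ii)
If (20) holds for some $\mu \in \mathbb{R}_{++}$ instead of $\gamma$,
then $\overrightarrow{P}_{\gamma\theta}(x) \to x$ as $\gamma \downarrow 0$.
Proof.
(i): Noting that $(\forall \gamma \in \left]0,\mu\right])$ $\theta + \frac{1}{\gamma} D_f(\cdot,y) \ge \theta + \frac{1}{\mu} D_f(\cdot,y)$,
we have that (18) holds for all $\gamma \in \left]0,\mu\right]$.
In particular, $g := \theta + \frac{1}{\mu} D_f(\cdot,y)$ is coercive.
By Proposition 2.2(i) and (24a),
[TABLE]
and so $\overleftarrow{P}_{\gamma\theta}(y) \in \operatorname{lev}_{\le \theta(y)}\, g$.
The coercivity of $g$ and [9, Proposition 11.12] imply that
$\nu := \sup_{\gamma \in \left]0,\mu\right]} \|\overleftarrow{P}_{\gamma\theta}(y)\| < +\infty$.
Now by [9, Theorem 9.20],
there exist $u \in X$ and $\eta \in \mathbb{R}$ such that
$\theta \ge \langle \cdot, u \rangle + \eta$.
Using (40) and Cauchy–Schwarz yields
[TABLE]
which gives
[TABLE]
and thus $D_f(\overleftarrow{P}_{\gamma\theta}(y),y) \to 0$ as $\gamma \downarrow 0$.
Observing that $D_f(\cdot,y) = f(\cdot) - f(y) - \langle \nabla f(y), \cdot - y \rangle$ is lower semicontinuous,
that $\operatorname{argmin} D_f(\cdot,y) = \{y\}$ by Fact 2.5(i),
and that $\sup_{\gamma \in \left]0,\mu\right]} \|\overleftarrow{P}_{\gamma\theta}(y)\| < +\infty$,
it follows from Lemma 3.1 that $\overleftarrow{P}_{\gamma\theta}(y) \to y$ as $\gamma \downarrow 0$.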
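For a variant of the result from Theorem 3.3(i) that
$\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y) \uparrow \theta(y)$ as $\gamma \downarrow 0$, see [33, Theorem 2.5].
Note that $D_f(\overleftarrow{P}_{\gamma\theta}(y),y)$ is monotone with respect to $\gamma$,
as shown in Proposition 2.15(i),
but the same is not necessarily true for $\frac{1}{\gamma} D_f(\overleftarrow{P}_{\gamma\theta}(y),y)$ (see Figure 2).
The two following results describe the
behaviour when $\gamma \uparrow +\infty$.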
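Proposition 3.4.
Let $\theta \in \Gamma_0(X)$ be such that $U \cap \operatorname{dom}\theta \neq \varnothing$ and let $x, y \in U$.
Then the following hold:
(i)
If (18) holds for all $\gamma \in \mathbb{R}_{++}$,
then $\theta(\overleftarrow{P}_{\gamma\theta}(y)) \to \inf\theta(X)$ as $\gamma \uparrow +\infty$.
(ii)
If (20) holds for all $\gamma \in \mathbb{R}_{++}$, then
$\theta(\overrightarrow{P}_{\gamma\theta}(x)) \to \inf\theta(X)$ as $\gamma \uparrow +\infty$.
Proof.
We shall just prove (i)
because the proof of (ii) is similar.
Assume that (18) holds for all
$\gamma \in \mathbb{R}_{++}$.
Combining (24b) with Proposition 2.2(i) yields
[TABLE]
which implies that $\theta(\overleftarrow{P}_{\gamma\theta}(y)) \to \inf\theta(X)$ as $\gamma \uparrow +\infty$.
∎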
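Theorem 3.5.
Let $\theta \in \Gamma_0(X)$ be coercive and such that
$U \cap \operatorname{dom}\theta \neq \varnothing$, and let $x, y \in U$.
Then the following hold:
(i)
The net $(\overleftarrow{P}_{\gamma\theta}(y))_{\gamma \in \mathbb{R}_{++}}$ is bounded
with all cluster points as $\gamma \uparrow +\infty$ lying in $\operatorname{argmin}\theta$.
Moreover,
(a)
if $\operatorname{argmin}\theta$ is a singleton,
then $\overleftarrow{P}_{\gamma\theta}(y) \to \operatorname{argmin}\theta$ as $\gamma \uparrow +\infty$;
(b)
if $\operatorname{argmin}\theta \subseteq U$,
then $\overleftarrow{P}_{\gamma\theta}(y) \to \overleftarrow{P}_{\operatorname{argmin}\theta}\, y$ as $\gamma \uparrow +\infty$.
(ii)
The net $(\overrightarrow{P}_{\gamma\theta}(x))_{\gamma \in \mathbb{R}_{++}}$ is bounded
with all cluster points as $\gamma \uparrow +\infty$ lying in $\operatorname{argmin}\theta$.
Moreover,
(a)
if $\operatorname{argmin}\theta$ is a singleton,
then $\overrightarrow{P}_{\gamma\theta}(x) \to \operatorname{argmin}\theta$ as $\gamma \uparrow +\infty$;
(b)
if $\operatorname{argmin}\theta \subseteq U$,
then $\overrightarrow{P}_{\gamma\theta}(x) \to \overrightarrow{P}_{\operatorname{argmin}\theta}\, x$ as $\gamma \uparrow +\infty$.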
Proof.
First, by assumption, [9, Proposition 11.15(i)] gives $\operatorname{argmin}\theta \neq \varnothing$.
This combined with [9, Lemma 1.24 and Corollary 8.5] implies that
$\operatorname{argmin}\theta = \operatorname{lev}_{\le \inf\theta(X)}\, \theta$ is a nonempty closed convex subset of $X$.
Now since θ is coercive and since Df≥0, we immediately get that
(18) and (20) hold for all γ∈R++.
and then
from the coercivity of θ and [9, Proposition 11.12] that (Pγθ(y))γ∈R++ is bounded.
In turn, Proposition 3.4(i) and Lemma 3.1 imply that
all cluster points of (Pγθ(y))γ∈R++ as γ↑+∞ lie in argminθ,
and we get (i)(a).
Now assume that argminθ⊆U.
Let y′ be a cluster point of (Pγθ(y))γ∈R++ as γ↑+∞.
Then y′∈argminθ⊆U and there exists a sequence (γn)n∈N in R++ such that
γn↑+∞ and Pγnθ(y)→y′ as n→+∞.
Let z∈argminθ.
We have (∀n∈N)θ(z)≤θ(Pγnθ(y)),
and by Proposition 2.16(i),
[TABLE]
Taking the limit as n→+∞ and using the continuity of ∇f yield
[TABLE]
Since z∈argminθ was chosen arbitrarily
and since argminθ is a closed convex subset of X with
U∩argminθ=∅,
in view of Corollary 2.18(i), y′=Pargminθy,
and Pargminθy is thus the only cluster point of (Pγθ(y))γ∈R++
as γ↑+∞.
Hence, (i)(b) holds.
In this final section, we illustrate our theory by considering the
case in which θ is the nonsmooth function
x↦∣x−21∣.
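Example 4.1.
Suppose that $X = \mathbb{R}$ and let $\theta \colon \mathbb{R} \to \mathbb{R} \colon x \mapsto |x - \frac{1}{2}|$.
Then $\theta \in \Gamma_0(X)$, $\operatorname{dom}\theta = X$, and $\theta$ is coercive with $\operatorname{argmin}\theta = \{\frac{1}{2}\}$.
It follows that $U \cap \operatorname{dom}\theta = U \neq \varnothing$ and,
by Fact 2.6,
the assumptions (18) and (20) hold for all $\gamma \in \mathbb{R}_{++}$.
We revisit Example 2.3 (with $J = 1$)
to illustrate Theorem 2.20, Proposition 3.2, and Theorem 3.5.
Let $\gamma \in \mathbb{R}_{++}$.
We recall from Proposition 2.16 that
[TABLE]
and that
[TABLE]
Note that
[TABLE]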
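(i)
Energy:
Suppose that $f$ is the energy.
Then $U = \operatorname{int}\operatorname{dom} f = \mathbb{R}$.
Since $\nabla f = \operatorname{Id}$,
by Remark 2.9 and (50),
$\overleftarrow{P}_{\gamma\theta} = \overrightarrow{P}_{\gamma\theta} = (\operatorname{Id} + \gamma\partial\theta)^{-1}$.
We have that
[TABLE]
Then $(\nabla f + \gamma\partial\theta)^{-1}(y)$ amounts to solving $(\nabla f + \gamma\partial\theta)(x) = y$ piecewise.
For example, solving $x - \gamma = y$ for $x < \frac{1}{2}$ yields $x = y + \gamma$ for $y + \gamma < \frac{1}{2}$,
so $(\nabla f + \gamma\partial\theta)^{-1}(y) = y + \gamma$ for $y < \frac{1}{2} - \gamma$.
Continuing in this fashion,
[TABLE]
and by (24a),
[TABLE]
It is clear that $\overleftarrow{P}_{\gamma\theta}(\frac{1}{2}) = \frac{1}{2}$,
while $(\forall y \in \mathbb{R} \setminus \{\frac{1}{2}\})$ $\overleftarrow{P}_{\gamma\theta}(y) \neq y$,
and so $\operatorname{Fix} \overleftarrow{P}_{\gamma\theta} = \{\frac{1}{2}\} = \operatorname{argmin}\theta$.
As expected, $\overleftarrow{P}_{\gamma\theta}(y) \to y$ as $\gamma \downarrow 0$,
and $\overleftarrow{P}_{\gamma\theta}(y) \to \frac{1}{2} = \operatorname{argmin}\theta$ as $\gamma \uparrow +\infty$.
Moreover, $\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y) \to \theta(y)$ as
$\gamma \downarrow 0$;
this is illustrated in Figure 3.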
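In the energy case the piecewise solution procedure described above leads to the familiar soft-thresholding operator centred at $\frac{1}{2}$, and the envelope is then obtained by evaluating $\theta(p) + \frac{1}{2\gamma}(p - y)^2$ at the proximal point $p$. The sketch below assumes exactly this classical form (it is the standard proximity operator of $\gamma|\cdot - \frac{1}{2}|$, written here without access to the displayed formulas); the function names are illustrative.

```python
import numpy as np

def prox_energy(y, gamma):
    """Classical prox of gamma * |. - 1/2|: soft-thresholding around 1/2."""
    return 0.5 + np.sign(y - 0.5) * np.maximum(np.abs(y - 0.5) - gamma, 0.0)

def envelope_energy(y, gamma):
    """Moreau envelope, evaluated through the proximal point."""
    p = prox_energy(y, gamma)
    return np.abs(p - 0.5) + (p - y) ** 2 / (2.0 * gamma)

y = 2.0
for gamma in (1e-8, 0.5, 100.0):
    print(f"gamma={gamma:8.2e}  prox={prox_energy(y, gamma):.6f}  "
          f"env={envelope_energy(y, gamma):.6f}")
```

For small $\gamma$ the proximal point stays near $y$ and the envelope near $\theta(y) = 1.5$; for large $\gamma$ the proximal point is the minimizer $\frac{1}{2}$ and the envelope decreases towards $\inf\theta(X) = 0$.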
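(ii)
Boltzmann–Shannon entropy:
Suppose that $f$ is the Boltzmann–Shannon entropy.
Then $\operatorname{dom} f = \mathbb{R}_{+}$, $U = \operatorname{int}\operatorname{dom} f = \mathbb{R}_{++}$,
$\nabla f(x) = \ln x$, and $\nabla^2 f(x) = 1/x$.
Again employing (50)
and (24a), we have
[TABLE]
Clearly $\operatorname{Fix} \overleftarrow{P}_{\gamma\theta} = \{\frac{1}{2}\} = \operatorname{argmin}\theta$.
It can also be seen that $\overleftarrow{P}_{\gamma\theta}(y) \to y$ as $\gamma \downarrow 0$,
and $\overleftarrow{P}_{\gamma\theta}(y) \to \frac{1}{2} = \operatorname{argmin}\theta$ as $\gamma \uparrow +\infty$.
Moreover, once again $\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y) \to \theta(y)$ as
$\gamma \downarrow 0$.
This example is illustrated in Figure 4.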
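A candidate closed form for the left proximity operator in the Boltzmann–Shannon case can be derived from the optimality condition $0 \in \gamma\partial\theta(x) + \ln x - \ln y$ of Proposition 2.16(i): on the branch $x < \frac{1}{2}$ this gives $x = y e^{\gamma}$, on the branch $x > \frac{1}{2}$ it gives $x = y e^{-\gamma}$, and otherwise $x = \frac{1}{2}$. The sketch below implements this derivation (which is an assumption of this illustration, since the displayed formula is not reproduced here) and compares it with a brute-force minimization of $\theta + \frac{1}{\gamma} D_f(\cdot, y)$; names and grid are illustrative.

```python
import numpy as np

def left_prox_bs(y, gamma):
    """Candidate closed form from 0 in gamma*dtheta(x) + ln(x) - ln(y)."""
    if y * np.exp(gamma) < 0.5:      # solution on the branch x < 1/2
        return y * np.exp(gamma)
    if y * np.exp(-gamma) > 0.5:     # solution on the branch x > 1/2
        return y * np.exp(-gamma)
    return 0.5                       # otherwise the kink of theta is optimal

def left_prox_bs_numeric(y, gamma, grid):
    """Grid minimizer of |x - 1/2| + KL(x, y) / gamma over x in `grid`."""
    objective = np.abs(grid - 0.5) + (grid * np.log(grid / y) - grid + y) / gamma
    return grid[np.argmin(objective)]

grid = np.linspace(1e-4, 5.0, 500001)
gamma = 1.0
for y in (0.1, 0.4, 3.0):
    print(f"y={y:.1f}  closed form={left_prox_bs(y, gamma):.5f}  "
          f"grid search={left_prox_bs_numeric(y, gamma, grid):.5f}")
```

The two columns agree up to the grid resolution, and the only fixed point of the map is $\frac{1}{2}$, as stated above.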
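Now (51) implies that for every $(x,y) \in \mathbb{R}_{++} \times \mathbb{R}_{++}$,
[TABLE]
Solving the induced system of equations yields
[TABLE]
Using (25a) and noting that $\frac{x}{1-\gamma} < \frac{1}{2}$ if $x < \frac{1-\gamma}{2}$
and $\frac{x}{1+\gamma} > \frac{1}{2}$ if $x > \frac{1+\gamma}{2}$, we obtain
[TABLE]
The right envelope is shown in Figure 5.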
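(iii)
Fermi–Dirac entropy:
Suppose that $f$ is the Fermi–Dirac entropy.
Then $\operatorname{dom} f = [0,1]$, $U = \operatorname{int}\operatorname{dom} f = \left]0,1\right[$,
$\nabla f(x) = \ln\big(\frac{x}{1-x}\big)$, and $\nabla^2 f(x) = \frac{1}{x(1-x)}$.
Again by (50),
[TABLE]
A formula for $\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$ may be once again obtained
by using (24a):
[TABLE]
We illustrate this envelope in Figure 6.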
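The asymptotic statements of Section 3 can be observed numerically in the Fermi–Dirac case. Since $\nabla f(x) = \ln\big(\frac{x}{1-x}\big)$ is the logit function, the optimality condition $0 \in \gamma\partial\theta(x) + \nabla f(x) - \nabla f(y)$ shifts the logit of $y$ by $\pm\gamma$ until the kink at $\frac{1}{2}$ is reached. The sketch below is based on this derivation, which is an assumption of the illustration rather than a transcription of the displayed formulas; all names are illustrative.

```python
import numpy as np

def logit(t):
    return np.log(t / (1.0 - t))

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def left_prox_fd(y, gamma):
    """Candidate left prox of gamma*|. - 1/2| for the Fermi-Dirac entropy."""
    up = sigmoid(logit(y) + gamma)     # candidate on the branch x < 1/2
    down = sigmoid(logit(y) - gamma)   # candidate on the branch x > 1/2
    if up < 0.5:
        return up
    if down > 0.5:
        return down
    return 0.5

def fd_dist(x, y):
    """1-D Bregman distance of the Fermi-Dirac entropy on ]0,1[ x ]0,1[."""
    return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

def left_envelope_fd(y, gamma):
    """Left envelope evaluated at the proximal point."""
    p = left_prox_fd(y, gamma)
    return abs(p - 0.5) + fd_dist(p, y) / gamma

y = 0.9
for gamma in (1e-6, 1.0, 50.0):
    print(f"gamma={gamma:8.1e}  prox={left_prox_fd(y, gamma):.6f}  "
          f"env={left_envelope_fd(y, gamma):.6f}")
```

As $\gamma \downarrow 0$ the proximal point tends to $y$ and the envelope to $\theta(y) = 0.4$; as $\gamma \uparrow +\infty$ the proximal point reaches $\operatorname{argmin}\theta = \{\frac{1}{2}\}$ and the envelope decreases towards $\inf\theta(X) = 0$.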
Next we have from (51) that for every $(x,y) \in \left]0,1\right[ \times \left]0,1\right[$,
[TABLE]
Solving the induced system of equations gives
[TABLE]
and, in turn, (25a) gives
[TABLE]
The right envelope is shown in Figure 7.
We conclude this section with some remarks concerning the
minimization of the (nonlinear) functional
[TABLE]
where $\tau$ is convex, lower semicontinuous, and proper, and
subject to finitely many constraints
[TABLE]
and where ak∈L∞[0,1] and ρ was used to generate the
(consistent) data: ⟨ak,ρ⟩=bk.
Under appropriate assumptions (for details, see [13], [14], [15, Section 7], [16, Theorem 6.3.4], [17], [18, Section 4.7]),
recovering the (primal) solution x=xτ amounts to first obtaining a
dual solution by solving the finite system of nonlinear equations
[TABLE]
followed by computing
[TABLE]
As a numerical illustration, we assume that
$\tau = \overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$ is the left Bregman–Moreau envelope from
Example 4.1(iii) and $\rho$ is the step function
pictured in Figure 8 (right). We clearly observe the influence of the parameter γ; smaller values of γ lead to primal solutions that are closer to the step function that was used to generate the data. Were a different, smoother ρ to be used, a larger value of γ might be more appropriate. The reason for the varying extent to which the primal solutions for different choices of γ resemble step functions may be gleaned from Figure 8 (left), where (τ∗)′—upon which the primal solution (68) depends—is shown.
Similarly attempting to compute with τ as the right envelope from Example 4.1(iii), we are unable to symbolically invert the gradient. We leave the numerical attempt at inversion as future work.
Acknowledgments
HHB was partially supported by the Natural Sciences and
Engineering Research Council of Canada.
MND was partially supported by the Australian Research Council Discovery Project DP160101537.
Bibliography
[1]
[2] F. Alvarez, R. Correa, and M. Marechal, Regular self-proximal distances are Bregman, Journal of Convex Analysis 24 (2017), 135–148.
[3] H. Attouch, Convergence de fonctions convexes, des sous-différentiels et semi-groupes associés, Comptes Rendus de l’Académie des Sciences de Paris 284 (1977), 539–542.
[4] H. Attouch, Variational Convergence for Functions and Operators, Pitman, 1984.
[5] H.H. Bauschke, J. Bolte, and M. Teboulle, A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications, Mathematics of Operations Research 42 (2016), 330–348.
[6] H.H. Bauschke and J.M. Borwein, Legendre functions and the method of random Bregman projections, Journal of Convex Analysis 4 (1997), 27–67.
[7] H.H. Bauschke, J.M. Borwein, and P.L. Combettes, Bregman monotone optimization algorithms, SIAM Journal on Control and Optimization 42 (2003), 596–636.
[8] H.H. Bauschke and P.L. Combettes, Iterating Bregman retractions, SIAM Journal on Optimization 13 (2003), 1159–1173.