Regularizing with Bregman-Moreau envelopes

Heinz H. Bauschke; Minh N. Dao; Scott B. Lindstrom

arXiv:1705.06019·math.FA·April 14, 2020·SIAM J. Optim.

TL;DR

This paper analyzes Bregman-Moreau envelopes, extending previous work by exploring both left and right variants and providing new asymptotic results, with applications in convex and nonconvex optimization.

Contribution

It offers a comprehensive analysis of both left and right Bregman-Moreau envelopes, including new asymptotic properties, expanding the theoretical understanding of these regularization tools.

Findings

01

Extended analysis of Bregman-Moreau envelopes for convex and nonconvex functions.

02

Derived new asymptotic properties of the envelopes.

03

Provided multiple illustrative examples.

Abstract

Moreau's seminal paper, introducing what is now called the Moreau envelope and the proximity operator (also known as the proximal mapping), appeared in 1965. The Moreau envelope of a given convex function provides a regularized version which has additional desirable properties such as differentiability and full domain. Forty years ago, Attouch proposed using the Moreau envelope for regularization. Since then, this branch of convex analysis has developed in many fruitful directions. In 1967, Bregman introduced what is nowadays known as the Bregman distance as a measure of discrepancy between two points generalizing the square of the Euclidean distance. Proximity operators based on the Bregman distance have become a topic of significant research as they are useful in the algorithmic solution of optimization problems. More recently, in 2012, Kan and Song studied regularization aspects of…

Equations (147)

X:=\mathbb{R}^{J}

\operatorname{env}_{\theta}^{\gamma}\colon x\mapsto\inf_{y\in X}\big(\theta(y)+\frac{1}{2\gamma}\|x-y\|^{2}\big).

f\colon X\to\left]-\infty,+\infty\right]

D_{f}\colon X\times X\to\left[0,+\infty\right]\colon(x,y)\mapsto\begin{cases}f(x)-f(y)-\langle\nabla f(y),x-y\rangle,&\text{if $y\in U$;}\\ +\infty,&\text{otherwise.}\end{cases}

\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}\colon X\to\left[-\infty,+\infty\right]\colon y\mapsto\inf_{x\in X}\big(\theta(x)+\frac{1}{\gamma}D_{f}(x,y)\big)

\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}\colon X\to\left[-\infty,+\infty\right]\colon x\mapsto\inf_{y\in X}\big(\theta(y)+\frac{1}{\gamma}D_{f}(x,y)\big),

\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\leq\theta(u)+\frac{1}{\gamma}\big(f(u)-f(y)-\langle\nabla f(y),u-y\rangle\big)<+\infty,

\inf_{x\in X}\theta(x)\leq\overleftarrow{\operatorname{env}}_{\theta}^{\mu}(y)\leq\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)=\inf_{x\in X}\big(\theta(x)+\frac{1}{\gamma}D_{f}(x,y)\big)\leq\theta(y)+\frac{1}{\gamma}D_{f}(y,y)=\theta(y).

\inf\theta(X)\leq\overleftarrow{\operatorname{env}}_{\theta}^{\mu}(y)\leq\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\leq\theta(y).

\inf\theta(X)\leq\varliminf_{\gamma\to+\infty}\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y).

f\in\Gamma_{0}(X)

\nabla f\colon U\to U^{*}:=\operatorname{int}\operatorname{dom}f^{*}\text{ is a homeomorphism with }\big(\nabla f\big)^{-1}=\nabla f^{*}.

D_{f}(x,y)=\frac{1}{2}\|x-y\|^{2}.

D_{f}(x,y)=\begin{cases}\sum_{j=1}^{J}\big(\xi_{j}\ln(\xi_{j}/\eta_{j})-\xi_{j}+\eta_{j}\big),&\text{if $x\geq 0$ and $y>0$;}\\ +\infty,&\text{otherwise.}\end{cases}

D_{f}(x,y)=\begin{cases}\sum_{j=1}^{J}\big(\xi_{j}\ln(\xi_{j}/\eta_{j})+(1-\xi_{j})\ln\big((1-\xi_{j})/(1-\eta_{j})\big)\big),&\text{if $0\leq x\leq 1$ and $0<y<1$;}\\ +\infty,&\text{otherwise.}\end{cases}

\begin{aligned}\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x)&=\frac{f(x)}{\gamma}+\frac{1}{\gamma}\inf_{y\in U}\big(\gamma\theta(y)+f^{*}(\nabla f(y))-\langle\nabla f(y),x\rangle\big)\\ &=\frac{f(x)}{\gamma}+\frac{1}{\gamma}\inf_{y^{*}\in U^{*}}\big(\gamma\theta(\nabla f^{*}(y^{*}))+f^{*}(y^{*})-\langle y^{*},x\rangle\big)\\ &=\frac{f(x)}{\gamma}-\frac{1}{\gamma}\sup_{y^{*}\in X}\big(\langle x,y^{*}\rangle-\big((\gamma\theta\circ\nabla f^{*})+f^{*}\big)(y^{*})\big)\\ &=\frac{f(x)}{\gamma}-\frac{1}{\gamma}\big((\gamma\theta\circ\nabla f^{*})+f^{*}\big)^{*}(x).\end{aligned}

(\forall y\in U)\quad\theta(\cdot)+\frac{1}{\gamma}D_{f}(\cdot,y)\text{ is coercive}

\frac{1}{\gamma}\operatorname{ran}\nabla f\subseteq\operatorname{int}\operatorname{dom}\big(\tfrac{1}{\gamma}f+\theta\big)^{*}.

(\forall x\in U)\quad\theta(\cdot)+\frac{1}{\gamma}D_{f}(x,\cdot)\text{ is coercive.}

\overleftarrow{\operatorname{P}}_{\theta}\colon U\to U\colon y\mapsto\underset{x\in X}{\operatorname{argmin}}\;\big(\theta(x)+D_{f}(x,y)\big).

\overrightarrow{\operatorname{P}}_{\theta}\colon U\to U\colon x\mapsto\underset{y\in X}{\operatorname{argmin}}\;\big(\theta(y)+D_{f}(x,y)\big).

(\forall z\in C)\quad D_{f}(\operatorname{P}_{C}y,y)\leq\lambda_{z}D_{f}(z,y)+(1-\lambda_{z})D_{f}(y,y)=\lambda_{z}D_{f}(z,y)\leq D_{f}(z,y).

\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)=\theta\big(\overleftarrow{\operatorname{P}}_{\gamma\theta}(y)\big)+\frac{1}{\gamma}D_{f}\big(\overleftarrow{\operatorname{P}}_{\gamma\theta}(y),y\big)

\theta\big(\overleftarrow{\operatorname{P}}_{\gamma\theta}(y)\big)\leq\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\leq\theta(y).

\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x)=\theta\big(\overrightarrow{\operatorname{P}}_{\gamma\theta}(x)\big)+\frac{1}{\gamma}D_{f}\big(x,\overrightarrow{\operatorname{P}}_{\gamma\theta}(x)\big)

\theta\big(\overrightarrow{\operatorname{P}}_{\gamma\theta}(x)\big)\leq\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x)\leq\theta(x).

\theta\big(\overleftarrow{\operatorname{P}}_{\mu\theta}(y)\big)\leq\theta\big(\overleftarrow{\operatorname{P}}_{\gamma\theta}(y)\big)\quad\text{and}\quad D_{f}\big(\overleftarrow{\operatorname{P}}_{\mu\theta}(y),y\big)\geq D_{f}\big(\overleftarrow{\operatorname{P}}_{\gamma\theta}(y),y\big).

\theta\big(\overrightarrow{\operatorname{P}}_{\mu\theta}(x)\big)\leq\theta\big(\overrightarrow{\operatorname{P}}_{\gamma\theta}(x)\big)\quad\text{and}\quad D_{f}\big(x,\overrightarrow{\operatorname{P}}_{\mu\theta}(x)\big)\geq D_{f}\big(x,\overrightarrow{\operatorname{P}}_{\gamma\theta}(x)\big).

\overleftarrow{\operatorname{P}}_{\gamma\theta}=(\nabla f+\gamma\partial\theta)^{-1}\circ\nabla f

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Regularizing with Bregman–Moreau envelopes

Heinz H. Bauschke, Minh N. Dao and Scott B. Lindstrom

Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada. E-mail: [email protected]. CARMA, University of Newcastle, Callaghan, NSW 2308, Australia. E-mail: [email protected]. CARMA, University of Newcastle, Callaghan, NSW 2308, Australia. E-mail: [email protected].

(November 16, 2018)

Abstract

Moreau’s seminal paper, introducing what is now called the Moreau envelope and the proximity operator (also known as the proximal mapping), appeared in 1965. The Moreau envelope of a given convex function provides a regularized version which has additional desirable properties such as differentiability and full domain. Forty years ago, Attouch proposed using the Moreau envelope for regularization. Since then, this branch of convex analysis has developed in many fruitful directions. In 1967, Bregman introduced what is nowadays known as the Bregman distance as a measure of discrepancy between two points generalizing the square of the Euclidean distance. Proximity operators based on the Bregman distance have become a topic of significant research as they are useful in the algorithmic solution of optimization problems. More recently, in 2012, Kan and Song studied regularization aspects of the left Bregman–Moreau envelope even for nonconvex functions. In this paper, we complement previous works by analyzing the left and right Bregman–Moreau envelopes and by providing additional asymptotic results. Several examples are provided.

2010 Mathematics Subject Classification: Primary 90C25; Secondary 26A51, 26B25, 47H05, 47H09.

Keywords: Bregman distance, convex function, Moreau envelope, proximal mapping, proximity operator, regularization.

1 Introduction

We assume throughout that

$$X:=\mathbb{R}^{J}, \tag{1}$$

which we equip with the standard inner product $\left\langle{\cdot},{\cdot}\right\rangle$ and the induced Euclidean norm $\|\cdot\|$ .

Let $\theta\colon X\to\left]-\infty,+\infty\right]$ be convex, lower semicontinuous, and proper. (See [42], [9], [37], and [43] for background material in convex analysis, from which we adopt our notation, which is standard. We also set $\mathbb{R}_{++}:=\{x\in\mathbb{R}\mid x>0\}$.) The Moreau envelope with parameter $\gamma\in\mathbb{R}_{++}$ is the function

$$\operatorname{env}_{\theta}^{\gamma}\colon x\mapsto\inf_{y\in X}\Big(\theta(y)+\frac{1}{2\gamma}\|x-y\|^{2}\Big). \tag{2}$$

Moreau only considered the case in which $\gamma=1$ ; the systematic study involving the parameter $\gamma$ originated with Attouch (see [3] and [4]). If $\theta=\iota_{C}$ , the indicator function of a nonempty closed convex subset $C$ of $X$ , then the corresponding Moreau envelope with parameter $\gamma$ is $\tfrac{1}{2\gamma}d_{C}^{2}$ , where $d_{C}$ is the distance function of the set $C$ . While the indicator function has (effective) domain $C$ and is differentiable only on $\operatorname{int}C$ , the interior of $C$ , the Moreau envelope is much better behaved: for instance, it has full domain and is differentiable everywhere.
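The indicator-function case can be checked numerically. The following is a minimal sketch, assuming $X=\mathbb{R}$ and $C=[a,b]$; the function names and the grid resolution are illustrative choices, not part of the paper. It compares a brute-force evaluation of the infimum in (2) with the closed form $\tfrac{1}{2\gamma}d_{C}^{2}$.

```python
def moreau_env_indicator(x, a, b, gamma, n=20001):
    """Brute-force Moreau envelope of the indicator of [a, b] at x.

    The indicator is 0 on [a, b] and +inf elsewhere, so the infimum in the
    definition only needs to range over y in [a, b].
    """
    best = float("inf")
    for i in range(n):
        y = a + (b - a) * i / (n - 1)  # uniform grid on [a, b]
        best = min(best, (x - y) ** 2 / (2.0 * gamma))
    return best


def half_squared_distance(x, a, b, gamma):
    """Closed form d_C(x)^2 / (2 * gamma) for C = [a, b]."""
    d = max(a - x, 0.0, x - b)  # distance from x to the interval
    return d * d / (2.0 * gamma)
```

For instance, with $C=[0,1]$ and $\gamma=0.5$, both functions should agree up to the grid resolution at points inside and outside $C$; in particular the envelope is finite everywhere, whereas $\iota_{C}$ is not.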

Now assume that

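$$f\colon X\to\left]-\infty,+\infty\right]\ \text{is convex and differentiable on}\ U:=\operatorname{int}\operatorname{dom}f\neq\varnothing. \tag{3}$$

The Bregman distance associated with $f$ (note that $D_{f}$ is not a distance in the sense of metric topology; however, this naming convention is now ubiquitous), first explored by Bregman in [19] (see also [26]), is

$$D_{f}\colon X\times X\to\left[0,+\infty\right]\colon(x,y)\mapsto\begin{cases}f(x)-f(y)-\langle\nabla f(y),x-y\rangle,&\text{if $y\in U$;}\\ +\infty,&\text{otherwise.}\end{cases} \tag{4}$$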

It serves as a measure of discrepancy between two points and thus gives rise to associated projectors (nearest-point mappings) and proximal mappings which have been employed to solve convex feasibility and optimization problems algorithmically; see, e.g., [2], [5], [7], [8], [10], [11], [12], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [31], [33], [34], [35], [40], [41], and [44]. The classical case arises when $f=\tfrac{1}{2}\|\cdot\|^{2}$ in which case $D_{f}(x,y)=\tfrac{1}{2}\|x-y\|^{2}=D_{f}(y,x)$ . This clearly suggests replacing the quadratic term in (2) by the Bregman distance. However, because different assignments of $f$ may allow for cases in which $D_{f}(x,y)\neq D_{f}(y,x)$ , we actually are led to consider two envelopes: the left and right Bregman–Moreau envelopes are defined by

$$\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}\colon X\to\left[-\infty,+\infty\right]\colon y\mapsto\inf_{x\in X}\Big(\theta(x)+\frac{1}{\gamma}D_{f}(x,y)\Big) \tag{5}$$

and

$$\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}\colon X\to\left[-\infty,+\infty\right]\colon x\mapsto\inf_{y\in X}\Big(\theta(y)+\frac{1}{\gamma}D_{f}(x,y)\Big), \tag{6}$$

respectively. It follows from the definition (see also Example 2.3 below) that if $f=\frac{1}{2}\|\cdot\|^{2}$, then $D_{f}\colon(x,y)\mapsto\frac{1}{2}\|x-y\|^{2}$, and $\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}=\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}=\theta\mathbin{\square}(\frac{1}{2\gamma}\|\cdot\|^{2})$ is the classical Moreau envelope of $\theta$ of parameter $\gamma$; see [38], [39], and also [9, Section 12.4] and [43, Section 1.G]. When $\gamma=1$, we simply write $\overleftarrow{\operatorname{env}}_{\theta}$ for $\overleftarrow{\operatorname{env}}_{\theta}^{1}$, and $\overrightarrow{\operatorname{env}}_{\theta}$ for $\overrightarrow{\operatorname{env}}_{\theta}^{1}$, which were introduced in [10]. Bregman–Moreau envelopes when $\gamma\neq 1$ were previously explored in [28] and [33] for the left variant; the authors provided asymptotic results when $\gamma\downarrow 0$.
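To see that the left and right envelopes genuinely differ once $D_{f}$ is asymmetric, here is a minimal brute-force sketch. It assumes $J=1$, the Boltzmann–Shannon entropy $f(x)=x\ln x-x$ (so $D_{f}$ is the Kullback–Leibler divergence), and the test function $\theta=|\cdot-1|$; all function names, the grid, and the choice of $\theta$ are illustrative assumptions rather than anything prescribed by the paper.

```python
import math


def kl(x, y):
    """Bregman distance of the Boltzmann-Shannon entropy on the real line.

    Finite for x >= 0 and y > 0, with the convention 0 * ln(0) = 0.
    """
    if x < 0.0 or y <= 0.0:
        return float("inf")
    xlog = 0.0 if x == 0.0 else x * math.log(x / y)
    return xlog - x + y


def theta(x):
    """Illustrative nonsmooth convex function with minimizer 1."""
    return abs(x - 1.0)


def grid(n=5000, upper=5.0):
    """Uniform grid on ]0, upper]; it contains the kink of theta at 1 exactly."""
    return [upper * i / n for i in range(1, n + 1)]


def left_env(y, gamma):
    """inf over x of theta(x) + D_f(x, y) / gamma: the FIRST argument varies."""
    return min(theta(x) + kl(x, y) / gamma for x in grid())


def right_env(x, gamma):
    """inf over y of theta(y) + D_f(x, y) / gamma: the SECOND argument varies."""
    return min(theta(y) + kl(x, y) / gamma for y in grid())
```

At the point $2$ with $\gamma=1$, a short hand calculation suggests that both infima are attained at the kink $1$, giving $1-\ln 2$ for the left envelope and $2\ln 2-1$ for the right one, so the two regularizations of the same $\theta$ disagree there.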

The goal of this paper is to present a systematic study of regularization aspects of the Bregman–Moreau envelope. Our results extend and complement several classical results and provide a novel way to approximate $\theta$ . We also obtain new results on the asymptotic behaviour when $\gamma\uparrow+\infty$ and on the right Bregman–Moreau envelope. This opens the door to regularization and smoothing of functions by employing the right Bregman–Moreau envelope. We also provide visualizations and examples.

The remainder of this paper is organized as follows. In Section 2, we collect various useful properties and characterizations of Bregman–Moreau envelopes. In particular, the minimizers of the envelopes are also minimizers of the original function (see Theorem 2.20). Section 3 is devoted to the asymptotic behaviour of the Bregman–Moreau envelopes when $\gamma\downarrow 0$ (Theorem 3.3) and when $\gamma\uparrow+\infty$ (Theorem 3.5). Finally, Section 4 provides examples and comments on future work.

2 Basic properties

In this section, we collect various useful properties of the Bregman–Moreau envelopes.

We start by describing the effect of scaling the function.

Proposition 2.1.

Let $\theta\colon X\to\left]-\infty,+\infty\right]$ , let $\gamma\in\mathbb{R}_{++}$ , and let $\mu\in\mathbb{R}_{++}$ . Then $\,\overleftarrow{\operatorname{env}}_{\gamma\theta}^{\mu}=\gamma\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma\mu}$ and $\,\overrightarrow{\operatorname{env}}_{\gamma\theta}^{\mu}=\gamma\,\overrightarrow{\operatorname{env}}_{\theta}^{\gamma\mu}$ .

Proof.

This is analogous to the proof of [9, Proposition 12.22(i)]. ∎

We now turn to regularization properties. (For a variant of Proposition 2.2 (i), see [33, Theorem 2.2 and Proposition 2.1(i)].)

Proposition 2.2.

Let $\theta\colon X\to\left]-\infty,+\infty\right]$ be such that $U\cap\operatorname{dom}\theta\neq\varnothing$ and let $\gamma\in\mathbb{R}_{++}$ . Then the following hold:

(i)

$\operatorname{dom}\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}=U$ , and $(\forall y\in U)(\forall\mu\in\left]\gamma,+\infty\right[)$ $\inf\theta(X)\leq\,\overleftarrow{\operatorname{env}}_{\theta}^{\mu}(y)\leq\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\leq\theta(y)$ . Consequently, $\inf\theta(X)\leq\inf\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(X)\leq\inf\theta(U)$ , and $\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\downarrow\inf\theta(X)$ as $\gamma\uparrow+\infty$ .

(ii)

$\operatorname{dom}\,\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}=\operatorname{dom}f$ , and $(\forall x\in U)(\forall\mu\in\left]\gamma,+\infty\right[)$ $\inf\theta(X)\leq\,\overrightarrow{\operatorname{env}}_{\theta}^{\mu}(x)\leq\,\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x)\leq\theta(x)$ . Consequently, $\inf\theta(X)\leq\inf\,\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(X)\leq\inf\theta(U)$ , and $\,\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x)\downarrow\inf\theta(X)$ as $\gamma\uparrow+\infty$ .

Proof.

(i): We first show that $\operatorname{dom}\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}=U$ . Let $y\in\operatorname{dom}\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$ . Then $\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)=\inf_{x\in X}\big{(}\theta(x)+\frac{1}{\gamma}D_{f}(x,y)\big{)}<+\infty$ , and hence there exists $x\in X$ such that $\theta(x)+\frac{1}{\gamma}D_{f}(x,y)<+\infty$ . Since $\theta(x)>-\infty$ , this yields $y\in U$ .

From now on, let $y\in U$ , and pick $u\in\operatorname{dom}\theta\cap U$ . Then $-f(y)<+\infty$ , $\|\nabla f(y)\|<+\infty$ , $f(u)<+\infty$ , $\theta(u)<+\infty$ , and

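$$\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\leq\theta(u)+\frac{1}{\gamma}\big(f(u)-f(y)-\langle\nabla f(y),u-y\rangle\big)<+\infty, \tag{7}$$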

which gives $y\in\operatorname{dom}\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$ . Hence, $\operatorname{dom}\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}=U$ .

Next, let $\mu\in\left]\gamma,+\infty\right[$ . Then $\frac{1}{\mu}<\frac{1}{\gamma}$ , $\theta\leq\theta+\frac{1}{\mu}D_{f}(\cdot,y)\leq\theta+\frac{1}{\gamma}D_{f}(\cdot,y)$ , and so

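$$\inf_{x\in X}\theta(x)\leq\overleftarrow{\operatorname{env}}_{\theta}^{\mu}(y)\leq\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)=\inf_{x\in X}\Big(\theta(x)+\frac{1}{\gamma}D_{f}(x,y)\Big)\leq\theta(y)+\frac{1}{\gamma}D_{f}(y,y)=\theta(y). \tag{8}$$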

Therefore,

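$$\inf\theta(X)\leq\overleftarrow{\operatorname{env}}_{\theta}^{\mu}(y)\leq\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\leq\theta(y). \tag{9}$$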

Taking now the infimum over $y\in U$ yields $\inf\theta(X)\leq\inf\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(X)\leq\inf\theta(U)$ . Consequently,

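$$\inf\theta(X)\leq\varliminf_{\gamma\to+\infty}\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y). \tag{10}$$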

On the other hand, $(\forall x\in X)$ $\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\leq\theta(x)+\frac{1}{\gamma}D_{f}(x,y)$ , which implies that $(\forall x\in X)$ $\varlimsup_{\gamma\to+\infty}\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\leq\theta(x)$ and thus $\varlimsup_{\gamma\to+\infty}\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\leq\inf\theta(X)$ . Altogether, $\lim_{\gamma\to+\infty}\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)=\inf\theta(X)$ and the conclusion follows from (9).

(ii): This is similar to (i). ∎
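As a hedged illustration of Proposition 2.2 (a worked example of our own, not taken from the paper), take $J=1$, let $f(x)=x\ln x-x$ be the Boltzmann–Shannon entropy, so that $U=\left]0,+\infty\right[$, and let $\theta(x)=x$ for $x\geq 0$ and $\theta(x)=+\infty$ otherwise, so that $\inf\theta(X)=0$. Setting the derivative of each objective to zero gives, for $x>0$ and $y>0$:

```latex
\begin{align*}
\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)
  &= \inf_{x\geq 0}\Big(x+\tfrac{1}{\gamma}\big(x\ln(x/y)-x+y\big)\Big)
   = \frac{1-e^{-\gamma}}{\gamma}\,y
  && \text{(minimizer } x=y\,e^{-\gamma}\text{)},\\
\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x)
  &= \inf_{y>0}\Big(y+\tfrac{1}{\gamma}\big(x\ln(x/y)-x+y\big)\Big)
   = \frac{\ln(1+\gamma)}{\gamma}\,x
  && \text{(minimizer } y=x/(1+\gamma)\text{)}.
\end{align*}
```

Both factors $(1-e^{-\gamma})/\gamma$ and $\ln(1+\gamma)/\gamma$ decrease from $1$ to $0$ as $\gamma$ increases from $0$ to $+\infty$. In this example, each envelope therefore stays between $\inf\theta(X)=0$ and $\theta$, is nonincreasing in the parameter, and tends to $\inf\theta(X)$ as $\gamma\uparrow+\infty$, in line with the proposition; the two envelopes nevertheless differ for every $\gamma$.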

Denote by $\Gamma_{0}(X)$ the set of all proper lower semicontinuous convex functions from $X$ to $\left]-\infty,+\infty\right]$ . From now on, we strengthen our assumptions by requiring that

$$f\in\Gamma_{0}(X)\ \text{is a convex function of Legendre type.} \tag{11}$$

This will allow us to obtain a quite satisfying theory in which the envelopes are convex functions. Note that $f$ is essentially smooth and essentially strictly convex in the sense of [42, Section 26]. It is well known that

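$$\nabla f\colon U\to U^{*}:=\operatorname{int}\operatorname{dom}f^{*}\ \text{is a homeomorphism with}\ \big(\nabla f\big)^{-1}=\nabla f^{*}. \tag{12}$$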

We will also work with the following standard assumptions:

A1

$\nabla^{2}f$ exists and is continuous on $U$ ;

A2

$D_{f}$ is jointly convex, i.e., convex on $X\times X$ ;

A3

$(\forall x\in U)$ $D_{f}(x,\cdot)$ is strictly convex on $U$ ;

A4

$(\forall x\in U)$ $D_{f}(x,\cdot)$ is coercive, i.e., $D_{f}(x,y)\to+\infty$ as $\|y\|\to+\infty$ .

We also henceforth assume that

[TABLE]

Example 2.3 (see [12, Example 2.16]).

Assumptions (11) and A1–A4 hold in the following cases, where $x=(\xi_{j})_{1\leq j\leq J}$ and $y=(\eta_{j})_{1\leq j\leq J}$ are two generic points in $X=\mathbb{R}^{J}$ .

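(i)

Energy: If $f\colon x\mapsto\tfrac{1}{2}\|x\|^{2}$, then $U=X$ and

$$D_{f}(x,y)=\frac{1}{2}\|x-y\|^{2}. \tag{14}$$

(ii)

Boltzmann–Shannon entropy: If $f\colon x\mapsto\displaystyle\sum_{j=1}^{J}\xi_{j}\ln(\xi_{j})-\xi_{j}$, then $U=\{x\in X\mid x>0\}$ and one obtains the Kullback–Leibler divergence

$$D_{f}(x,y)=\begin{cases}\sum_{j=1}^{J}\big(\xi_{j}\ln(\xi_{j}/\eta_{j})-\xi_{j}+\eta_{j}\big),&\text{if $x\geq 0$ and $y>0$;}\\ +\infty,&\text{otherwise.}\end{cases} \tag{15}$$

(iii)

Fermi–Dirac entropy: If $f\colon x\mapsto\displaystyle\sum_{j=1}^{J}\xi_{j}\ln(\xi_{j})+(1-\xi_{j})\ln(1-\xi_{j})$, then $U=\{x\in X\mid 0<x<1\}$ and

$$D_{f}(x,y)=\begin{cases}\sum_{j=1}^{J}\Big(\xi_{j}\ln(\xi_{j}/\eta_{j})+(1-\xi_{j})\ln\big((1-\xi_{j})/(1-\eta_{j})\big)\Big),&\text{if $0\leq x\leq 1$ and $0<y<1$;}\\ +\infty,&\text{otherwise.}\end{cases} \tag{16}$$

(When dealing with the Boltzmann–Shannon entropy and the Fermi–Dirac entropy, it is understood that $0\cdot\ln(0):=0$. For two vectors $x$ and $y$ in $X$, expressions such as $x\leq y$, $x\cdot y$, and $x/y$ are interpreted coordinate-wise.)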
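The three Bregman distances of Example 2.3 can be evaluated directly from the defining formula $f(x)-f(y)-\langle\nabla f(y),x-y\rangle$. The sketch below is illustrative only: the helper names are hypothetical, vectors are plain Python lists, and the code assumes $y\in U$ (it does not return $+\infty$ outside the domain).

```python
import math


def xlogx(t):
    """t * ln(t) with the convention 0 * ln(0) = 0."""
    return 0.0 if t == 0.0 else t * math.log(t)


def bregman(f, grad_f, x, y):
    """Generic D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>, assuming y in U."""
    inner = sum(g * (a - b) for g, a, b in zip(grad_f(y), x, y))
    return f(x) - f(y) - inner


def energy(x):
    return 0.5 * sum(t * t for t in x)


def grad_energy(x):
    return list(x)


def boltzmann_shannon(x):
    return sum(xlogx(t) - t for t in x)


def grad_boltzmann_shannon(x):
    return [math.log(t) for t in x]


def fermi_dirac(x):
    return sum(xlogx(t) + xlogx(1.0 - t) for t in x)


def grad_fermi_dirac(x):
    return [math.log(t / (1.0 - t)) for t in x]


def kl_closed(x, y):
    """Closed form from item (ii): the Kullback-Leibler divergence."""
    return sum((0.0 if a == 0.0 else a * math.log(a / b)) - a + b
               for a, b in zip(x, y))


def fd_closed(x, y):
    """Closed form from item (iii), for 0 < x < 1 and 0 < y < 1."""
    return sum(a * math.log(a / b)
               + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))
               for a, b in zip(x, y))
```

Evaluating `bregman(boltzmann_shannon, grad_boltzmann_shannon, x, y)` and `kl_closed(x, y)` at the same points should give matching values, while swapping `x` and `y` generally changes the result, which is the asymmetry that motivates distinguishing left and right envelopes.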

The following result relates the Bregman–Moreau envelopes to Fenchel conjugates.

Proposition 2.4.

Let $\theta\colon X\to\left]-\infty,+\infty\right]$ be such that $U\cap\operatorname{dom}\theta\neq\varnothing$ and let $\gamma\in\mathbb{R}_{++}$. Then the following hold (indeed, the proof does not require any of A1–A4):

(i)

$\gamma\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}\circ\nabla f^{*}=f^{*}-(\gamma\theta+f)^{*}$ .

(ii)

$\gamma\,\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}=f-(f^{*}+(\gamma\theta\circ\nabla f^{*}))^{*}$ .

Proof.

(i): This follows from [33, Theorem 2.4] and (12). Note also that the case in which $\gamma=1$ is related to [30, Theorem 1(i)] applied to $(f^{*},\theta^{*})$ instead of $(f,\theta)$ .

(ii): Let $x\in X$ . Using the fact that $f^{*}(\nabla f(y))=\left\langle{\nabla f(y)},{y}\right\rangle-f(y)$ (see, e.g., [42, Theorem 23.5]) and that $\big{(}\nabla f\big{)}^{-1}=\nabla f^{*}$ (see (12)), we obtain

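$$\begin{aligned}\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x)&=\frac{f(x)}{\gamma}+\frac{1}{\gamma}\inf_{y\in U}\big(\gamma\theta(y)+f^{*}(\nabla f(y))-\langle\nabla f(y),x\rangle\big)\\ &=\frac{f(x)}{\gamma}+\frac{1}{\gamma}\inf_{y^{*}\in U^{*}}\big(\gamma\theta(\nabla f^{*}(y^{*}))+f^{*}(y^{*})-\langle y^{*},x\rangle\big)\\ &=\frac{f(x)}{\gamma}-\frac{1}{\gamma}\sup_{y^{*}\in X}\big(\langle x,y^{*}\rangle-\big((\gamma\theta\circ\nabla f^{*})+f^{*}\big)(y^{*})\big)\\ &=\frac{f(x)}{\gamma}-\frac{1}{\gamma}\big((\gamma\theta\circ\nabla f^{*})+f^{*}\big)^{*}(x).\end{aligned} \tag{17}$$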

This completes the proof. ∎

In what follows, we shall require the following two facts.

Fact 2.5.

The following hold:

(i)

$(\forall x\in X)(\forall y\in U)\;\;D_{f}(x,y)=0\;\;\Leftrightarrow\;\;x=y$ .

(ii)

$(\forall y\in U)$ $D_{f}(\cdot,y)$ is coercive, i.e., $D_{f}(x,y)\to+\infty$ as $\|x\|\to+\infty$ .

Proof.

(i): See [6, Theorem 3.7.(iv)]. (ii): See [6, Theorem 3.7.(iii)]. ∎

Fact 2.6.

Let $\theta\in\Gamma_{0}(X)$ be such that $\operatorname{dom}\theta\cap U\neq\varnothing$ and let $\gamma\in\mathbb{R}_{++}$ . Consider the following properties:

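(a)

$U\cap\operatorname{dom}\theta$ is bounded.

(b)

$\inf\theta(U)>-\infty$ .

(c)

$f$ is supercoercive, i.e., $f(x)/\|x\|\to+\infty$ as $\|x\|\to+\infty$ .

(d)

$(\forall x\in U)$ $D_{f}(x,\cdot)$ is supercoercive.

Then the following hold:

(i)

If any of the conditions (a), (b), or (c) holds, then

$$(\forall y\in U)\quad\theta(\cdot)+\frac{1}{\gamma}D_{f}(\cdot,y)\ \text{is coercive} \tag{18}$$

or, equivalently,

$$\frac{1}{\gamma}\operatorname{ran}\nabla f\subseteq\operatorname{int}\operatorname{dom}\big(\tfrac{1}{\gamma}f+\theta\big)^{*}. \tag{19}$$

(ii)

If any of the conditions (a), (b), or (d) holds, then

$$(\forall x\in U)\quad\theta(\cdot)+\frac{1}{\gamma}D_{f}(x,\cdot)\ \text{is coercive.} \tag{20}$$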

Proof.

Since $\frac{1}{\gamma}D_{f}=D_{\frac{1}{\gamma}f}$ , the result follows from [10, Lemma 2.12] applied to $\tfrac{1}{\gamma}f$ . ∎

The definition of proximal mappings relies on the following result. (For variants of Proposition 2.7 (i), see [33, Theorems 2.2 and 4.3].)

Proposition 2.7.

Let $\theta\colon X\to\left]-\infty,+\infty\right]$ be convex and such that $U\cap\operatorname{dom}\theta\neq\varnothing$ , and let $\gamma\in\mathbb{R}_{++}$ . Then the following hold:

(i)

$\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$ is convex and continuous on $U$, and

(a)

if (18) holds, i.e., $(\forall y\in U)$ $\theta(\cdot)+\frac{1}{\gamma}D_{f}(\cdot,y)$ is coercive, then $\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$ is proper;

(b)

if $\theta\in\Gamma_{0}(X)$ and $\theta(\cdot)+\frac{1}{\gamma}D_{f}(\cdot,y)$ is coercive for a given $y\in U$, then there exists a unique point $z\in U$ such that $\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)=\theta(z)+\frac{1}{\gamma}D_{f}(z,y)$ .

(ii)

$\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}$ is convex and continuous on $U$, and

(a)

if (20) holds, i.e., $(\forall x\in U)$ $\theta(\cdot)+\frac{1}{\gamma}D_{f}(x,\cdot)$ is coercive, then $\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}$ is proper;

(b)

if $\theta\in\Gamma_{0}(X)$ and $\theta(\cdot)+\frac{1}{\gamma}D_{f}(x,\cdot)$ is coercive for a given $x\in U$, then there exists a unique point $z\in U$ such that $\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x)=\theta(z)+\frac{1}{\gamma}D_{f}(x,z)$ .

Proof.

Since $\frac{1}{\gamma}D_{f}=D_{\frac{1}{\gamma}f}$ , the result follows from [10, Propositions 3.4 and 3.5] applied to $\tfrac{1}{\gamma}f$ . ∎

In view of Proposition 2.7, we define the following operators on $U$ ; see also [10, Definition 3.7].

Definition 2.8 (Bregman proximity operators).

Let $\theta\in\Gamma_{0}(X)$ be such that $U\cap\operatorname{dom}\theta\neq\varnothing$ . If (18) holds for $\gamma=1$ , then the left proximity operator associated with $\theta$ is

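$$\overleftarrow{\operatorname{P}}_{\theta}\colon U\to U\colon y\mapsto\underset{x\in X}{\operatorname{argmin}}\;\big(\theta(x)+D_{f}(x,y)\big). \tag{21}$$

If (20) holds for $\gamma=1$, then the right proximity operator associated with $\theta$ is

$$\overrightarrow{\operatorname{P}}_{\theta}\colon U\to U\colon x\mapsto\underset{y\in X}{\operatorname{argmin}}\;\big(\theta(y)+D_{f}(x,y)\big). \tag{22}$$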

Remark 2.9.

Suppose that $f=\frac{1}{2}\|\cdot\|^{2}$ and let $\theta\in\Gamma_{0}(X)$. Then $U=\operatorname{int}\operatorname{dom}\,f=X$ and hence $U\cap\operatorname{dom}\theta=\operatorname{dom}\theta\neq\varnothing$. Since $f(x)/\|x\|=\frac{1}{2}\|x\|\to+\infty$ as $\|x\|\to+\infty$, Fact 2.6 implies that (18) and (20) hold for all $\gamma\in\mathbb{R}_{++}$. In this case, $D_{f}\colon(x,y)\mapsto\frac{1}{2}\|x-y\|^{2}$ and $\overleftarrow{\operatorname{P}}_{\theta}=\overrightarrow{\operatorname{P}}_{\theta}=\operatorname{Prox}_{\theta}$ is the classical Moreau proximity operator of $\theta$ [38].

Given a closed convex subset $C$ of $X$ with $C\cap U\neq\varnothing$, we have that $\iota_{C}\in\Gamma_{0}(X)$, $\operatorname{dom}\iota_{C}=C$, and hence $U\cap\operatorname{dom}\iota_{C}=U\cap C\neq\varnothing$ and also $\inf\iota_{C}(U)=0>-\infty$, which together with Fact 2.6 imply that (18) and (20) hold for all $\gamma\in\mathbb{R}_{++}$. This leads to the following definition.

Definition 2.10 (Bregman projectors).

Let $C$ be a closed convex subset of $X$ such that $U\cap C\neq\varnothing$ . Then $\overleftarrow{\thinspace\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}:=\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\iota_{C}}$ is the left Bregman projector onto $C$ and $\overrightarrow{\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}:=\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\iota_{C}}$ is the right Bregman projector onto $C$ .

Remark 2.11.

In view of Remark 2.9, if $f=\frac{1}{2}\|\cdot\|^{2}$ , then $\overleftarrow{\thinspace\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}=\overrightarrow{\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}={\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}$ is the orthogonal projector onto $C$ . Note that $\overleftarrow{\thinspace\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}$ , $\overrightarrow{\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}$ , and ${\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}$ are not, in general, the same when $f\neq\frac{1}{2}\|\cdot\|^{2}$ . Before we give a corresponding example, let us show that these projectors are the same when $X=\mathbb{R}$ .

Proposition 2.12.

Suppose that $X=\mathbb{R}$ and let $C$ be a closed convex subset of $\mathbb{R}$ such that $U\cap C\neq\varnothing$ . Then $\overleftarrow{\thinspace\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}=\overrightarrow{\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}={\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}$ on $U$ .

Proof.

Let $y\in U$ . Because $X=\mathbb{R}$ , $(\forall z\in C)$ $(\exists\,\lambda_{z}\in\left[0,1\right])$ ${\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}y=\lambda_{z}z+(1-\lambda_{z})y$ . Since $D_{f}(\cdot,y)$ is convex, nonnegative, and $D_{f}(y,y)=0$ , it follows that

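$$(\forall z\in C)\quad D_{f}(\operatorname{P}_{C}y,y)\leq\lambda_{z}D_{f}(z,y)+(1-\lambda_{z})D_{f}(y,y)=\lambda_{z}D_{f}(z,y)\leq D_{f}(z,y). \tag{23}$$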

This combined with Definition 2.10 yields $\overleftarrow{\thinspace\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}(y)={\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}(y)$ . The proof that $\overrightarrow{\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}={\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}$ is similar. ∎

Example 2.13.

Here we illustrate how Bregman projectors may differ from the orthogonal projector. We adapt [6, Example 6.15], which illustrates the setting in which $f$ is an entropy function on $\mathbb{R}^{J}$ and $C$ is the “probabilistic hyperplane” $\{x\in\mathbb{R}^{J}\mid\sum_{j}\xi_{j}=1\}$. For simplicity, we work in $X=\mathbb{R}^{2}$. Suppose that $f_{1}$ is the energy from Example 2.3(i) while $f_{2}$ is the negative Boltzmann–Shannon entropy from Example 2.3(ii). Since we work in $\mathbb{R}^{2}$, the probabilistic hyperplane is described by $\xi_{2}=1-\xi_{1}$. We compute $\overleftarrow{\operatorname{P}}_{C}(1,2)$ by substituting $\eta_{1}=1$, $\eta_{2}=2$, $\xi_{2}=1-\xi_{1}$ and minimizing the resulting Bregman distance over $\xi_{1}$. We obtain

(i)

$\overleftarrow{\operatorname{P}}_{C}(1,2)=(0,1)$ for $D_{f_{1}}$,

(ii)

$\overleftarrow{\operatorname{P}}_{C}(1,2)=\left(1/3,2/3\right)$ for $D_{f_{2}}$.

We illustrate this in Figure 1. For $i\in\{1,2\}$ , we sketch the contour plot of $D_{f_{i}}(\cdot,(1,2))$ for the level given by $D_{f_{i}}(\overleftarrow{\thinspace\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}(1,2),(1,2))$ together with the set $C$ .
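The two projections in Example 2.13 can be reproduced numerically. The following is a minimal sketch under stated assumptions: the line $\xi_{1}+\xi_{2}=1$ is parametrized by $t=\xi_{1}$, the one-dimensional convex problem is solved by a ternary search, and the search intervals and helper names are illustrative choices rather than the authors' method.

```python
import math


def d_energy(x, y):
    """Bregman distance of the energy: half the squared Euclidean distance."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))


def d_kl(x, y):
    """Kullback-Leibler divergence for x >= 0, y > 0 (0 * ln 0 = 0)."""
    total = 0.0
    for a, b in zip(x, y):
        total += (0.0 if a == 0.0 else a * math.log(a / b)) - a + b
    return total


def argmin_convex(g, lo, hi, iters=200):
    """Ternary search for a minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) <= g(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def left_projection(dist, y, lo, hi):
    """Minimize dist((t, 1 - t), y) over t in [lo, hi], i.e. over points of C."""
    t = argmin_convex(lambda s: dist((s, 1.0 - s), y), lo, hi)
    return (t, 1.0 - t)


y = (1.0, 2.0)
p_energy = left_projection(d_energy, y, -5.0, 5.0)  # orthogonal projection
p_kl = left_projection(d_kl, y, 0.0, 1.0)           # requires 0 <= t <= 1
```

Up to the accuracy of the search, `p_energy` should be close to $(0,1)$ and `p_kl` close to $(1/3,2/3)$, matching items (i) and (ii). The entropic projection rescales $y$ to sum to one, whereas the orthogonal projection shifts both coordinates by the same amount.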

Remark 2.14.

Let $\theta\in\Gamma_{0}(X)$ be such that $U\cap\operatorname{dom}\theta\neq\varnothing$ and let $\gamma\in\mathbb{R}_{++}$ . Proposition 2.1 implies that $\,\overleftarrow{\operatorname{env}}_{\gamma\theta}=\gamma\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$ and $\,\overrightarrow{\operatorname{env}}_{\gamma\theta}=\gamma\,\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}$ . We thus derive from the definition that if (18) holds, then

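$$\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)=\theta\big(\overleftarrow{\operatorname{P}}_{\gamma\theta}(y)\big)+\frac{1}{\gamma}D_{f}\big(\overleftarrow{\operatorname{P}}_{\gamma\theta}(y),y\big)\quad\text{and}\quad\theta\big(\overleftarrow{\operatorname{P}}_{\gamma\theta}(y)\big)\leq\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\leq\theta(y). \tag{24}$$

Similarly, if (20) holds, then

$$\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x)=\theta\big(\overrightarrow{\operatorname{P}}_{\gamma\theta}(x)\big)+\frac{1}{\gamma}D_{f}\big(x,\overrightarrow{\operatorname{P}}_{\gamma\theta}(x)\big)\quad\text{and}\quad\theta\big(\overrightarrow{\operatorname{P}}_{\gamma\theta}(x)\big)\leq\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x)\leq\theta(x). \tag{25}$$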

The next result provides information on the proximal mapping when the parameter is varied. For a variant of the last inequality in (26), see [33, Proposition 2.1(ii)].

Proposition 2.15.

Let $\theta\in\Gamma_{0}(X)$ be such that $U\cap\operatorname{dom}\theta\neq\varnothing$ and let $\gamma\in\mathbb{R}_{++}$ .

(i)

If (18) holds, then $(\forall y\in U)(\forall\mu\in\left]\gamma,+\infty\right[)$

[TABLE]

(ii)

If (20) holds, then $(\forall x\in U)(\forall\mu\in\left]\gamma,+\infty\right[)$

[TABLE]

Proof.

This follows from Remark 2.14 and [36, Proposition 7.6.1]. ∎

The left and right proximal mappings can be characterized in various ways:

Proposition 2.16.

Let $\theta\in\Gamma_{0}(X)$ be such that $\operatorname{dom}\theta\cap U\neq\varnothing$ and let $\gamma\in\mathbb{R}_{++}$ .

(i)

Suppose that (18) holds. Then for every $(x,y)\in U\times U$ , the following conditions are equivalent:

(a)

$x=\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)$,

(b)

$0\in\gamma\partial\theta(x)+\nabla f(x)-\nabla f(y)$,

(c)

$(\forall z\in X)\quad\left\langle{\nabla f(y)-\nabla f(x)},{z-x}\right\rangle+\gamma\theta(x)\leq\gamma\theta(z)$.

Moreover,

[TABLE]

is continuous on $U$.

(ii)

Suppose that (20) holds. Then for every $(x,y)\in U\times U$ , the following conditions are equivalent:

(a)

$y=\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(x)$,

(b)

$0\in\gamma\partial\theta(y)+\nabla^{2}f(y)(y-x)$,

(c)

$(\forall z\in X)\quad\left\langle{\nabla^{2}f(y)(x-y)},{z-y}\right\rangle+\gamma\theta(y)\leq\gamma\theta(z)$.

Moreover, $\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}$ is continuous on $U$ .

Proof.

Apply [10, Proposition 3.10] to $\gamma\theta$ . ∎

Remark 2.17.

Consider Proposition 2.16 and its notation.

(i)

In the case of item (i) and when $U^{*}=X$ , we note that, by (12),

[TABLE]

see also [33, Theorem 4.2] for a more general result.

(ii)

In the case of item (ii), let us prove the variant of [33, Theorem 4.1] stating that

[TABLE]

Indeed, let $x_{1}$ and $x_{2}$ be in $U$ , and set $y_{i}=\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(x_{i})$ for $i\in\{1,2\}$ . Then $\theta(y_{1})+\tfrac{1}{\gamma}D_{f}(x_{1},y_{1})\leq\theta(y_{2})+\tfrac{1}{\gamma}D_{f}(x_{1},y_{2})$ and $\theta(y_{2})+\tfrac{1}{\gamma}D_{f}(x_{2},y_{2})\leq\theta(y_{1})+\tfrac{1}{\gamma}D_{f}(x_{2},y_{1})$ . Adding and simplifying yields

[TABLE]

A direct expansion (or the four-point identity from [11, Remark 2.5]) shows that (31) is the same as

[TABLE]

therefore, (30) follows. We do not know whether or not in general the operator in (30) is the gradient of a convex function.

Corollary 2.18.

Let $C$ be a closed convex subset of $X$ such that $U\cap C\neq\varnothing$ , let $(x,y)\in U\times U$ , and let $p\in U\cap C$ . Then the following hold:

(i)

$p=\overleftarrow{\thinspace\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}y\ \Leftrightarrow\ (\forall z\in C)\;\left\langle{\nabla f(y)-\nabla f(p)},{z-p}\right\rangle\leq 0$.

(ii)

$p=\overrightarrow{\operatorname{P}\thinspace}_{\negthinspace\negthinspace C}x\ \Leftrightarrow\ (\forall z\in C)\;\left\langle{\nabla^{2}f(p)(x-p)},{z-p}\right\rangle\leq 0$ .

Proof.

In light of Definition 2.10, we apply Proposition 2.16 (see also [6, Proposition 3.16]). ∎
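As a small sanity check of Corollary 2.18 (i), the sketch below evaluates the inner product $\left\langle\nabla f(y)-\nabla f(p),z-p\right\rangle$ for the data of Example 2.13, where $C$ is the line $\xi_{1}+\xi_{2}=1$ and $y=(1,2)$. It assumes $\nabla f_{1}=\operatorname{Id}$ and the componentwise logarithm for $\nabla f_{2}$. The sample points and function names are hypothetical. Because $C$ is affine, the inequality is expected to hold with equality at the projection.

```python
import math

def grad_energy(v):
    """Gradient of the energy: the identity."""
    return list(v)

def grad_entropy(v):
    """Gradient of sum(t*ln(t) - t): the componentwise logarithm."""
    return [math.log(t) for t in v]

def vi_value(grad, y, p, z):
    """<grad f(y) - grad f(p), z - p>, the left-hand side in Corollary 2.18(i)."""
    gy, gp = grad(y), grad(p)
    return sum((a - b) * (c - d) for a, b, c, d in zip(gy, gp, z, p))

y = (1.0, 2.0)
# Sample points z of C = {xi_1 + xi_2 = 1}.
samples = [(t, 1.0 - t) for t in (-2.0, 0.0, 0.25, 0.9, 3.0)]

vals_energy = [vi_value(grad_energy, y, (0.0, 1.0), z) for z in samples]
vals_entropy = [vi_value(grad_entropy, y, (1.0 / 3.0, 2.0 / 3.0), z) for z in samples]

# A point of C that is not the projection violates the inequality for some z.
vals_wrong = [vi_value(grad_energy, y, (0.5, 0.5), z) for z in samples]
```

For the wrong candidate $p=(\frac{1}{2},\frac{1}{2})$ the value at $z=(t,1-t)$ equals $\frac{1}{2}-t$, which is positive for $t<\frac{1}{2}$.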

The derivatives of the left and right Bregman–Moreau envelopes feature the corresponding proximal mappings as follows.

Proposition 2.19.

Let $\theta\in\Gamma_{0}(X)$ be such that $U\cap\operatorname{dom}\theta\neq\varnothing$ and let $\gamma\in\mathbb{R}_{++}$ . Then the following hold:

(i)

If (18) holds, then $\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$ is differentiable on $U$ and

[TABLE]

(ii)

If (20) holds, then $\,\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}$ is differentiable on $U$ and

[TABLE]

Proof.

Combine Remark 2.14 with [10, Proposition 3.12]. ∎

The following result, which is a variant of [32, Theorem XV.4.1.7], highlights the connection to convex optimization.

Theorem 2.20.

Let $\theta\in\Gamma_{0}(X)$ be such that $U\cap\operatorname{dom}\theta\neq\varnothing$ , let $\gamma\in\mathbb{R}_{++}$ , and let $x,y\in U$ .

(i)

Suppose that (18) holds. Then the following are equivalent:

(a)

$y\in\operatorname*{argmin}\theta$,

(b)

$y\in\operatorname{Fix}\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}$,

(c)

$y\in\operatorname*{argmin}\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$,

(d)

$\theta(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y))=\theta(y)$,

(e)

$\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)=\theta(y)$.

Consequently,

[TABLE]

(ii)

Suppose that (20) holds. Then the following are equivalent:

(a)

$x\in\operatorname*{argmin}\theta$,

(b)

$x\in\operatorname{Fix}\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}$,

(c)

$x\in\operatorname*{argmin}\,\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}$,

(d)

$\theta(\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(x))=\theta(x)$,

(e)

$\,\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x)=\theta(x)$.

Consequently,

[TABLE]

Proof.

(i): Using Proposition 2.16 (i), we have

[TABLE]

This proves that (i)(a) $\Leftrightarrow$ (i)(b).

Assume that (i)(b) holds, i.e., $y=\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)$ . Then $\nabla\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)=0$ by Proposition 2.19 (i), and thus (i)(c) holds by the convexity of $\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$ shown in Proposition 2.7 (i). Next, (i)(d) is obvious and (i)(e) holds due to (24a).

Now recall from (24) that

[TABLE]

If (i)(c) holds, then since $\inf\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(X)\leq\inf\theta(U)$ (see Proposition 2.2 (i)), combining with (38) yields

[TABLE]

which implies that $D_{f}(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y),y)=0$, so $y=\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)$ due to Fact 2.5 (i), and we get (i)(b).

If (i)(d) holds, then by (38), $D_{f}(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y),y)=0$ , and (i)(b) thus holds.

Finally, if (i)(e) holds, then $\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)=\theta(y)+\frac{1}{\gamma}D_{f}(y,y)$ , and using Proposition 2.7 (i) and (24a), we must have $y=\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)$ and therefore get (i)(b).

(ii): This is proved similarly to (i) by using Proposition 2.2 (ii), Proposition 2.7 (ii), Proposition 2.16 (ii), Proposition 2.19 (ii), and (25a). The difference between (35) and (36) is because $\operatorname{dom}\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}=U$ while $\operatorname{dom}\,\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}=\operatorname{dom}f$ . ∎
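The equivalences of Theorem 2.20 (i) can be observed on a grid in the simplest setting. The sketch below assumes that $f$ is the energy, so that the left proximal mapping is the classical proximity operator, and takes $\theta=|\cdot-\frac{1}{2}|$, the function used in Example 4.1. The closed form used for the proximity operator is the standard shifted soft-thresholding formula. The grid, the tolerance, and the function names are illustrative assumptions.

```python
def theta(x):
    """theta(x) = |x - 1/2|, whose unique minimizer is 1/2."""
    return abs(x - 0.5)

def prox_energy(y, gamma):
    """Classical proximity operator of gamma*|. - 1/2| (shifted soft-thresholding)."""
    if y < 0.5 - gamma:
        return y + gamma
    if y > 0.5 + gamma:
        return y - gamma
    return 0.5

def env_energy(y, gamma):
    """Moreau envelope: theta(p) + (p - y)^2 / (2*gamma) at p = prox(y)."""
    p = prox_energy(y, gamma)
    return theta(p) + (p - y) ** 2 / (2.0 * gamma)

def conditions(y, gamma, tol=1e-12):
    """Truth values of (a), (b), (d), (e) from Theorem 2.20(i) at the point y."""
    p = prox_energy(y, gamma)
    return (
        theta(y) <= tol,                              # (a): min theta = 0
        abs(p - y) <= tol,                            # (b): y is a fixed point
        abs(theta(p) - theta(y)) <= tol,              # (d)
        abs(env_energy(y, gamma) - theta(y)) <= tol,  # (e)
    )

grid = [k / 10.0 for k in range(-10, 21)]
table = {y: conditions(y, 0.3) for y in grid}
```

On this grid the four flags agree at every point and are all true only at $y=\frac{1}{2}$.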

3 Asymptotic behaviour properties

The results in this section, almost all of which are new, extend or complement results for the classical energy case and for left variants studied in [28] and [33]. We will require the following lemma.

Lemma 3.1.

Let $C$ be a compact subset of a Hausdorff space $\mathcal{X}$ , let $\phi\colon\mathcal{X}\to\left[-\infty,+\infty\right]$ be lower semicontinuous, let $(x_{a})_{a\in A}$ be a net in $C$ , and suppose that $\phi(x_{a})\to\inf\phi(\mathcal{X})$ . Then $\operatorname*{argmin}\phi\neq\varnothing$ and all cluster points of $(x_{a})_{a\in A}$ lie in $\operatorname*{argmin}\phi$ . Consequently, if $\phi$ attains its minimum at a unique point $u$ , then $x_{a}\to u$ .

Proof.

This follows from the lower semicontinuity of $\phi$ and [9, Lemma 1.14]. ∎

What is the behaviour of Bregman–Moreau envelopes and proximity operators when $\gamma\downarrow 0$ ? The next two results provide answers.

Proposition 3.2.

Let $\theta\in\Gamma_{0}(X)$ be such that $U\cap\operatorname{dom}\theta\neq\varnothing$ and let $x,y\in U$ . Then the following hold:

(i)

If (18) holds for some $\mu\in\mathbb{R}_{++}$ instead of $\gamma$, then $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)\to y$ as $\gamma\downarrow 0$.

(ii)

If (20) holds for some $\mu\in\mathbb{R}_{++}$ instead of $\gamma$ , then $\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(x)\to x$ as $\gamma\downarrow 0$ .

Proof.

(i): Noting that $(\forall\gamma\in\left]0,\mu\right])$ $\theta+\frac{1}{\gamma}D_{f}(\cdot,y)\geq\theta+\frac{1}{\mu}D_{f}(\cdot,y)$ , we have that (18) holds for all $\gamma\in\left]0,\mu\right]$ . In particular, $g:=\theta+\tfrac{1}{\mu}D_{f}(\cdot,y)$ is coercive. By Proposition 2.2 (i) and (24a),

[TABLE]

and so $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)\in\mathrm{lev}_{\leq\theta(y)}\>g$ . The coercivity of $g$ and [9, Proposition 11.12] imply that $\nu:=\sup_{\gamma\in\left]0,\mu\right]}\|\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)\|<+\infty$ . Now by [9, Theorem 9.20], there exist $u\in X$ and $\eta\in\mathbb{R}$ such that $\theta\geq\left\langle{\cdot},{u}\right\rangle+\eta$ . Using (40) and Cauchy–Schwarz yields

[TABLE]

which gives

[TABLE]

and thus $D_{f}(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y),y)\to 0$ as $\gamma\downarrow 0$. Observing that $D_{f}(\cdot,y)=f(\cdot)-f(y)-\left\langle{\nabla f(y)},{\cdot-y}\right\rangle$ is lower semicontinuous, that $\operatorname*{argmin}D_{f}(\cdot,y)=\{y\}$ by Fact 2.5 (i), and that $\sup_{\gamma\in\left]0,\mu\right]}\|\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)\|<+\infty$, it follows from Lemma 3.1 that $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)\to y$ as $\gamma\downarrow 0$.

(ii): This is similar to (i). ∎
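Proposition 3.2 (ii) can be illustrated with the right proximal mapping of $\theta=|\cdot-\frac{1}{2}|$ when $f$ is the Boltzmann–Shannon entropy on $\mathbb{R}$. The sketch below assumes $\nabla^{2}f(y)=1/y$ and solves the condition of Proposition 2.16 (ii)(b) branch by branch; the branches $x/(1-\gamma)$ and $x/(1+\gamma)$ are the ones appearing in Example 4.1 (ii). The function name and the sample values of $\gamma$ are hypothetical.

```python
def right_prox_bs(x, gamma):
    """Right Bregman prox of gamma*|. - 1/2| for f(t) = t*ln(t) - t, with x > 0.

    Solves 0 in gamma * d|y - 1/2| + (y - x)/y piecewise.
    """
    if x < (1.0 - gamma) / 2.0:      # branch y < 1/2 (only reachable when gamma < 1)
        return x / (1.0 - gamma)
    if x > (1.0 + gamma) / 2.0:      # branch y > 1/2
        return x / (1.0 + gamma)
    return 0.5                       # kink of theta

x = 0.2
gammas = (1.0, 0.1, 0.01, 0.001)
errors = [abs(right_prox_bs(x, g) - x) for g in gammas]   # shrinks as gamma decreases
```

For large parameters the same function returns $\frac{1}{2}$, in line with the behaviour as $\gamma\uparrow+\infty$ described in Theorem 3.5 (ii).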

Theorem 3.3.

Let $\theta\in\Gamma_{0}(X)$ be such that $U\cap\operatorname{dom}\theta\neq\varnothing$ and let $x,y\in U$ . Then the following hold:

(i)

If (18) holds for some $\mu\in\mathbb{R}_{++}$ and $\gamma\downarrow 0$, then $\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\uparrow\theta(y)$, $\theta(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y))\uparrow\theta(y)$, and $\tfrac{1}{\gamma}D_{f}(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y),y)\to 0$.

(ii)

If (20) holds for some $\mu\in\mathbb{R}_{++}$ and $\gamma\downarrow 0$, then $\,\overrightarrow{\operatorname{env}}_{\theta}^{\gamma}(x)\uparrow\theta(x)$, $\theta(\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(x))\uparrow\theta(x)$, and $\tfrac{1}{\gamma}D_{f}(x,\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(x))\to 0$.

Proof.

(i): According to Proposition 2.2 (i), there exists $\beta\in\mathbb{R}$ such that $\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\uparrow\beta\leq\theta(y)$ as $\gamma\downarrow 0$ . Combining with (24a), we have

[TABLE]

This together with the fact that $\lim_{\gamma\downarrow 0}\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)=y$ by Proposition 3.2 (i), and the lower semicontinuity of $\theta$ implies

[TABLE]

and then $\beta=\theta(y)=\lim_{\gamma\downarrow 0}\theta(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y))$ . Now recall (43) and Proposition 2.15 (i).

(ii): This is similar to (i). ∎

For a variant of the result from Theorem 3.3 (i) that $\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\uparrow\theta(y)$ as $\gamma\downarrow 0$ , see [33, Theorem 2.5]. Note that $D_{f}(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y),y)$ is monotone with respect to $\gamma$ , as shown in Proposition 2.15 (i), but the same is not necessarily true for $\tfrac{1}{\gamma}D_{f}(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y),y)$ (see Figure 2).
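The monotone convergence in Theorem 3.3 (i) and the possible non-monotonicity of $\tfrac{1}{\gamma}D_{f}$ can both be seen in the energy case. The sketch below assumes $f=\frac{1}{2}|\cdot|^{2}$ on $\mathbb{R}$, $\theta=|\cdot-\frac{1}{2}|$, and the standard soft-thresholding formula for the proximity operator. The base point $y=0$ and the list of parameters are hypothetical choices.

```python
def theta(x):
    return abs(x - 0.5)

def prox_energy(y, gamma):
    """Classical proximity operator of gamma*|. - 1/2|."""
    if y < 0.5 - gamma:
        return y + gamma
    if y > 0.5 + gamma:
        return y - gamma
    return 0.5

def report(y, gamma):
    """Return (envelope, theta at the prox, (1/gamma)*D_f(prox, y)) for the energy."""
    p = prox_energy(y, gamma)
    scaled = (p - y) ** 2 / (2.0 * gamma)
    return theta(p) + scaled, theta(p), scaled

gammas = [1.0, 0.5, 0.25, 0.1, 0.01]          # decreasing towards 0
rows = [report(0.0, g) for g in gammas]
envelopes = [r[0] for r in rows]               # increases towards theta(0) = 0.5
theta_at_prox = [r[1] for r in rows]           # nondecreasing towards 0.5
scaled_dist = [r[2] for r in rows]             # rises, then falls: not monotone
```

With these values the scaled distance is $0.125$, $0.25$, $0.125$ for $\gamma=1$, $\frac{1}{2}$, $\frac{1}{4}$, so it is not monotone in $\gamma$, while the envelope values increase throughout.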

The following two results describe the behaviour when $\gamma\uparrow+\infty$.

Proposition 3.4.

Let $\theta\in\Gamma_{0}(X)$ be such that $U\cap\operatorname{dom}\theta\neq\varnothing$ and let $x,y\in U$ . Then the following hold:

(i)

If (18) holds for all $\gamma\in\mathbb{R}_{++}$, then $\theta(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y))\to\inf\theta(X)$ as $\gamma\uparrow+\infty$.

(ii)

If (20) holds for all $\gamma\in\mathbb{R}_{++}$ , then $\theta(\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(x))\to\inf\theta(X)$ as $\gamma\uparrow+\infty$ .

Proof.

We shall just prove (i) because the proof of (ii) is similar. Assume that (18) holds for all $\gamma\in\mathbb{R}_{++}$ . Combining (24b) with Proposition 2.2 (i) yields

[TABLE]

which implies that $\theta(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y))\to\inf\theta(X)$ as $\gamma\uparrow+\infty$ . ∎

Theorem 3.5.

Let $\theta\in\Gamma_{0}(X)$ be coercive such that $U\cap\operatorname{dom}\theta\neq\varnothing$ and let $x,y\in U$ . Then the following hold:

(i)

The net $(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y))_{\gamma\in\mathbb{R}_{++}}$ is bounded with all cluster points as $\gamma\uparrow+\infty$ lying in $\operatorname*{argmin}\theta$ . Moreover,

(a)

if $\operatorname*{argmin}\theta$ is a singleton, then $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)\to\operatorname*{argmin}\theta$ as $\gamma\uparrow+\infty$;

(b)

if $\operatorname*{argmin}\theta\subseteq U$, then $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)\to\overleftarrow{\thinspace\operatorname{P}\thinspace}_{\negthinspace\negthinspace\operatorname*{argmin}\theta}y$ as $\gamma\uparrow+\infty$.

(ii)

The net $(\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(x))_{\gamma\in\mathbb{R}_{++}}$ is bounded with all cluster points as $\gamma\uparrow+\infty$ lying in $\operatorname*{argmin}\theta$ . Moreover,

(a)

if $\operatorname*{argmin}\theta$ is a singleton, then $\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(x)\to\operatorname*{argmin}\theta$ as $\gamma\uparrow+\infty$;

(b)

if $\operatorname*{argmin}\theta\subseteq U$ , then $\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(x)\to\overrightarrow{\operatorname{P}\thinspace}_{\negthinspace\negthinspace\operatorname*{argmin}\theta}x$ as $\gamma\uparrow+\infty$ .

Proof.

First, by assumption, [9, Proposition 11.15(i)] gives $\operatorname*{argmin}\theta\neq\varnothing$ . This combined with [9, Lemma 1.24 and Corollary 8.5] implies that $\operatorname*{argmin}\theta=\mathrm{lev}_{\leq\inf\theta(X)}\>\theta$ is a nonempty closed convex subset of $X$ . Now since $\theta$ is coercive and since $D_{f}\geq 0$ , we immediately get that (18) and (20) hold for all $\gamma\in\mathbb{R}_{++}$ .

(i): It follows from (24b) that

[TABLE]

and then from the coercivity of $\theta$ and [9, Proposition 11.12] that $(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y))_{\gamma\in\mathbb{R}_{++}}$ is bounded. In turn, Proposition 3.4 (i) and Lemma 3.1 imply that all cluster points of $(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y))_{\gamma\in\mathbb{R}_{++}}$ as $\gamma\uparrow+\infty$ lie in $\operatorname*{argmin}\theta$ , and we get (i)(a).

Now assume that $\operatorname*{argmin}\theta\subseteq U$ . Let $y^{\prime}$ be a cluster point of $(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y))_{\gamma\in\mathbb{R}_{++}}$ as $\gamma\uparrow+\infty$ . Then $y^{\prime}\in\operatorname*{argmin}\theta\subseteq U$ and there exists a sequence $(\gamma_{n})_{n\in{\mathbb{N}}}$ in $\mathbb{R}_{++}$ such that $\gamma_{n}\uparrow+\infty$ and $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma_{n}\theta}(y)\to y^{\prime}$ as $n\to+\infty$ . Let $z\in\operatorname*{argmin}\theta$ . We have $(\forall{n\in{\mathbb{N}}})$ $\theta(z)\leq\theta(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma_{n}\theta}(y))$ , and by Proposition 2.16 (i),

[TABLE]

Taking the limit as $n\to+\infty$ and using the continuity of $\nabla f$ yield

[TABLE]

Since $z\in\operatorname*{argmin}\theta$ was chosen arbitrarily and since $\operatorname*{argmin}\theta$ is a closed convex subset of $X$ with $U\cap\operatorname*{argmin}\theta\neq\varnothing$ , in view of Corollary 2.18 (i), $y^{\prime}=\overleftarrow{\thinspace\operatorname{P}\thinspace}_{\negthinspace\negthinspace\operatorname*{argmin}\theta}y$ , and $\overleftarrow{\thinspace\operatorname{P}\thinspace}_{\negthinspace\negthinspace\operatorname*{argmin}\theta}y$ is thus the only cluster point of $(\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y))_{\gamma\in\mathbb{R}_{++}}$ as $\gamma\uparrow+\infty$ . Hence, (i)(b) holds.

(ii): The proof is similar to the one of (i). ∎

Remark 3.6.

Suppose that $f=\frac{1}{2}\|\cdot\|^{2}$ and let $\theta\in\Gamma_{0}(X)$ be coercive. By Remark 2.11 and Theorem 3.5,

[TABLE]

Corollary 3.7.

Let $\theta\in\Gamma_{0}(\mathbb{R})$ be coercive such that $\operatorname*{argmin}\theta\subseteq U$ and let $z\in U$ . Then $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(z)\to P_{\operatorname*{argmin}\theta}z$ and $\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(z)\to P_{\operatorname*{argmin}\theta}z$ as $\gamma\uparrow+\infty$ .

Proof.

As shown in the proof of Theorem 3.5, $\operatorname*{argmin}\theta$ is a nonempty closed convex subset of $X$ and hence $U\cap\operatorname*{argmin}\theta\neq\varnothing$ . It now suffices to apply Theorem 3.5 (i)(b) and (ii)(b) and to use Proposition 2.12. ∎
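Corollary 3.7 can be illustrated with a function whose set of minimizers is an interval. The sketch below is a hypothetical test case that does not appear in the paper: $f$ is the Boltzmann–Shannon entropy on $\mathbb{R}$, so $U=\mathbb{R}_{++}$, and $\theta$ is the distance to $[1,2]$, which is coercive with $\operatorname*{argmin}\theta=[1,2]\subseteq U$. The closed forms are obtained here by solving the conditions of Proposition 2.16 (i)(b) and (ii)(b) piecewise, under the assumptions $\nabla f=\ln$ and $\nabla^{2}f(y)=1/y$.

```python
import math

A, B = 1.0, 2.0   # argmin of theta(x) = dist(x, [A, B]); hypothetical test data

def project(z):
    """Ordinary (Euclidean) projection onto [A, B]."""
    return min(max(z, A), B)

def left_prox(y, gamma):
    """Left prox: solves 0 in gamma*d(theta)(x) + ln(x) - ln(y) for y > 0."""
    if y < A:
        return min(y * math.exp(gamma), A)
    if y > B:
        return max(y * math.exp(-gamma), B)
    return y

def right_prox(x, gamma):
    """Right prox: solves 0 in gamma*d(theta)(p) + (p - x)/p for x > 0."""
    if x < A:
        # The branch p < A requires x < A*(1 - gamma), hence gamma < 1.
        return x / (1.0 - gamma) if x < A * (1.0 - gamma) else A
    if x > B:
        return max(x / (1.0 + gamma), B)
    return x

points = (0.2, 1.5, 5.0)
limits = [(left_prox(z, 50.0), right_prox(z, 50.0), project(z)) for z in points]
```

For a large parameter such as $\gamma=50$, both proximal mappings already coincide with the ordinary projection at the three sample points.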

4 Examples and function minimization

In this final section, we illustrate our theory by considering the case in which $\theta$ is the nonsmooth function $x\mapsto|x-\tfrac{1}{2}|$ .

Example 4.1.

Suppose that $X=\mathbb{R}$ and let $\theta\colon\mathbb{R}\to\mathbb{R}\colon x\mapsto|x-\frac{1}{2}|$. Then $\theta\in\Gamma_{0}(X)$, $\operatorname{dom}\theta=X$, and $\theta$ is coercive with $\operatorname*{argmin}\theta=\{\frac{1}{2}\}$. It follows that $U\cap\operatorname{dom}\theta=U\neq\varnothing$ and, by Fact 2.6, the assumptions (18) and (20) hold for all $\gamma\in\mathbb{R}_{++}$. We revisit Example 2.3 (with $J=1$) to illustrate Theorem 2.20, Proposition 3.2, and Theorem 3.5. Let $\gamma\in\mathbb{R}_{++}$. We recall from Proposition 2.16 that

[TABLE]

and that

[TABLE]

Note that

[TABLE]

(i)

Energy: Suppose that $f$ is the energy. Then $U=\operatorname{int}\operatorname{dom}\,f=\mathbb{R}$. Since $\nabla f=\operatorname{Id}$, by Remark 2.9 and (50), $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}=\overrightarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}=(\operatorname{Id}+\gamma\partial\theta)^{-1}$. We have that

[TABLE]

Computing $(\nabla f+\gamma\partial\theta)^{-1}(y)$ amounts to solving $(\nabla f+\gamma\partial\theta)(x)=y$ piecewise. For example, solving $x-\gamma=y$ for $x<\tfrac{1}{2}$ yields $x=y+\gamma$ for $y+\gamma<\tfrac{1}{2}$, so $(\nabla f+\gamma\partial\theta)^{-1}(y)=y+\gamma$ for $y<\tfrac{1}{2}-\gamma$. Continuing in this fashion,

[TABLE]

and by (24a),

[TABLE]

It is clear that $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(\frac{1}{2})=\frac{1}{2}$, while $(\forall y\in\mathbb{R}\smallsetminus\{\frac{1}{2}\})$ $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)\neq y$, and so $\operatorname{Fix}\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}=\{\frac{1}{2}\}=\operatorname*{argmin}\theta$. As expected, $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)\to y$ as $\gamma\downarrow 0$, and $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)\to\frac{1}{2}=\operatorname*{argmin}\theta$ as $\gamma\uparrow+\infty$. Moreover, $\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\to\theta(y)$ as $\gamma\downarrow 0$; this is illustrated in Figure 3.

(ii)

Boltzmann–Shannon entropy: Suppose that $f$ is the Boltzmann–Shannon entropy. Then $\operatorname{dom}f=\mathbb{R}_{+}$, $U=\operatorname{int}\operatorname{dom}\,f=\mathbb{R}_{++}$, $\nabla f(x)=\ln x$, and $\nabla^{2}f(x)=1/x$. Again employing (50) and (24a), we have

[TABLE]

Clearly $\operatorname{Fix}\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}=\{\frac{1}{2}\}=\operatorname*{argmin}\theta$. It can also be seen that $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)\to y$ as $\gamma\downarrow 0$, and $\overleftarrow{\operatorname{P}}_{\negthinspace\negthinspace\gamma\theta}(y)\to\frac{1}{2}=\operatorname*{argmin}\theta$ as $\gamma\uparrow+\infty$. Moreover, once again $\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}(y)\to\theta(y)$ as $\gamma\downarrow 0$. This example is illustrated in Figure 4.

Now (51) implies that for every $(x,y)\in\mathbb{R}_{++}\times\mathbb{R}_{++}$ ,

[TABLE]

Solving the induced system of equations yields

[TABLE]

Using (25a) and noting that $\frac{x}{1-\gamma}<\frac{1}{2}$ if $x<\frac{1-\gamma}{2}$ and $\frac{x}{1+\gamma}>\frac{1}{2}$ if $x>\frac{1+\gamma}{2}$ , we obtain

[TABLE]

The right envelope is shown in Figure 5.

(iii)

Fermi–Dirac entropy: Suppose that $f$ is the Fermi–Dirac entropy. Then $\operatorname{dom}f=\left[0,1\right]$, $U=\operatorname{int}\operatorname{dom}\,f=\left]0,1\right[$, $\nabla f(x)=\ln\left(\frac{x}{1-x}\right)$, and $\nabla^{2}f(x)=\frac{1}{x(1-x)}$. Again by (50),

[TABLE]

A formula for $\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$ may be once again obtained by using (24a):

[TABLE]

We illustrate this envelope in Figure 6.

Next we have from (51) that for every $(x,y)\in\left]0,1\right[\times\left]0,1\right[$ ,

[TABLE]

Solving the induced system of equations gives

[TABLE]

and, in turn, (25a) gives

[TABLE]

The right envelope is shown in Figure 7.
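The three left proximal mappings of Example 4.1 share a common structure, which the following sketch makes explicit. It relies on an observation derived here from Proposition 2.16 (i)(b), not on a formula quoted from the paper: when $\nabla f$ is an increasing bijection from $U$ onto $\mathbb{R}$ (as it is for the three generators, with $\nabla f$ equal to the identity, $\ln$, and $\ln\frac{x}{1-x}$ respectively), the left proximal mapping of $\gamma|\cdot-\frac{1}{2}|$ is soft-thresholding applied to $\nabla f(y)-\nabla f(\frac{1}{2})$, followed by $(\nabla f)^{-1}$. The envelope is then evaluated as $\theta(p)+\frac{1}{\gamma}D_{f}(p,y)$ at the proximal point $p$. All names are hypothetical.

```python
import math

GENERATORS = {
    # name: (grad f, inverse of grad f, Bregman distance D_f(x, y))
    "energy": (lambda t: t, lambda s: s,
               lambda x, y: 0.5 * (x - y) ** 2),
    "boltzmann-shannon": (math.log, math.exp,
                          lambda x, y: x * math.log(x / y) - x + y),
    "fermi-dirac": (lambda t: math.log(t / (1.0 - t)),
                    lambda s: 1.0 / (1.0 + math.exp(-s)),
                    lambda x, y: x * math.log(x / y)
                    + (1.0 - x) * math.log((1.0 - x) / (1.0 - y))),
}

def theta(x):
    return abs(x - 0.5)

def left_prox(name, y, gamma):
    """Left Bregman prox of gamma*|. - 1/2| via soft-thresholding in dual coordinates."""
    grad, grad_inv, _ = GENERATORS[name]
    s = grad(y) - grad(0.5)                              # dual offset from the kink
    shrunk = math.copysign(max(abs(s) - gamma, 0.0), s)  # soft-thresholding by gamma
    return grad_inv(grad(0.5) + shrunk)

def left_env(name, y, gamma):
    """Left envelope value theta(p) + D_f(p, y)/gamma at p = left_prox(y)."""
    p = left_prox(name, y, gamma)
    return theta(p) + GENERATORS[name][2](p, y) / gamma

def objective(name, x, y, gamma):
    """The function minimized in the definition of the left envelope."""
    return theta(x) + GENERATORS[name][2](x, y) / gamma
```

For instance, `left_prox("boltzmann-shannon", 0.2, 0.3)` evaluates $0.2\,e^{0.3}$, the solution of $\ln x=\ln y+\gamma$ on the branch $x<\frac{1}{2}$.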

We conclude this section with some remarks concerning the minimization of the (nonlinear) functional

[TABLE]

where $\tau$ is convex, lower semicontinuous, and proper, subject to finitely many constraints

[TABLE]

and where $a_{k}\in L^{\infty}[0,1]$ and $\rho$ was used to generate the (consistent) data: $\left\langle{a_{k}},{\rho}\right\rangle=b_{k}$ . Under appropriate assumptions (for details, see [13], [14], [15, Section 7], [16, Theorem 6.3.4], [17], [18, Section 4.7]), recovering the (primal) solution $x=x_{\tau}$ amounts to first obtaining a dual solution by solving the finite system of nonlinear equations

[TABLE]

followed by computing

[TABLE]

As a numerical illustration, we assume that $\tau=\,\overleftarrow{\operatorname{env}}_{\theta}^{\gamma}$ is the left Bregman envelope from Example 4.1 (iii) and $\rho$ is the step function pictured in Figure 8 (right). We clearly observe the influence of the parameter $\gamma$ ; smaller values of $\gamma$ lead to primal solutions that are closer to the step function that was used to generate the data. Were a different, smoother $\rho$ to be used, a larger value of $\gamma$ might be more appropriate. The reason for the varying extent to which the primal solutions for different choices of $\gamma$ resemble step functions may be gleaned from Figure 8 (left), where $(\tau^{*})^{\prime}$ —upon which the primal solution (68) depends—is shown.

When we similarly attempt to compute with $\tau$ as the right envelope from Example 4.1 (iii), we are unable to symbolically invert the gradient. We leave the numerical attempt at inversion as future work.

Acknowledgments

HHB was partially supported by the Natural Sciences and Engineering Research Council of Canada. MND was partially supported by the Australian Research Council Discovery Project DP160101537.

Bibliography

[1]
[2] F. Alvarez, R. Correa, and M. Marechal, Regular self-proximal distances are Bregman, Journal of Convex Analysis 24 (2017), 135–148.
[3] H. Attouch, Convergence de fonctions convexes, des sous-différentiels et semi-groupes associés, Comptes Rendus de l’Académie des Sciences de Paris 284 (1977), 539–542.
[4] H. Attouch, Variational Convergence for Functions and Operators, Pitman, 1984.
[5] H.H. Bauschke, J. Bolte, and M. Teboulle, A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications, Mathematics of Operations Research 42 (2016), 330–348.
[6] H.H. Bauschke and J.M. Borwein, Legendre functions and the method of random Bregman projections, Journal of Convex Analysis 4 (1997), 27–67.
[7] H.H. Bauschke, J.M. Borwein, and P.L. Combettes, Bregman monotone optimization algorithms, SIAM Journal on Control and Optimization 42 (2003), 596–636.
[8] H.H. Bauschke and P.L. Combettes, Iterating Bregman retractions, SIAM Journal on Optimization 13 (2003), 1159–1173.
