On Decomposition Models in Imaging Sciences and Multi-time Hamilton-Jacobi Partial Differential Equations

Jérôme Darbon; Tingwei Meng

arXiv:1906.09502·math.OC·July 27, 2020·SIAM J. Imaging Sci.

TL;DR

This paper explores the theoretical links between multi-time Hamilton-Jacobi PDEs and variational image decomposition models, revealing how solutions and minimizers relate and proposing methods for models with non-unique solutions.

Contribution

It establishes new theoretical connections between Hamilton-Jacobi PDEs and image decomposition, including uniqueness proofs and regularization techniques for non-unique minimizers.

Findings

01 Minimal values governed by multi-time Hamilton-Jacobi PDEs

02 Minimizers represented via Hamilton-Jacobi momentum

03 Regularization approach for non-unique minimizers

Abstract

This paper provides new theoretical connections between multi-time Hamilton-Jacobi partial differential equations and variational image decomposition models in imaging sciences. We show that the minimal values of these optimization problems are governed by multi-time Hamilton-Jacobi partial differential equations. The minimizers of these optimization problems can be represented using the momentum in the corresponding Hamilton-Jacobi partial differential equation. Moreover, variational behaviors of both the minimizers and the momentum are investigated as the regularization parameters approach zero. In addition, we provide a new perspective from convex analysis to prove the uniqueness of convex solutions to Hamilton-Jacobi equations. Finally we consider image decomposition models that do not have unique minimizers and we propose a regularization approach to perform the analysis using…

Tables

Table 1. Notations used in this paper. Here, we use $C$ to denote a set, $f$ to denote a function, and $x,d$ to denote vectors in $\mathbb{R}^{n}$.

Notation	Meaning	Definition
$\mathrm{dom}~f$	domain of $f$	$\{x\in\mathbb{R}^{n} : f(x)\in\mathbb{R}\}$
$\mathrm{ri}~C$	relative interior of $C$	the interior of $C$ with respect to the minimal hyperplane containing $C$ in $\mathbb{R}^{n}$
$N_{C}(x)$	normal cone of $C$ at $x$	$\{q\in\mathbb{R}^{n} : \langle q,y-x\rangle\leq 0\text{ for any }y\in C\}$
$C_{\infty}(x)$	asymptotic cone of $C$	$\{d\in\mathbb{R}^{n} : x+td\in C\text{ for all }t>0\}$
$\mathrm{epi}~f$	epigraph of $f$	$\{(x,t)\in\mathbb{R}^{n}\times\mathbb{R} : x\in\mathrm{dom}~f,\ t\geq f(x)\}$
$\Gamma_{0}(\mathbb{R}^{n})$	a useful and standard class of convex functions	the set containing all proper, convex, l.s.c. functions from $\mathbb{R}^{n}$ to $\mathbb{R}\cup\{+\infty\}$
$f^{\prime}(x,d)$	directional derivative of $f$ at $x$ along the direction $d$	$\lim_{h\to 0^{+}}\frac{1}{h}\left(f(x+hd)-f(x)\right)$
$\partial f(x)$	subdifferential of $f$ at $x$	$\{p\in\mathbb{R}^{n} : f(y)\geq f(x)+\langle p,y-x\rangle\ \forall y\in\mathbb{R}^{n}\}$
$I_{C}$	the indicator function of $C$	If $x\in C$, then define $I_{C}(x):=0$. Otherwise, define $I_{C}(x):=+\infty$.
$f^{*}$	Legendre transform of $f$	$f^{*}(p):=\sup_{x\in\mathbb{R}^{n}}\langle p,x\rangle-f(x)$
$f\,\square\,g$	inf-convolution of $f$ and $g$	$(f\,\square\,g)(x):=\inf_{u\in\mathbb{R}^{n}}f(u)+g(x-u)$

Table 2. Numerical results of the TVL1 model with the proposed regularization method.

Equations

\begin{cases}\dfrac{\partial S(x,t)}{\partial t}+H(x,t,S(x,t),\nabla_{x}S(x,t))=0, & x\in\mathbb{R}^{n},\ t>0;\\ S(x,0)=J(x), & x\in\mathbb{R}^{n},\end{cases}

S(x,t)=(J^{*}+tH)^{*}(x)=\inf_{u\in\mathbb{R}^{n}}J(u)+tH^{*}\left(\frac{x-u}{t}\right)

\operatorname*{arg\,min}_{u_{0}+\dots+u_{N}=x}f_{0}(u_{0})+\sum_{j=1}^{N}\lambda_{j}f_{j}(u_{j}).

\|u\|_{TV}:=\sup\left\{\int_{\Omega}u(x)\,\mathrm{div}\,\phi(x)\,dx : \phi\in C_{c}^{1}(\Omega,\mathbb{R}^{2}),\ \|\phi\|_{L^{\infty}}\leq 1\right\}.

BV(\Omega)=\{u\in L^{1}(\Omega) : \|u\|_{TV}<+\infty\}.

\operatorname*{arg\,min}_{u\in BV(\Omega)}\|u\|_{TV}+\frac{1}{2\lambda}\|x-u\|_{L^{2}}^{2}.

BMO:=\left\{f\in L_{loc}^{1}(\mathbb{R}^{n}) : \sup\left\{\frac{1}{|Q|}\int_{Q}|f(x)-f_{Q}|\,dx : Q\text{ is any ball in }\mathbb{R}^{n}\right\}<+\infty\right\},\quad\text{where the symbol }f_{Q}\text{ is defined by }f_{Q}:=\frac{1}{|Q|}\int_{Q}f(x)\,dx,

\dot{B}_{1}^{1,1}:=\Big\{f\in L^{\frac{n}{n-1}}(\mathbb{R}^{n}) : \sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}^{n}}|c(j,k)|2^{j(1-n/2)}<+\infty,\ \text{where }\{c(j,k)\}\text{ are the wavelet coefficients of }f\Big\}.

G:=\{f=\partial_{1}g_{1}+\partial_{2}g_{2} : g_{1},g_{2}\in L^{\infty}(\mathbb{R}^{2})\},\quad\|f\|_{G}:=\inf\left\{\left\|(g_{1}^{2}+g_{2}^{2})^{1/2}\right\|_{L^{\infty}} : f=\partial_{1}g_{1}+\partial_{2}g_{2}\right\}.

\operatorname*{arg\,min}_{u\in BV(\Omega)}\|u\|_{TV}+\lambda\|x-u\|_{X},\quad\text{where the space $X$ can be $E$, $F$ or $G$.}

\operatorname*{arg\,min}_{u\in BV(\Omega)}\|u\|_{TV}+\lambda\int_{\Omega}\left|\nabla(\Delta)^{-1}(x-u)\right|^{2}dxdy.

\operatorname*{arg\,min}_{u\in BV(\Omega)}\|u\|_{TV}+I\{\|x-u\|_{G}\leq\mu\},

\operatorname*{arg\,min}_{u\in BV(\Omega),\,v\in G}\|u\|_{TV}+I\{\|v\|_{G}\leq\mu\}+\frac{1}{2\lambda}\|x-u-v\|_{L^{2}}^{2},

S(x,\mu,\lambda):=\min_{u,v\in\mathbb{R}^{n}}J(u)+J^{*}\left(\frac{v}{\mu}\right)+\frac{1}{2\lambda}\|x-u-v\|_{2}^{2}.

J(u):=\sum_{i=1}^{m_{1}-1}\sum_{j=1}^{m_{2}-1}|u_{i+1,j}-u_{i,j}|+|u_{i,j+1}-u_{i,j}|.

\|v\|_{G}=\inf\Biggl\{\sup_{\substack{1\leq i\leq m_{1}\\ 1\leq j\leq m_{2}}}\sqrt{(g_{i,j})^{2}+(h_{i,j})^{2}}\ \colon\ v_{i,j}=g_{i,j}-g_{i-1,j}+h_{i,j}-h_{i,j-1},\ g_{0,j}=g_{m_{1},j}=h_{i,0}=h_{i,m_{2}}=0,\ g_{i,j},h_{i,j}\in\mathbb{R}\quad\forall\ 1\leq i\leq m_{1},\ 1\leq j\leq m_{2}\Biggr\}.

S(x,\mu,\lambda)=\min_{u,v\in\mathbb{R}^{n}}J(u)+\mu J^{*}\left(\frac{v}{\mu}\right)+\frac{\lambda}{2}\left\|\frac{x-u-v}{\lambda}\right\|_{2}^{2}.

\begin{cases}\dfrac{\partial S(x,\mu,\lambda)}{\partial\mu}+J(\nabla_{x}S(x,\mu,\lambda))=0, & x\in\mathbb{R}^{n},\ \mu>0,\ \lambda>0;\\ \dfrac{\partial S(x,\mu,\lambda)}{\partial\lambda}+\dfrac{1}{2}\|\nabla_{x}S(x,\mu,\lambda)\|_{2}^{2}=0, & x\in\mathbb{R}^{n},\ \mu>0,\ \lambda>0;\\ S(x,0,0)=J(x), & x\in\mathbb{R}^{n}.\end{cases}

q\in N_{C}(x)\text{ if and only if }\langle q,y-x\rangle\leq 0\text{ for any }y\in C.

C_{\infty}(x)=\{d\in\mathbb{R}^{n} : x+td\in C\text{ for all }t>0\}.

f(\alpha x+(1-\alpha)y)\leq\alpha f(x)+(1-\alpha)f(y).

\mathrm{epi}~f:=\{(x,t) : x\in\mathrm{dom}~f,\ t\geq f(x)\}.

f(x)=\lim_{t\to 0^{+}}f(x+t(y-x)).

f(y)\geq f(x)+\langle p,y-x\rangle,\quad\text{for any }y\in\mathbb{R}^{n}.

\langle p-q,x-y\rangle\geq 0\text{ for any }p\in\partial f(x)\text{ and }q\in\partial f(y).

I_{C}(x):=\begin{cases}0, & x\in C;\\ +\infty, & x\notin C.\end{cases}

\partial I_{C}(x)=N_{C}(x).

f^{*}(p):=\sup_{x\in\mathbb{R}^{n}}\langle p,x\rangle-f(x).

(f\,\square\,g)(x):=\inf_{u\in\mathbb{R}^{n}}f(u)+g(x-u).

\begin{cases}\dfrac{\partial S}{\partial t_{j}}+H_{j}(\nabla_{x}S)=0\ \text{for any }j\in\{1,\dots,N\}, & x\in\mathbb{R}^{n},\ t_{1},\dots,t_{N}>0;\\ S(x,0,\dots,0)=J(x), & x\in\mathbb{R}^{n}.\end{cases}

Full text

On Decomposition Models in Imaging Sciences and Multi-time Hamilton-Jacobi Partial Differential Equations

Jérôme Darbon

Department of Applied Mathematics, Brown University, Providence, RI

and

Tingwei Meng

Department of Applied Mathematics, Brown University, Providence, RI

Abstract.

This paper provides new theoretical connections between multi-time Hamilton-Jacobi partial differential equations and variational image decomposition models in imaging sciences. We show that the minimal values of these optimization problems are governed by multi-time Hamilton-Jacobi partial differential equations. The minimizers of these optimization problems can be represented using the momentum in the corresponding Hamilton-Jacobi partial differential equation. Moreover, variational behaviors of both the minimizers and the momentum are investigated as the regularization parameters approach zero. In addition, we provide a new perspective from convex analysis to prove the uniqueness of convex solutions to Hamilton-Jacobi equations. Finally, we consider image decomposition models that do not have unique minimizers and we propose a regularization approach to perform the analysis using multi-time Hamilton-Jacobi partial differential equations.

The authors are listed in alphabetical order. This work was funded by NSF 1820821.

1. Introduction

In the late 20th century, the Hamilton-Jacobi (HJ) equation was widely studied in the field of partial differential equations (PDEs). To be specific, the solution $S(x,t)$ defined for $x\in\mathbb{R}^{n}$ , $t\geq 0$ satisfies the following Cauchy problem

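$$\begin{cases}\dfrac{\partial S(x,t)}{\partial t}+H(x,t,S(x,t),\nabla_{x}S(x,t))=0, & x\in\mathbb{R}^{n},\ t>0;\\ S(x,0)=J(x), & x\in\mathbb{R}^{n},\end{cases}$$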

where $H$ is the Hamiltonian and $J$ is the initial data. When the Hamiltonian only depends on the spatial gradient $\nabla_{x}S(x,t)$ , under some regularity and convexity assumptions, the solution is given by the Hopf formula or Lax formula [18, 68]

$$S(x,t)=(J^{*}+tH)^{*}(x)=\inf_{u\in\mathbb{R}^{n}}J(u)+tH^{*}\left(\frac{x-u}{t}\right),$$

where $J^{*}$ and $H^{*}$ are the Legendre transforms of the functions $J$ and $H$, respectively. From the physics point of view, the HJ PDE describes the movement of a particle in a physical model whose energy function is given by the Hamiltonian $H$. To be specific, the variables $x$ and $t$ are the current position and time of the particle. The characteristic line of the PDE gives the trajectory of the particle. The momentum is given by the spatial gradient $\nabla_{x}S(x,t)$, which coincides with the maximizer in the Hopf formula. The velocity is given by $\frac{x-u}{t}$, where $u$ is the minimizer in the Lax formula.
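As a minimal one-dimensional sketch of the two formulas (an illustration, not an example from the paper), assume $J(u)=|u|$ and $H(p)=\frac{1}{2}p^{2}$, so that $J^{*}$ is the indicator function of $[-1,1]$ and $H^{*}(q)=\frac{1}{2}q^{2}$. Under these assumed choices the Lax formula reads $\min_{u}|u|+\frac{(x-u)^{2}}{2t}$ and the Hopf formula reads $\max_{|p|\leq 1}px-\frac{t}{2}p^{2}$. The function names `lax` and `hopf` and the grid parameters are hypothetical, and the brute-force search only approximates the optimizers.

```python
# Assumed toy data (not from the paper): J(u) = |u|, H(p) = p^2 / 2.

def lax(x, t, n=20001, lo=-5.0, hi=5.0):
    """Grid search for min_u |u| + (x - u)^2 / (2 t); returns (value, minimizer u)."""
    best = (float("inf"), 0.0)
    for k in range(n):
        u = lo + (hi - lo) * k / (n - 1)
        best = min(best, (abs(u) + (x - u) ** 2 / (2.0 * t), u))
    return best


def hopf(x, t, n=20001):
    """Grid search for max_{|p| <= 1} p x - t p^2 / 2; returns (value, maximizer p)."""
    best = (-float("inf"), 0.0)
    for k in range(n):
        p = -1.0 + 2.0 * k / (n - 1)
        best = max(best, (p * x - t * p * p / 2.0, p))
    return best


for x, t in [(1.7, 0.5), (0.2, 0.5), (-3.0, 2.0)]:
    s_lax, u = lax(x, t)
    s_hopf, p = hopf(x, t)
    # For H(p) = p^2 / 2 the velocity (x - u) / t equals the momentum p.
    print(f"x={x}, t={t}: Lax={s_lax:.4f}, Hopf={s_hopf:.4f}, "
          f"velocity={(x - u) / t:.4f}, momentum={p:.4f}")
```

With this particular Hamiltonian the velocity and the momentum coincide numerically because $\nabla H(p)=p$; for a general $H$ they are related through $\nabla H$ and need not be equal.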

We refer the readers to the review paper [48] for thorough details and [49, 69] for connections between convex analysis and HJ equations. An extension of this PDE is to consider the time variable $t$ in a higher dimensional space $\mathbb{R}^{N}$ , in which case the PDE system is called the multi-time Hamilton-Jacobi equation, first discussed by Rochet from an economic point of view [80]. Later, Lions and Rochet [71] considered the multi-time HJ equations when the Hamiltonians are convex functions which only depend on the momentum. They proposed the generalized Hopf formula by writing it as the composition of several semigroups of the corresponding single-time HJ operators. Following their work, several existence and uniqueness results [20, 32, 73, 78, 88] were provided in more general cases, for example, when the Hamiltonians have spatial or time dependence.

It is well known that the HJ equation has a deep relationship with optimal control [26] and differential games [57, 84]. Later, Darbon [49] provided a representation formula for the minimizers of a specific kind of optimization problem, which relates the minimizers to the spatial gradients of the solutions to the HJ equations. As we will see below, many models in imaging sciences can be viewed from a perspective of HJ PDEs. Following that work, we generalize the results to multi-time HJ equations and a larger set of optimization problems, including the decomposition models in image processing.

In the past few decades, many decomposition models have been proposed in image processing. These models are applied to different practical problems, such as inpainting [23, 56], image classification [12], and road detection [62]. Here, we give a brief overview of convex variational models in this area. There are many models that cannot be fully listed here, for which we refer the readers to [44, 61].

The basic idea of image decomposition is to regard an image $x$ as a summation of several components $\{u_{j}\}$ , and solve the following minimization problem:

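$$\operatorname*{arg\,min}_{u_{0}+\dots+u_{N}=x}f_{0}(u_{0})+\sum_{j=1}^{N}\lambda_{j}f_{j}(u_{j}).\tag{1}$$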

Here, each function $f_{j}$ is designed to characterize the corresponding component $u_{j}$ . One may tune the parameters $\{\lambda_{j}\}$ to put emphasis on different components. There are many celebrated decomposition models in the literature of imaging sciences. In the introduction we mention the continuous versions of the models, while later in the main part of this paper we will work with their discrete versions. The first widely used decomposition model is the Rudin-Osher-Fatemi (ROF) model, proposed in [83], which applies the total variation (TV) semi-norm and $\|\cdot\|_{L^{2}}^{2}$ to recognize the geometry and noise in an image, respectively. In the continuous setting, for any function $u\in L^{1}(\Omega)$ and $\Omega\subset\mathbb{R}^{2}$ , the TV semi-norm of $u$ is defined by

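$$\|u\|_{TV}:=\sup\left\{\int_{\Omega}u(x)\,\mathrm{div}\,\phi(x)\,dx : \phi\in C_{c}^{1}(\Omega,\mathbb{R}^{2}),\ \|\phi\|_{L^{\infty}}\leq 1\right\}.$$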

Here and in the rest of the introduction, the derivatives and divergence are understood in the distributional sense. The space $BV(\Omega)$ consists of all functions of bounded variation and is defined by

$$BV(\Omega)=\{u\in L^{1}(\Omega) : \|u\|_{TV}<+\infty\}.$$

Under these settings, the ROF model solves the following problem

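$$\operatorname*{arg\,min}_{u\in BV(\Omega)}\|u\|_{TV}+\frac{1}{2\lambda}\|x-u\|_{L^{2}}^{2}.$$

The mathematical analysis for the ROF model is provided in [1, 2, 3, 4, 7, 28, 29, 33, 34, 35, 36, 38, 40, 41, 46, 47, 51, 63, 64, 76, 79, 89, 91]. Later, Meyer [72] pointed out the disadvantage of $\|\cdot\|_{L^{2}}^{2}$ in capturing oscillating patterns. To overcome this disadvantage, he suggested replacing it with the norm of any one of the three spaces $E,F,G$, which are defined as follows. We use the notations of Meyer to describe these spaces [72]. First, define the space of functions of bounded mean oscillation ($BMO$) by

$$BMO:=\left\{f\in L_{loc}^{1}(\mathbb{R}^{n}) : \sup\left\{\frac{1}{|Q|}\int_{Q}|f(x)-f_{Q}|\,dx : Q\text{ is any ball in }\mathbb{R}^{n}\right\}<+\infty\right\},\quad\text{where the symbol }f_{Q}\text{ is defined by }f_{Q}:=\frac{1}{|Q|}\int_{Q}f(x)\,dx,$$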

and the homogeneous Besov space $\dot{B}_{1}^{1,1}$ by

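$$\dot{B}_{1}^{1,1}:=\Big\{f\in L^{\frac{n}{n-1}}(\mathbb{R}^{n}) : \sum_{j\in\mathbb{Z}}\sum_{k\in\mathbb{Z}^{n}}|c(j,k)|2^{j(1-n/2)}<+\infty,\ \text{where }\{c(j,k)\}\text{ are the wavelet coefficients of }f\Big\}.$$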

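Let $\dot{B}_{\infty}^{-1,\infty}$ be the dual space of $\dot{B}_{1}^{1,1}$. Then, define $E,F,G$ by $E:=\dot{B}_{\infty}^{-1,\infty}$, $F:=\mathrm{div}(BMO)$ and $G:=\mathrm{div}(L^{\infty})$. To be specific, the space $G$ and the $G$-norm are defined as follows

$$G:=\{f=\partial_{1}g_{1}+\partial_{2}g_{2} : g_{1},g_{2}\in L^{\infty}(\mathbb{R}^{2})\},\quad\|f\|_{G}:=\inf\left\{\left\|(g_{1}^{2}+g_{2}^{2})^{1/2}\right\|_{L^{\infty}} : f=\partial_{1}g_{1}+\partial_{2}g_{2}\right\}.$$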

The space $F$ is similarly defined by replacing the space $L^{\infty}$ in the above definition with the $BMO$ space. The corresponding models proposed by Meyer are stated as follows

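$$\operatorname*{arg\,min}_{u\in BV(\Omega)}\|u\|_{TV}+\lambda\|x-u\|_{X},\quad\text{where the space $X$ can be $E$, $F$ or $G$.}\tag{2}$$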

For mathematical analysis of these models, we refer the readers to [59, 62, 70]. In [59], the space $E$ is also generalized to any homogeneous Besov space $\dot{B}_{p}^{\alpha-2,q}$, where $p,q\in[1,+\infty]$ and $\alpha\in(0,2)$. However, Meyer’s models are hard to solve numerically. There are mainly two approaches to numerically solve the model with the $G$-norm. The first approach approximates $L^{\infty}$ in the definition of $G$ by $L^{p}$ [90]. Osher et al. [77] proposed an equivalent formulation called OSV when $p=2$. In short, OSV uses the square of the $H^{-1}$-norm instead of the $G$-norm. To be specific, the OSV model solves

$$\operatorname*{arg\,min}_{u\in BV(\Omega)}\|u\|_{TV}+\lambda\int_{\Omega}\left|\nabla(\Delta)^{-1}(x-u)\right|^{2}dxdy.$$

The other approach, called the $A^{2}BC$ model, was proposed by Aujol et al. [9, 10]; it replaces the $G$-norm with the indicator function of balls in the space $G$. In other words, it solves the following problem

$$\operatorname*{arg\,min}_{u\in BV(\Omega)}\|u\|_{TV}+I\{\|x-u\|_{G}\leq\mu\},\tag{3}$$

where $I\{\cdot\}$ denotes the indicator function whose definition will be given in section 2. It is shown that this $A^{2}BC$ model gives the solution to Meyer’s model eq. 2 with $X=G$ when the parameter $\mu$ is appropriately chosen. In practice, they use a Moreau-Yosida type approximation and solve the following problem instead

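$$\operatorname*{arg\,min}_{u\in BV(\Omega),\,v\in G}\|u\|_{TV}+I\{\|v\|_{G}\leq\mu\}+\frac{1}{2\lambda}\|x-u-v\|_{L^{2}}^{2},\tag{4}$$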

This regularized model converges to eq. 3 as the parameter $\lambda$ approaches zero. Moreover, it is easy to implement using Chambolle’s projection method [37]. Similarly, in [11], the indicator function of the $E$-ball is used to replace the $E$-norm, which provides a similar numerical implementation approach to Meyer’s model eq. 2 with $X=E$.

In the above models, an image is decomposed into a geometrical part and an oscillating part. However, for a noisy image, the oscillating part may contain both the texture in the original image and the noise. To split these two parts, a $u+v+w$ model is proposed in [11], which constrains the $G$-norm of the texture part and the $E$-norm of the noisy part. Later, Gilles [60] modified the $u+v+w$ model with a coefficient assigned to each pixel to smoothly indicate whether it is in texture or noise. He also modified the $A^{2}BC$ model by requiring the $G$-norm of the noise to be much smaller than the $G$-norm of the texture. In [15, 53, 54], the authors extended some of the above-mentioned models, which were originally proposed for gray-scale images, to color images. Besides, there are many other functions used in image decomposition. For example, the $L^{1}$-norm [5, 14, 42, 75] is used to promote sparsity or remove salt-and-pepper noise. In [13, 14], the quadratic form $\langle\cdot,K\cdot\rangle$, where $K$ is a linear symmetric positive operator, is used for adaptive kernel selection of the texture component. Note that this quadratic form generalizes the $L^{2}$ term in ROF and the $H^{-1}$ term in OSV.

The previous work [49] clarifies the relationship between single-time HJ equations and decomposition models with two terms (i.e. $N=1$ in eq. 1), such as the ROF model, Meyer’s models and some of their variations. However, as mentioned above, there are many other models handling three or more components. Also, in practice, one may modify a model by adding a quadratic term for numerical consideration, such as in eq. 4. This kind of modification is applied to most of the above models. As a result, the objective function in the numerical implementation actually contains three or more terms. On the other hand, new models can be constructed by regarding the functions mentioned above as building blocks and combining them together. For instance, the morphological component analysis [58, 85, 86] combines ROF model and $L^{1}$ minimization for the coefficients with respect to two sets of dictionaries chosen for the representation of texture and geometry. Another example is [45], which adds a higher order term $\alpha\|\Delta v\|_{L^{2}}^{2}$ to the models introduced above, in order to reduce the staircase effect. Actually, the higher order terms in image processing are widely studied in the literature. Two important models are the TV-TV2 infimal convolution model [41] and the Total Generalized Variation (TGV) model [25]. In fact, after discretization, the higher order linear operators are discretized using some matrices. In other words, the results in this paper can be applied to the discrete models with higher order terms by regarding them as matrix multiplication. In conclusion, it is valuable to generalize the previous work [49] and provide a framework to analyze the models involving more than two components. Also, our proposed framework is suitable for a large class of discrete decomposition models in imaging sciences, even including some models containing higher order terms.

Now, we briefly introduce the intuition and the basic setup for our framework and demonstrate the idea using some experimental results of the discrete $A^{2}BC$ model. In general, for a discrete decomposition model eq. 1, an image is regarded as a vector $x\in\mathbb{R}^{n}$ , where $n$ is the number of pixels. If we can relate each $f_{j}$ , $j\geq 1$ , to a Hamiltonian and $f_{0}$ to an initial function, then the minimal value, regarded as a function of the input data $x$ and the parameters $\{\lambda_{j}\}$ , relates to the solution of the corresponding multi-time HJ equation. Here, the parameters $\{\lambda_{j}\}$ are regarded as time variables.

For example, the discrete $A^{2}BC$ model solves the following optimization problem:

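$$S(x,\mu,\lambda):=\min_{u,v\in\mathbb{R}^{n}}J(u)+J^{*}\left(\frac{v}{\mu}\right)+\frac{1}{2\lambda}\|x-u-v\|_{2}^{2}.$$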

The desired quantities are the minimizers, denoted as $u(x,\mu,\lambda)$ and $v(x,\mu,\lambda)$ . Here, the discrete total variation semi-norm $J:\ \mathbb{R}^{m_{1}\times m_{2}}\to\mathbb{R}$ is defined as follows

$$J(u):=\sum_{i=1}^{m_{1}-1}\sum_{j=1}^{m_{2}-1}|u_{i+1,j}-u_{i,j}|+|u_{i,j+1}-u_{i,j}|.$$

In this paper, we identify the space $\mathbb{R}^{m_{1}\times m_{2}}$ containing all matrices with $m_{1}$ rows and $m_{2}$ columns with the Euclidean space $\mathbb{R}^{n}$ where $n=m_{1}m_{2}$ . The discrete total variation $J$ defined above is the anisotropic version, which will be used in this paper. Its Legendre transform $J^{*}$ is the indicator function of the unit ball in the dual space. To be specific, let $\|\cdot\|_{G}$ be the dual norm of $J$ , which is given by

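$$\|v\|_{G}=\inf\Biggl\{\sup_{\substack{1\leq i\leq m_{1}\\ 1\leq j\leq m_{2}}}\sqrt{(g_{i,j})^{2}+(h_{i,j})^{2}}\ \colon\ v_{i,j}=g_{i,j}-g_{i-1,j}+h_{i,j}-h_{i,j-1},\ g_{0,j}=g_{m_{1},j}=h_{i,0}=h_{i,m_{2}}=0,\ g_{i,j},h_{i,j}\in\mathbb{R}\quad\forall\ 1\leq i\leq m_{1},\ 1\leq j\leq m_{2}\Biggr\}.$$

Then, we have $J^{*}(p)=I\{\|p\|_{G}\leq 1\}$ for any $p\in\mathbb{R}^{n}$, where $I\{\cdot\}$ denotes the indicator function. Since any indicator function is invariant under multiplication by a positive constant, we have $\mu J^{*}=J^{*}$. Hence, the above optimization problem is equivalent to

$$S(x,\mu,\lambda)=\min_{u,v\in\mathbb{R}^{n}}J(u)+\mu J^{*}\left(\frac{v}{\mu}\right)+\frac{\lambda}{2}\left\|\frac{x-u-v}{\lambda}\right\|_{2}^{2}.$$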
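The equivalence can be checked term by term. The following short derivation is a sketch that uses only the identity $J^{*}=I\{\|\cdot\|_{G}\leq 1\}$ stated above and the positive homogeneity of the norms:

```latex
\begin{aligned}
\mu J^{*}\!\left(\frac{v}{\mu}\right)
  &= \mu\, I\left\{\left\|\frac{v}{\mu}\right\|_{G}\leq 1\right\}
   = I\{\|v\|_{G}\leq\mu\}
   = J^{*}\!\left(\frac{v}{\mu}\right),\\
\frac{\lambda}{2}\left\|\frac{x-u-v}{\lambda}\right\|_{2}^{2}
  &= \frac{\lambda}{2}\cdot\frac{1}{\lambda^{2}}\|x-u-v\|_{2}^{2}
   = \frac{1}{2\lambda}\|x-u-v\|_{2}^{2}.
\end{aligned}
```

Written this way, each of the last two terms has the form $t\,f(\cdot/t)$, with $t=\mu$ and $t=\lambda$ respectively, which is the same scaling as the term $tH^{*}\left(\frac{x-u}{t}\right)$ in the Lax formula.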

We shall see that such a representation for $S$ will allow us to show that $S$ satisfies the following multi-time HJ equation

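$$\begin{cases}\dfrac{\partial S(x,\mu,\lambda)}{\partial\mu}+J(\nabla_{x}S(x,\mu,\lambda))=0, & x\in\mathbb{R}^{n},\ \mu>0,\ \lambda>0;\\ \dfrac{\partial S(x,\mu,\lambda)}{\partial\lambda}+\dfrac{1}{2}\|\nabla_{x}S(x,\mu,\lambda)\|_{2}^{2}=0, & x\in\mathbb{R}^{n},\ \mu>0,\ \lambda>0;\\ S(x,0,0)=J(x), & x\in\mathbb{R}^{n}.\end{cases}$$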
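A minimal scalar sanity check of the two equations $\frac{\partial S}{\partial\mu}+J(\nabla_{x}S)=0$ and $\frac{\partial S}{\partial\lambda}+\frac{1}{2}\|\nabla_{x}S\|_{2}^{2}=0$ can be run under assumptions that are not from the paper: take $n=1$ and replace the discrete total variation by $J(u)=|u|$, so that $J^{*}$ is the indicator function of $[-1,1]$. Minimizing first over $u$ and then over $|v|\leq\mu$ gives the closed form $S(x,\mu,\lambda)=h_{\lambda}(\max(|x|-\mu,0))$, where $h_{\lambda}(r)=\frac{r^{2}}{2\lambda}$ for $|r|\leq\lambda$ and $h_{\lambda}(r)=|r|-\frac{\lambda}{2}$ otherwise. The sketch below (the names `huber`, `S` and `residuals` are hypothetical) approximates the partial derivatives by central differences and evaluates both residuals at points where $S$ is smooth.

```python
# Assumed toy model (not from the paper): n = 1 and J(u) = |u|.

def huber(r, lam):
    """Moreau envelope of |.| with parameter lam, i.e. min_u |u| + (r - u)^2 / (2 lam)."""
    r = abs(r)
    return r * r / (2.0 * lam) if r <= lam else r - lam / 2.0


def S(x, mu, lam):
    """Closed-form minimal value of |u| + I{|v| <= mu} + (x - u - v)^2 / (2 lam)."""
    return huber(max(abs(x) - mu, 0.0), lam)


def residuals(x, mu, lam, h=1e-5):
    """Central-difference residuals of the two HJ equations at (x, mu, lam)."""
    dS_dx = (S(x + h, mu, lam) - S(x - h, mu, lam)) / (2.0 * h)
    dS_dmu = (S(x, mu + h, lam) - S(x, mu - h, lam)) / (2.0 * h)
    dS_dlam = (S(x, mu, lam + h) - S(x, mu, lam - h)) / (2.0 * h)
    res_mu = dS_dmu + abs(dS_dx)            # Hamiltonian J(p) = |p|
    res_lam = dS_dlam + 0.5 * dS_dx ** 2    # Hamiltonian p^2 / 2
    return res_mu, res_lam


# Points chosen away from the kinks |x| = mu and |x| = mu + lam.
points = [(2.0, 0.5, 0.3), (0.9, 0.5, 1.0), (-1.4, 1.0, 0.8), (0.2, 0.5, 0.3)]
for pt in points:
    print(pt, residuals(*pt))
```

Both residuals should be zero up to finite-difference error at these points. At the kinks the toy $S$ is not differentiable in the classical sense, so the check is restricted to smooth points.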

In figs. 1, 2, 3, 4, 5, and 6, the minimizers $u,v$ and the minimal values $S$ for the corresponding input images are shown. To compute the minimizers, we apply a splitting algorithm to convert the optimization problem (5) into two subproblems: computing the proximal point of $\lambda J$ and computing the projection onto a $\mu$-ball of Meyer’s norm. The second subproblem is the dual problem to the first one. As a result, for both subproblems, we can apply the algorithm in [39, 50, 67] to obtain the exact minimizers.

In the first example, the test image $x_{1}$ is shown in fig. 1a. We consider the following parameters $\mu_{1}=1,\lambda_{1}=0.01$ . The corresponding minimizers $u$ and $v$ are shown in figs. 1b and 1c. When $x=x_{1}$ , $\lambda=\lambda_{1}$ are fixed, the minimal values $S(x_{1},\mu,\lambda_{1})$ can be regarded as a function of $\mu$ , whose graph is plotted in fig. 2b. Similarly, the graph of $S(x_{1},\mu_{1},\lambda)$ is plotted in fig. 2c. To illustrate the variation of $S$ with respect to $x$ , we choose another image $x_{2}$ with corresponding suitable parameters $\mu_{2}$ , $\lambda_{2}$ , and plot the function values $f:\alpha\mapsto S(\alpha x_{1}+(1-\alpha)x_{2},\alpha\mu_{1}+(1-\alpha)\mu_{2},\alpha\lambda_{1}+(1-\alpha)\lambda_{2})$ with $\alpha\in[0,1]$ . In this example, $x_{2}$ is chosen to be a rotation of $x_{1}$ , and the parameters remain the same: $\mu_{2}=\mu_{1}$ , $\lambda_{2}=\lambda_{1}$ . The graph of $f$ is plotted in fig. 2a. We also show an example of the mixed image $x=\alpha x_{1}+(1-\alpha)x_{2}$ for $\alpha=0.3$ and the corresponding minimizers $u,v$ in figs. 1d, 1e, and 1f. In addition, the $A^{2}BC$ model (with parameters $\mu=0.06,\lambda=0.01$ ) is applied to a noisy image shown in fig. 3a, whose minimizers are shown in figs. 3b and 3c.

The test image “Barbara” is used in the second example. The original image and the corresponding minimizers $u,v$ in the $A^{2}BC$ model with parameters $\mu=30,\lambda=8$ are shown in fig. 4. To demonstrate the variations of the minimal values, we choose two parts $x_{1},x_{2}$ of the image, shown in figs. 5a and 5d, and repeat the experiment in the first example. Setting $\mu_{1}=16$ , $\mu_{2}=24$ , $\lambda_{1}=8$ , and $\lambda_{2}=12$ , the corresponding minimizers $u,v$ are shown in figs. 5b, 5c, 5e, and 5f. The mixed image ( $\alpha=0.5$ ) and minimizers are shown in figs. 5g, 5h, and 5i, and the dependence of $S$ on $x,\mu,\lambda$ is shown in figs. 6a, 6b, and 6c.

It can be seen from figs. 2 and 6 that $S$ is a convex function with respect to the input image $x$ and the parameters. This can be proved with a similar argument as in the proof of 3.1. In this paper, more properties about $S$ and the minimizers $u,v$ are revealed.
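The convexity seen in the figures can be probed numerically on a toy problem. The sketch below is an illustration under assumptions that are not from the paper: it uses the scalar model with $J(u)=|u|$, for which the minimal value has the closed form $S(x,\mu,\lambda)=h_{\lambda}(\max(|x|-\mu,0))$ with $h_{\lambda}$ the Huber function, and it tests the midpoint inequality $S\left(\frac{a+b}{2}\right)\leq\frac{1}{2}(S(a)+S(b))$ at random pairs of points $a,b=(x,\mu,\lambda)$ with $\mu,\lambda>0$. The helper names and sampling ranges are hypothetical, and a passing check is evidence rather than a proof.

```python
import random

# Assumed toy model (not from the paper): n = 1 and J(u) = |u|.

def huber(r, lam):
    """Moreau envelope of |.| with parameter lam."""
    r = abs(r)
    return r * r / (2.0 * lam) if r <= lam else r - lam / 2.0


def S(x, mu, lam):
    """Closed-form minimal value of the scalar toy decomposition model."""
    return huber(max(abs(x) - mu, 0.0), lam)


def sample(rng):
    """Random point (x, mu, lam) with strictly positive mu and lam."""
    return (rng.uniform(-3.0, 3.0), rng.uniform(0.1, 2.0), rng.uniform(0.1, 2.0))


def midpoint_gap(a, b):
    """S(midpoint) - average of S at the endpoints; non-positive for a convex S."""
    mid = tuple((ai + bi) / 2.0 for ai, bi in zip(a, b))
    return S(*mid) - 0.5 * (S(*a) + S(*b))


rng = random.Random(0)
worst_gap = max(midpoint_gap(sample(rng), sample(rng)) for _ in range(10000))
print("largest midpoint gap:", worst_gap)
```

Joint convexity in $(x,\mu,\lambda)$ is expected here because the toy objective is a sum of terms that are jointly convex in all variables, and partial minimization over $(u,v)$ preserves convexity.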

Our contribution. The contribution of this paper is a set of theoretical results connecting the multi-time HJ equation and some optimization models such as decomposition models in imaging sciences. The paper has three parts. In the first part, we consider the decomposition models and the corresponding dual problems, and investigate the properties of their optimizers and optimal values. To be specific, for some optimization problems, the minimal value coincides with the solution $S(x,t_{1},\cdots,t_{N})$ to a corresponding multi-time HJ equation. This relationship in the case of single-time HJ equations has been studied in [49]. We generalize the representation formula for the minimizer $u_{j}$ and the variational analysis results of $S$ and $\nabla_{x}S$ in [49] to the case of multi-time HJ equations. Moreover, we present a new variational analysis of the scaled minimizer $\frac{u_{j}}{t_{j}}$. In the variational analysis, we consider a sequence $\{(x_{k},t_{1,k},\cdots,t_{N,k})\}_{k}$ whose elements are perturbed variables near the point $(x,0,\cdots,0)$, with the perturbation shrinking as $k$ grows. We show that the limits of the corresponding spatial gradients $\nabla_{x}S$ and the scaled minimizers $\frac{u_{j}}{t_{j}}$ solve two optimization problems which are dual to each other.

In the second part, we prove the uniqueness of the convex solution to the multi-time HJ equation under some specific assumptions. In the field of PDEs, the uniqueness of the viscosity solution has been widely studied, for which we refer the readers to [48] and the references listed there. Here, our contribution is to provide a new perspective from convex analysis and use the duality technique to prove the uniqueness of the convex solution.

In the third part, we propose a regularization method for the decomposition problems which may have non-unique minimizers or non-differentiable minimal values. The regularization method is used to select a unique minimizer $u_{\lambda,\mu}$ and a unique gradient $p_{\lambda,\mu}$ of the minimal function, where $\lambda$ and $\mu$ are some positive parameters. In fact, the gradient $p_{\lambda,\mu}$ coincides with the maximizer in the corresponding dual problem. This regularization method can be regarded as a generalization of the Moreau-Yosida approximation, which is introduced, for example, in [8, 27]. Instead of only considering the primal problem as in the Moreau-Yosida approximation, our contribution here is to consider both the primal problem and the dual problem at the same time. Then, we apply the variational analysis result in the first part to prove the convergence of $u_{\lambda,\mu}$ and $p_{\lambda,\mu}$. We show that they converge to the $l^{2}$-projection of zero onto the corresponding sets of the original problems when the regularization parameters $\lambda$ and $\mu$ approach zero at comparable rates.

**Organization of the paper.** The paper is organized as follows. Section 2 gives a brief review of the convex optimization theorems which are used in the later proofs. The main results are stated in sections 3, 4, and 5. In section 3, the connection between some decomposition models and the multi-time HJ equation is shown. 3.2 provides the representation formula for the minimizers $u_{j}$ of some decomposition models. Also, we investigate the variational behaviors of the minimal value $S$, the momentum $\nabla_{x}S$ and the velocities $\frac{u_{j}}{t_{j}}$ in 3.4. Section 4 is devoted to the proof of the uniqueness of the convex solution to the multi-time HJ equation. In section 5, we present a regularization method for the degenerate cases which do not satisfy the assumptions in section 3. The method is demonstrated using a specific example but the analysis can be easily applied to other models. Finally, some conclusions are drawn in section 6.

2. Mathematical Background

In this section, several basic definitions and theorems in convex analysis are reviewed. All the results and notations can be found in [65, 66]. We also refer the readers to [22, 24, 81].

First, a set $C$ in $\mathbb{R}^{n}$ is convex if $\alpha x+(1-\alpha)y\in C$ whenever $x,y\in C$ and $\alpha\in[0,1]$ . The relative interior of $C$ , denoted as $\mathrm{ri\ }C$ , is the interior of $C$ with respect to the minimal hyperplane containing $C$ in $\mathbb{R}^{n}$ . For any convex set $C$ , the normal cone of $C$ at $x\in C$ , denoted by $N_{C}(x)$ , can be characterized by

$$q\in N_{C}(x)\text{ if and only if }\langle q,y-x\rangle\leq 0\text{ for any }y\in C.$$

Here, we use the angle bracket $\langle\cdot,\cdot\rangle$ to denote the inner product operator in any Euclidean space $\mathbb{R}^{n}$ . For any closed convex set $C$ and any point $x\in C$ , one can define the asymptotic cone of $C$ , denoted as $C_{\infty}(x)$ , by

$$C_{\infty}(x)=\{d\in\mathbb{R}^{n} : x+td\in C\text{ for all }t>0\}.$$

In fact, the asymptotic cone is independent of $x$ , as stated in the following result.

Proposition 2.1.

[65, Prop. III.2.2.1]

Let $C$ be a closed convex set and $x,y\in C$. Then $C_{\infty}(x)=C_{\infty}(y)$. In other words, for any $d\in C_{\infty}(x)$, $y+td\in C$ for any $t>0$.

A function $f:\mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$ is said to be convex if for any $\alpha\in(0,1)$ and any $x,y\in\mathbb{R}^{n}$ ,

$$f(\alpha x+(1-\alpha)y)\leq\alpha f(x)+(1-\alpha)f(y).$$

The function $f$ is called proper if it is not identically equal to $+\infty$ . The domain of $f$ , denoted by $\mathrm{dom}~{}f$ , is defined to be the set where $f$ does not take the value $+\infty$ . The epigraph of $f$ , denoted as $\mathrm{epi~{}}f$ , is defined by:

$$\mathrm{epi}~f:=\{(x,t) : x\in\mathrm{dom}~f,\ t\geq f(x)\}.$$

Then, $f$ is convex (proper, or lower semi-continuous, respectively) if and only if $\mathrm{epi}~f$ is convex (non-empty, or closed, respectively). We denote by $\Gamma_{0}(\mathbb{R}^{n})$ the set of proper, convex and lower semi-continuous (l.s.c.) functions from $\mathbb{R}^{n}$ to $\mathbb{R}\cup\{+\infty\}$. In this section, we only consider functions in $\Gamma_{0}(\mathbb{R}^{n})$. These functions have good continuity properties, which are stated below.

Proposition 2.2.

[65, Lem. IV.3.1.1 and Chap. I.3.1 - 3.2]

Let $f\in\Gamma_{0}(\mathbb{R}^{n})$. If $x\in\mathrm{ri\ }\mathrm{dom}~{}f$, then $f$ is continuous at $x$ in $\mathrm{dom}~{}f$. If $x\in\mathrm{dom}~{}f\setminus\mathrm{ri\ }\mathrm{dom}~{}f$, then for any $y\in\mathrm{ri\ }\mathrm{dom}~{}f$,

[TABLE]

For any $f\in\Gamma_{0}(\mathbb{R}^{n})$ and $x\in\mathrm{dom}~{}f$, the directional derivative at $x$ along any direction $d$, denoted as $f^{\prime}(x,d)$, is well-defined in $\mathbb{R}\cup\{\pm\infty\}$. When $f$ is differentiable at $x$, $f^{\prime}(x,\cdot)=\langle\nabla f(x),\cdot\rangle$ is a linear function. In general, when $f$ is not differentiable, $f^{\prime}(x,\cdot)$ is only sublinear, in which case we can consider the linear functions dominated by it. The vector representing each such linear function is a subgradient of $f$ at $x$, whose formal definition is given below. The rigorous statement of this relation between directional derivatives and subgradients is given in Proposition 2.6.

A vector $p$ is called a subgradient of $f$ at $x$ if it satisfies

$$f(y)\geq f(x)+\langle p,y-x\rangle\quad\text{for any }y\in\mathbb{R}^{n}.$$

The collection of all such subgradients is called the subdifferential of $f$ at $x$ , denoted as $\partial f(x)$ . It is easy to check that $0\in\partial f(x)$ if and only if $x$ is a minimizer of $f$ . As a result, one can check whether $x$ is a minimizer by computing the subdifferential.
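As a minimal illustration of this optimality test, the sketch below takes the one-dimensional function $f(x)=|x|$, whose subdifferential is $\{\mathrm{sign}(x)\}$ for $x\neq 0$ and $[-1,1]$ at $x=0$. The helper names `subdifferential_abs`, `is_subgradient` and `is_minimizer` are hypothetical and introduced only for this example.

```python
def subdifferential_abs(x):
    """Subdifferential of f(x) = |x|, returned as a closed interval (lo, hi)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)  # every slope in [-1, 1] supports |.| at the kink


def is_subgradient(p, x, test_points):
    """Check the subgradient inequality |y| >= |x| + p*(y - x) on sample points."""
    return all(abs(y) >= abs(x) + p * (y - x) - 1e-12 for y in test_points)


def is_minimizer(x, subdifferential):
    """Optimality test: x minimizes f if and only if 0 lies in the subdifferential."""
    lo, hi = subdifferential(x)
    return lo <= 0.0 <= hi


samples = [i / 10.0 for i in range(-50, 51)]
print(is_minimizer(0.0, subdifferential_abs))   # the kink is the minimizer
print(is_minimizer(0.5, subdifferential_abs))   # a point with slope 1 is not
print(is_subgradient(0.3, 0.0, samples))        # 0.3 lies in [-1, 1]
```

The same test applies to any function whose subdifferential can be written down explicitly; only the interval representation is specific to one dimension.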

As is well known, the subdifferential operator is a (maximal) monotone operator. To be specific,

$$\langle p-q,x-y\rangle\geq 0\quad\text{for any }p\in\partial f(x)\text{ and }q\in\partial f(y).$$

Moreover, in most cases, the subdifferential operator commutes with summation.

Proposition 2.3.

[66, Cor. XI.3.1.2]

Let $f,g\in\Gamma_{0}(\mathbb{R}^{n})$. Assume $\mathrm{ri\ }\mathrm{dom}~{}f\cap\mathrm{ri\ }\mathrm{dom}~{}g\neq\emptyset$. Then $\partial(f+g)(x)=\partial f(x)+\partial g(x)$ for any $x\in\mathrm{dom}~{}f\cap\mathrm{dom}~{}g$.

Here, we give one simple example. For any convex set $C$ , the indicator function $I_{C}$ is defined by

$$I_{C}(x)=\begin{cases}0, & x\in C,\\ +\infty, & x\notin C.\end{cases}$$

In this paper, we also use the notation $I\{\cdot\}$ to denote the indicator function if the set $C$ is given in the form of some constraints. By definition, the indicator function $I_{C}$ remains the same after multiplying by a positive constant, i.e. we have $\alpha I_{C}=I_{C}$ for any $\alpha>0$ . One can compute the subdifferential of the indicator function and obtain

$$\partial I_{C}(x)=N_{C}(x)\quad\text{for any }x\in C.\tag{10}$$
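To make the link between the indicator function and the normal cone concrete, the following sketch considers the interval $C=[a,b]\subset\mathbb{R}$ with $a<b$. The function names are hypothetical; the cone is returned as an interval of admissible slopes, and a brute-force check of the inequality $p\,(y-x)\leq 0$ for $y\in C$ is included for comparison.

```python
import math


def normal_cone_interval(x, a, b):
    """Normal cone of C = [a, b] (with a < b) at x in C, as an interval (lo, hi)."""
    if not (a <= x <= b):
        raise ValueError("x must belong to C")
    if x == a:
        return (-math.inf, 0.0)   # outward normals point to the left
    if x == b:
        return (0.0, math.inf)    # outward normals point to the right
    return (0.0, 0.0)             # interior point: only the zero vector


def in_normal_cone(p, x, points_of_c):
    """Brute-force test of p*(y - x) <= 0 for sampled y in C."""
    return all(p * (y - x) <= 0.0 for y in points_of_c)


a, b = 0.0, 1.0
c_samples = [a + (b - a) * i / 100.0 for i in range(101)]
print(normal_cone_interval(a, a, b), in_normal_cone(-2.0, a, c_samples))
print(normal_cone_interval(0.5, a, b), in_normal_cone(0.1, 0.5, c_samples))
```

At an interior point the cone reduces to $\{0\}$, which matches the fact that the indicator function is locally constant there.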

Next, we introduce an important transform in convex analysis called the Legendre transform. For any function $f\in\Gamma_{0}(\mathbb{R}^{n})$, the Legendre transform of $f$, denoted as $f^{*}$, is defined by

$$f^{*}(p)=\sup_{x\in\mathbb{R}^{n}}\{\langle p,x\rangle-f(x)\}.\tag{11}$$

The Legendre transform gives a duality relationship between $f$ and $f^{*}$: if $f\in\Gamma_{0}(\mathbb{R}^{n})$, then $f^{*}\in\Gamma_{0}(\mathbb{R}^{n})$ and $f^{**}=f$. Along with this duality relationship, some properties are dual to others, as stated in the following proposition. (Here and hereafter, a function $g$ is called 1-coercive if $\lim_{\|x\|\to+\infty}g(x)/\|x\|=+\infty$.)
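The duality $f^{**}=f$ can be observed numerically. The sketch below is an illustration only: it replaces the supremum in the definition of the Legendre transform by a maximum over a finite grid (an assumption that is harmless here because the maximizers stay inside the grid) and uses $f(x)=x^{2}/2$, for which $f^{*}(p)=p^{2}/2$. The name `legendre_on_grid` is hypothetical.

```python
def legendre_on_grid(values, grid):
    """Discrete Legendre transform: for each slope p in grid, max over x of p*x - f(x)."""
    return [max(p * x - fx for x, fx in zip(grid, values)) for p in grid]


grid = [i / 20.0 for i in range(-100, 101)]     # [-5, 5] with step 0.05
f_values = [0.5 * x * x for x in grid]          # f(x) = x^2 / 2
f_star = legendre_on_grid(f_values, grid)       # approximates f*(p) = p^2 / 2
f_star_star = legendre_on_grid(f_star, grid)    # approximates f** = f

err_conjugate = max(abs(fs - 0.5 * p * p) for fs, p in zip(f_star, grid))
err_biconjugate = max(abs(fss - fx) for fss, fx in zip(f_star_star, f_values))
print(err_conjugate, err_biconjugate)
```

For functions whose conjugate has maximizers outside the grid, the discrete transform is only a lower bound on $f^{*}$, so the grid must be chosen with the growth of $f$ in mind.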

Proposition 2.4.

[66, Chap. X.4.1]

Let $f\in\Gamma_{0}(\mathbb{R}^{n})$. Then $f$ is finite-valued if and only if $f^{*}$ is 1-coercive. Also, $f$ is differentiable if and only if $f^{*}$ is strictly convex.

In particular, the subgradients can be characterized by the maximizers in eq. 11.

Proposition 2.5.

[66, Cor. X.1.4.4]

Let $f\in\Gamma_{0}(\mathbb{R}^{n})$ and $p,x\in\mathbb{R}^{n}$. Then $p\in\partial f(x)$ if and only if $x\in\partial f^{*}(p)$, if and only if $f(x)+f^{*}(p)=\langle p,x\rangle$.

The concepts we introduced above, including directional derivatives, subgradients and Legendre transform, can be linked all together by the following proposition.

Proposition 2.6.

[66, Example X.2.4.3]

Let $f\in\Gamma_{0}(\mathbb{R}^{n})$ and $x\in\mathrm{dom}~{}f$ such that $\partial f(x)$ is nonempty. Then $(f^{\prime}(x,\cdot))^{*}=I_{\partial f(x)}$. Moreover, if $x\in\mathrm{ri\ }\mathrm{dom}~{}f$, then $f^{\prime}(x,\cdot)\in\Gamma_{0}(\mathbb{R}^{n})$, hence $f^{\prime}(x,\cdot)=I_{\partial f(x)}^{*}$.

Besides the Legendre transform, another operation that constructs convex functions is the inf-convolution. Given two functions $f,g\in\Gamma_{0}(\mathbb{R}^{n})$, assume there exists an affine function $l$ such that $f(x)\geq l(x)$ and $g(x)\geq l(x)$ for any $x\in\mathbb{R}^{n}$. Then, the inf-convolution of $f$ and $g$, denoted as $f\square g$, is a convex function taking values in $\mathbb{R}\cup\{+\infty\}$. It is defined by

$$(f\square g)(x)=\inf_{u\in\mathbb{R}^{n}}\{f(u)+g(x-u)\}.\tag{12}$$

In the following proposition, the relation between Legendre transform and inf-convolution is stated. Actually, the Hopf formula and Lax formula introduced in the next section are formulated using Legendre transform and inf-convolution operator, respectively. As a result, these two operators play a significant role in our analysis in this paper.

Proposition 2.7.

[66, Thm. X.2.3.2 and Thm. XI.3.4.1]

Let $f,g\in\Gamma_{0}(\mathbb{R}^{n})$. Assume the intersection of $\mathrm{ri\ }\mathrm{dom}~{}f^{*}$ and $\mathrm{ri\ }\mathrm{dom}~{}g^{*}$ is non-empty. Then $f\square g\in\Gamma_{0}(\mathbb{R}^{n})$ and $f\square g=(f^{*}+g^{*})^{*}$. Moreover, for any $x\in\mathrm{dom}~{}f\square g$, the optimization problem eq. 12 has at least one minimizer, and $\partial(f\square g)(x)=\partial f(u)\cap\partial g(x-u)$ for any minimizer $u$.
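The statements of this proposition can be checked on a simple one-dimensional instance. The sketch below is an illustration under stated assumptions: it takes $f=|\cdot|$ and $g(z)=z^{2}/(2t)$ with $t=0.5$, replaces the infimum and supremum by brute-force searches over grids, and uses the known conjugates $f^{*}=I_{[-1,1]}$ and $g^{*}(p)=tp^{2}/2$. The function names are hypothetical.

```python
def inf_convolution(f, g, x, u_grid):
    """Brute-force (f inf-conv g)(x) = min over u of f(u) + g(x - u); returns (value, argmin)."""
    best_u = min(u_grid, key=lambda u: f(u) + g(x - u))
    return f(best_u) + g(x - best_u), best_u


def dual_value(x, t, p_grid):
    """(f* + g*)*(x) for f = |.| and g = (.)^2/(2t): max over |p| <= 1 of p*x - t*p^2/2."""
    return max(p * x - 0.5 * t * p * p for p in p_grid)


t = 0.5
f = abs
g = lambda z: z * z / (2.0 * t)
u_grid = [i / 100.0 for i in range(-400, 401)]   # candidate components u
p_grid = [i / 100.0 for i in range(-100, 101)]   # dom f* = [-1, 1]

x = 2.0
value, u = inf_convolution(f, g, x, u_grid)
slope = (x - u) / t        # gradient of g at x - u
print(value, dual_value(x, t, p_grid))   # primal and dual values agree
print(u, slope)            # slope is the common subgradient of f at u and g at x - u
```

Here the minimizer is $u=1.5$ and the common subgradient is $1$, which is indeed the only element of $\partial f(1.5)$, in agreement with the intersection formula.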

3. Properties of the Solutions to the Multi-time Hamilton-Jacobi Equations

In this section, we provide a representation formula for the minimizers in the Lax formula and highlight the relation of the minimizers and the momentum in the multi-time HJ equation. Also, we investigate the variational behaviors of both the solution to the multi-time HJ equation and the corresponding momentum when time variables approach zero. Moreover, we also present a new result stating the variational behaviors of the velocities, which has not been developed before, even for the single-time case. Similar to the duality relation of the Hopf and Lax formulas, the cluster points of the minimizers and momentum solve two optimization problems, which are also dual to each other. An illustration is given in the upper part of fig. 7.

We consider the solution $S(x,t_{1},\cdots,t_{N})$ to the following multi-time HJ equation

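$$\begin{cases}\dfrac{\partial S}{\partial t_{j}}(x,t_{1},\cdots,t_{N})+H_{j}(\nabla_{x}S(x,t_{1},\cdots,t_{N}))=0, & j=1,\cdots,N,\ x\in\mathbb{R}^{n},\ t_{1},\cdots,t_{N}>0,\\ S(x,0,\cdots,0)=J(x), & x\in\mathbb{R}^{n}.\end{cases}\tag{13}$$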

Here, we only consider the multi-time HJ equations whose Hamiltonians only depend on the momentum $\nabla_{x}S$ . Several conditions are imposed on the Hamiltonians $\{H_{j}\}$ and the initial data $J$ in this section. To be specific, we assume

(H1)

$H_{j}:\mathbb{R}^{n}\to\mathbb{R}$ is convex and 1-coercive for any $j=1,\cdots,N$. Moreover, at least one of them is strictly convex;

(H2)

$J\in\Gamma_{0}(\mathbb{R}^{n})$ .

From the assumption (H1) and Proposition 2.4, $H_{j}^{*}$ is also finite-valued, convex and 1-coercive for any $j=1,\cdots,N$. Moreover, at least one $H_{j}^{*}$ is differentiable.

It is well known that in this case the unique classical solution is given by the Hopf formula [71, 88] stated as follows

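$$S_{H}(x,t_{1},\cdots,t_{N})=\Big(J^{*}+\sum_{j=1}^{N}t_{j}H_{j}\Big)^{*}(x)=\sup_{p\in\mathbb{R}^{n}}\Big\{\langle p,x\rangle-J^{*}(p)-\sum_{j=1}^{N}t_{j}H_{j}(p)\Big\}\tag{14}$$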

and the Lax formula [88] stated as follows

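$$S_{L}(x,t_{1},\cdots,t_{N})=\inf_{u_{1},\cdots,u_{N}\in\mathbb{R}^{n}}\Big\{J\Big(x-\sum_{j=1}^{N}u_{j}\Big)+\sum_{j=1}^{N}t_{j}H_{j}^{*}\Big(\frac{u_{j}}{t_{j}}\Big)\Big\}\tag{15}$$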

for any $x\in\mathbb{R}^{n}$ and $t_{1},\cdots,t_{N}\geq 0$. We extend $S_{H}$ and $S_{L}$ to the whole domain by simply setting the function values to $+\infty$ whenever the function value is not defined. There are some physical interpretations of the HJ PDEs and the optimizers in the above two formulas. Given suitable Hamiltonians $\{H_{j}\}$ and a suitable initial condition $J$, the HJ PDE eq. 13 describes the movement of a particle. Roughly speaking, in a time interval with length $t_{j}$, a particle moves along the characteristic line of the $j$-th equation in the PDE system. The velocity in this time interval equals $\frac{u_{j}}{t_{j}}$ where $(u_{1},\cdots,u_{N})$ denotes the minimizer in the Lax formula eq. 15. On the other hand, the maximizer in the Hopf formula eq. 14 gives the momentum of the particle, which coincides with the spatial gradient $\nabla_{x}S(x,t_{1},\cdots,t_{N})$. We refer the reader to [21] for details about HJ PDEs and variational principles in physics.

Under the assumptions (H1) and (H2), $S_{H}=S_{L}$, and the value is finite if there exists some $t_{j}>0$. In addition, the minimizers in the Lax formula eq. 15 exist whenever the minimal value is finite. This result can be proved using Proposition 2.7. Also, by Proposition 2.5, it is not hard to check that $S_{H}\in C^{1}(\mathbb{R}^{n}\times(0,+\infty)^{N})$ and that $S_{H}$ satisfies the HJ equation eq. 13. Moreover, the spatial gradient is the unique maximizer in the Hopf formula eq. 14. To conclude, the Hopf and Lax formulas express the classical solution to the multi-time HJ equation as two optimization problems. The Hopf formula provides a physical interpretation and has the momentum $\nabla_{x}S$ as the maximizer, while its dual problem in the Lax formula is in the same form as some decomposition models in imaging sciences.
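The agreement $S_{H}=S_{L}$ can be observed on a small two-time example. The sketch below is an illustration, not part of the analysis: it assumes $n=1$, $N=2$, $J(x)=x^{2}/2$, $H_{1}(p)=p^{2}/2$ and $H_{2}(p)=p^{2}$, so that $J^{*}(p)=p^{2}/2$, $H_{1}^{*}(q)=q^{2}/2$, $H_{2}^{*}(q)=q^{2}/4$, and both formulas reduce to the closed form $x^{2}/(2(1+t_{1}+2t_{2}))$. The Hopf formula is taken as the supremum over $p$ of $px-J^{*}(p)-\sum_{j}t_{j}H_{j}(p)$ and the Lax formula as the infimum over $(u_{1},u_{2})$ of $J(x-u_{1}-u_{2})+\sum_{j}t_{j}H_{j}^{*}(u_{j}/t_{j})$; both are approximated by brute-force grid searches.

```python
J = lambda x: 0.5 * x * x
J_star = lambda p: 0.5 * p * p
H1 = lambda p: 0.5 * p * p
H2 = lambda p: p * p
H1_star = lambda q: 0.5 * q * q
H2_star = lambda q: 0.25 * q * q


def hopf(x, t1, t2, p_grid):
    """Hopf formula: maximize over the momentum p."""
    return max(p * x - J_star(p) - t1 * H1(p) - t2 * H2(p) for p in p_grid)


def lax(x, t1, t2, u_grid):
    """Lax formula: minimize over the components (u1, u2); requires t1, t2 > 0."""
    return min(J(x - u1 - u2) + t1 * H1_star(u1 / t1) + t2 * H2_star(u2 / t2)
               for u1 in u_grid for u2 in u_grid)


def closed_form(x, t1, t2):
    return x * x / (2.0 * (1.0 + t1 + 2.0 * t2))


p_grid = [i / 100.0 for i in range(-200, 201)]
u_grid = [i / 100.0 for i in range(-100, 101)]
x, t1, t2 = 1.0, 0.5, 0.25
print(hopf(x, t1, t2, p_grid), lax(x, t1, t2, u_grid), closed_form(x, t1, t2))
```

In this instance the maximizing momentum is $p=0.5$ and the Lax minimizer is $(u_{1},u_{2})=(0.25,0.25)$, so the velocities are $u_{1}/t_{1}=0.5$ and $u_{2}/t_{2}=1$.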

The following proposition states that the solution is actually a convex function, hence the techniques in convex analysis can be applied to analyze the solution. The results hold even under weaker assumptions. Actually, a part of the proposition can be further generalized to the case when $J,H_{j}\in\Gamma_{0}(\mathbb{R}^{n})$ and $\mathrm{dom}~{}J^{*}\subseteq\mathrm{dom}~{}H_{j}$ for any $j$ .

Proposition 3.1.

Let $J,H_{j}\in\Gamma_{0}(\mathbb{R}^{n})$ and $\mathrm{dom}~{}H_{j}=\mathbb{R}^{n}$ for any $j$ . Then, $S_{H}\in\Gamma_{0}(\mathbb{R}^{n+N})$ , whose Legendre transform is given by

[TABLE]

for any $p\in\mathbb{R}^{n}$ and $E^{-}=(E_{1}^{-},\cdots,E_{N}^{-})\in\mathbb{R}^{N}$ . Here, $I\{\cdot\}$ denotes the indicator function. Moreover, if the assumptions (H1)-(H2) are satisfied, then $S_{H}(x,t_{1},\cdots,t_{N})$ is finite for any $x\in\mathbb{R}^{n}$ and $t_{1},\cdots,t_{N}\geq 0$ which are not all zero.

Proof.

First, we prove that $S_{H}$ is the Legendre transform of $F$ , where $F$ is defined by

[TABLE]

for any $p\in\mathbb{R}^{n}$ and any $E^{-}=(E_{1}^{-},\cdots,E_{N}^{-})\in\mathbb{R}^{N}$ . It is easy to check $F\in\Gamma_{0}(\mathbb{R}^{n+N})$ .

By definition, for any $x\in\mathbb{R}^{n}$ and $t=(t_{1},\cdots,t_{N})\in\mathbb{R}^{N}$ ,

[TABLE]

First, we consider the case when there exists $k$ such that $t_{k}<0$ . Take $p\in\mathrm{dom}~{}J^{*}$ . For any $j\neq k$ , take $E_{j}^{-}=-H_{j}(p)$ , which is a finite value. From the above equation,

[TABLE]

Hence $F^{*}(x,t)=+\infty=S_{H}(x,t)$ if $t_{k}<0$ for some $k$ .

Then, consider the case when $t_{1},\cdots,t_{N}\geq 0$ . Let $x\in\mathbb{R}^{n}$ , from eq. 16, we obtain

[TABLE]

Therefore, $S_{H}=F^{*}$, which implies $S_{H}$ is a convex lower semi-continuous function and $F=S_{H}^{*}$. Moreover, if there exists some $k$ such that $t_{k}>0$ and $t_{j}\geq 0$ for any $j\neq k$, then, by assumption (H1), we deduce that $J^{*}+\sum_{j}t_{j}H_{j}$ is 1-coercive, which, by Proposition 2.4, implies its Legendre transform $S_{H}(\cdot,t_{1},\cdots,t_{N})$ (with respect to $x$) is finite-valued. ∎

By investigating $S_{H}$ on the boundary of the domain, the solution to a lower time dimensional equation is embedded in the solution to the higher time dimensional equation, in the sense that the restriction of $S_{H}$ on the subspace $\{(x,t_{1},\cdots,t_{N}):\ t_{j}=0\ \forall j\in J\}$ for any index set $J\subset\{1,\cdots,N\}$ is the solution to the corresponding lower time dimensional HJ equation with Hamiltonians $\{H_{j}\}_{j\not\in J}$ .

The following proposition states a representation formula for the minimizers in the Lax formula. In the decomposition model eq. 15, a given image $x$ is decomposed into different components including $u_{1},\cdots,u_{N}$ and the residual $x-\sum_{j=1}^{N}u_{j}$. When the primal minimization problem is difficult to solve directly, the following proposition can be applied to compute $(u_{1},\cdots,u_{N})$ from the momentum $\nabla_{x}S_{L}(x,t_{1},\cdots,t_{N})$, which is the maximizer of the dual problem in the Hopf formula eq. 14. In other words, the proposition relates the optimizers of the primal decomposition problem to those of the dual problem.

Proposition 3.2.

Suppose the assumptions (H1)-(H2) hold. Let $x\in\mathbb{R}^{n},t_{1},\cdots,t_{N}\geq 0$ and assume the time variables $\{t_{j}\}$ are not all zero. Denote $(u_{1},\cdots,u_{N})$ to be any minimizer of the minimization problem in eq. 15 with parameters $x$ and $t_{1},\cdots,t_{N}$ . Here, each $u_{j}$ can be regarded as a function of $(x,t_{1},\cdots,t_{N})$ . Then, for any $j$ ,

[TABLE]

Specifically, if a stronger assumption is imposed, say, all the Hamiltonians are differentiable, then the minimizer $(u_{1},\cdots,u_{N})$ is unique and satisfies

[TABLE]

Proof.

Since $\mathrm{dom}~{}H_{j}=\mathbb{R}^{n}$ for each $j$, by Proposition 2.7 and induction, the minimizers $u_{j}$ exist if $S_{L}(x,t_{1},\cdots,t_{N})<+\infty$, and

[TABLE]

From the assumption (H1), there exists some $j$ such that $H_{j}^{*}$ is differentiable, hence the intersection above contains at most one element. On the other hand, $\partial_{x}S_{L}$ is non-empty in the interior of the domain of $S_{L}(\cdot,t_{1},\cdots,t_{N})$ , which is the whole space $\mathbb{R}^{n}$ because $S_{L}=S_{H}$ is finite-valued when the time variables are not all zero. Therefore, the above intersection contains exactly one element. In other words, $S_{L}$ is differentiable with respect to $x$ for any $t_{1},\cdots,t_{N}\geq 0$ which are not all zero and $x\in\mathbb{R}^{n}$ . Moreover, by eq. 19, $\nabla_{x}S_{L}\in\partial H_{j}^{*}(u_{j}/t_{j})$ , which implies $u_{j}/t_{j}\in\partial H_{j}(\nabla_{x}S_{L}(x,t_{1},\cdots,t_{N}))$ for any $j$ . ∎
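The relation $u_{j}/t_{j}\in\partial H_{j}(\nabla_{x}S_{L})$ suggests a simple way to recover the primal minimizer from the dual maximizer. The sketch below illustrates this in a single-time, one-dimensional setting chosen only for this example: $J=|\cdot|$ (so $J^{*}=I_{[-1,1]}$) and $H(p)=p^{2}/2$ (so $\nabla H(p)=p$ and $H^{*}(q)=q^{2}/2$). The momentum is obtained as the maximizer of the Hopf problem on a grid, and the resulting $u=t\,\nabla H(p)$ is compared with a brute-force minimizer of the Lax problem. All function names are hypothetical.

```python
def momentum_from_hopf(x, t, p_grid):
    """Maximizer of p*x - t*p^2/2 over |p| <= 1, i.e. the spatial gradient of S."""
    return max(p_grid, key=lambda p: p * x - 0.5 * t * p * p)


def minimizer_from_momentum(x, t, p_grid):
    """Primal component u = t * grad H(p), with grad H(p) = p for H(p) = p^2/2."""
    return t * momentum_from_hopf(x, t, p_grid)


def lax_minimizer_brute(x, t, u_grid):
    """Brute-force minimizer of |x - u| + t * H*(u / t) = |x - u| + u^2 / (2 t)."""
    return min(u_grid, key=lambda u: abs(x - u) + u * u / (2.0 * t))


p_grid = [i / 100.0 for i in range(-100, 101)]
u_grid = [i / 100.0 for i in range(-300, 301)]
t = 0.5
for x in (0.3, 2.0, -1.0):
    print(x, minimizer_from_momentum(x, t, p_grid), lax_minimizer_brute(x, t, u_grid))
```

In this example the recovered component is $u=\min(\max(x,-t),t)$, and the residual $x-u$ is the soft-thresholding of $x$ with threshold $t$.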

In the remaining part of this section, we investigate the multi-time HJ equation eq. 13 and the minimization problem eq. 15 from a variational point of view. To be specific, let $v_{j,k}\in\mathbb{R}^{n}$ and $t_{j,k}>0$ for any $j\in\{1,\cdots,N\}$ and $k\in\mathbb{N}$ such that they satisfy $\lim_{k\to+\infty}t_{j,k}=0$ and $\lim_{k\to+\infty}v_{j,k}=v_{j,\infty}$ for any $j$. Let $x\in\mathbb{R}^{n}$ and $x_{k}=x+\sum_{j=1}^{N}t_{j,k}v_{j,k}$ for any $k$. We are interested in the convergence behavior of the momentum $\nabla_{x}S_{H}$ and the minimizers $u_{j}$ evaluated at $(x_{k},t_{1,k},\cdots,t_{N,k})$. We will demonstrate one application in section 5.

Among all the sequences $\{t_{j,k}\}_{k},j=1,\cdots,N$ , by taking subsequences, we can assume there is a sequence with the lowest convergence rate. According to the symmetry of the time variables, without loss of generality, we can assume $\{t_{1,k}\}_{k}$ is the slowest sequence converging to zero compared to $\{t_{j,k}\}_{k}$ for any $j>1$ , i.e., we assume that $\left\{\frac{t_{j,k}}{t_{1,k}}\right\}_{k}$ has a finite limit denoted as $\alpha_{j,\infty}\in\mathbb{R}$ for any $j$ . In summary, the following notations and assumptions are adopted:

[TABLE]

In the decomposition models, $\{x_{k}\}$ is given by a sequence of observed images. In each $x_{k}$ there is a constant component denoted by $x$ and several other components denoted by $t_{j,k}v_{j,k}$ for $j=1,\cdots,N$ . In the remaining part of this section, we investigate the behavior of the minimizers of the decomposition model in eq. 15 when the components $t_{j,k}v_{j,k}$ converge to zero and the parameters $t_{j,k}$ in the model vanish.

First, we show the convergence of $u_{j}$ to zero, which is stated in (i) in the following proposition. In other words, the decomposition model recovers the constant component $x$ when the other components $t_{j,k}v_{j,k}$ and the parameters $t_{j,k}$ in the model converge to zero. Then, (ii) and (iii) in the following proposition are technical results about the convergence rate, which will be used in later proofs.

Proposition 3.3.

Assume (H1)-(H2) and eq. 20 hold. Let $(u_{1},\cdots,u_{N})$ be any minimizer of the minimization problem in eq. 15. Let $x\in\mathrm{dom}~{}J$ . Then,

(i)

For any $j=1,\cdots,N$ ,

$$\lim_{k\to+\infty}u_{j}(x_{k},t_{1,k},\cdots,t_{N,k})=0.$$

(ii)

If $\partial J(x)\neq\emptyset$ and $\alpha_{j,\infty}=0$ , then

$$\lim_{k\to+\infty}\frac{u_{j}(x_{k},t_{1,k},\cdots,t_{N,k})}{t_{1,k}}=0.$$

(iii)

If $\partial J(x)\neq\emptyset$ and $\alpha_{j,\infty}\neq 0$ , then the sequence $\left\{\frac{1}{t_{j,k}}u_{j}(x_{k},t_{1,k},\cdots,t_{N,k})\right\}_{k}$ is bounded.
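Statements (i) and (iii) can be observed numerically in a single-time example. The sketch below is an illustration under assumptions made only here: $n=N=1$, $J=|\cdot|$, $H(p)=p^{2}/2$, $x=0$, $v_{1,k}=3$ and $t_{1,k}=2^{-k}$, so that $x_{k}=3\cdot 2^{-k}$ and $\alpha_{1,\infty}=1$. The Lax minimizer is computed by a ternary search, which is valid because the objective $u\mapsto|x-u|+u^{2}/(2t)$ is strictly convex. The function name `lax_minimizer` is hypothetical.

```python
def lax_minimizer(x, t, iters=200):
    """Minimize u -> |x - u| + u^2 / (2 t) by ternary search on a bracketing interval."""
    objective = lambda u: abs(x - u) + u * u / (2.0 * t)
    lo, hi = -abs(x) - 1.0, abs(x) + 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2      # a minimizer lies in [lo, m2]
        else:
            lo = m1      # a minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)


v = 3.0
times = [2.0 ** (-k) for k in range(1, 11)]
components = [lax_minimizer(t * v, t) for t in times]      # u_1 at (x_k, t_k)
velocities = [u / t for u, t in zip(components, times)]    # u_1 / t_k

print(components[-1])   # close to 0, as in (i)
print(velocities)       # stays bounded, as in (iii)
```

In this instance the velocities are all close to $1$, the point of $\partial J(0)=[-1,1]$ nearest to $v_{1,\infty}=3$.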

Proof.

Denote $\bar{u}_{j,k}:=u_{j}(x_{k},t_{1,k},\cdots,t_{N,k})$ for any $j=1,\cdots,N$, and $\bar{u}_{0,k}:=x_{k}-\sum_{j=1}^{N}\bar{u}_{j,k}$. Define $I:=\{j:\{\|\bar{u}_{j,k}\|/t_{j,k}\}_{k}\text{ is not bounded}\}$. Recall that for each $j=1,\cdots,N$, $\{v_{j,k}\}_{k}\subset\mathbb{R}^{n}$ and $\{t_{j,k}\}_{k}\subset(0,+\infty)$ are two sequences satisfying $\lim_{k\to+\infty}v_{j,k}=v_{j,\infty}$ and $\lim_{k\to+\infty}t_{j,k}=0$, respectively. The $k$-th spatial variable $x_{k}$ is defined to be $x+\sum_{j=1}^{N}t_{j,k}v_{j,k}$.

Proof of (i): By Lax formula eq. 15,

[TABLE]

Since $J$ is a convex function, there exists $z\in\mathrm{dom}~{}J$ such that $\partial J(z)\neq\emptyset$ . Let $q\in\partial J(z)$ . Then, using the convexity of $J$ and Cauchy-Schwarz inequality, we get

[TABLE]

Combining eq. 22 and eq. 23, we get

[TABLE]

For any $j\in I$ , since $\|\bar{u}_{j,k}\|/t_{j,k}$ is not bounded, without loss of generality, by taking subsequences, we can assume $\|\bar{u}_{j,k}\|/t_{j,k}$ increases to infinity. Since $H_{j}^{*}$ is 1-coercive, for any $M>0$ , there exists $K$ such that for any $k>K$ , $H_{j}^{*}(\bar{u}_{j,k}/t_{j,k})\geq M\|\bar{u}_{j,k}\|/t_{j,k}$ . Together with eq. 24, we get

[TABLE]

Since $\{t_{j,k}\}_{k}$ and $\{v_{j,k}\}_{k}$ are bounded and $H_{j}^{*}$ is continuous on $\mathbb{R}^{n}$ for any $j$, the right-hand side is bounded. Because $M$ can be arbitrarily large, the boundedness of the left-hand side (which follows from that of the right-hand side) implies $\|\bar{u}_{j,k}\|\to 0$ for any $j\in I$. If $j\not\in I$, then $\|\bar{u}_{j,k}\|/t_{j,k}$ is bounded by the definition of $I$, and since $t_{j,k}\to 0$, $\bar{u}_{j,k}$ also converges to zero.

Proof of (ii): We can apply the same argument as above and set $z=x$ , because $\partial J(x)\neq\emptyset$ . From eq. 25, using the definition of $x_{k}$ in eq. 20 and triangle inequality, we have

[TABLE]

Dividing both sides by $t_{1,k}$ , we can obtain

[TABLE]

With the same argument as in the proof of (i), we deduce that the right hand side is bounded, while $M$ can be arbitrarily large. Therefore, $\|\bar{u}_{j,k}\|/t_{1,k}$ converges to zero for any $j\in I$ . If $j\not\in I$ and $\alpha_{j,\infty}=0$ , then $\|\bar{u}_{j,k}\|/t_{j,k}$ is bounded by the definition of $I$ and $t_{j,k}/t_{1,k}$ converges to zero by the definition of $\alpha_{j,\infty}$ , hence $\|\bar{u}_{j,k}\|/t_{1,k}$ also converges to zero.

Proof of (iii): It suffices to prove the contrapositive statement. To be specific, let $j\in I$, i.e., $\|\bar{u}_{j,k}\|/t_{j,k}$ is unbounded; it suffices to prove $\alpha_{j,\infty}=0$. From the proof of (ii), we know that $\|\bar{u}_{j,k}\|/t_{1,k}$ converges to zero if $j\in I$. Then, the unboundedness of $\{\bar{u}_{j,k}/t_{j,k}\}_{k}$ implies that $t_{j,k}/t_{1,k}$ converges to $0$, hence $\alpha_{j,\infty}=0$ and (iii) is proved. ∎

Similarly, we also consider the maximizers $\nabla_{x}S_{H}$ in the dual problem eq. 14 with the observed data $x_{k}$ and the parameters $\{t_{j,k}\}_{j=1}^{N}$ . The following lemma states the boundedness of the maximizers $\{\nabla_{x}S_{H}(x_{k},t_{1,k},\cdots,t_{N,k})\}_{k}$ which will be used in the later proofs.

Lemma 3.1.

Under the assumptions (H1)-(H2) and eq. 20, for any $x\in\mathrm{dom}~{}J$ such that $\partial J(x)\neq\emptyset$ , the sequence $\{\nabla_{x}S_{H}(x_{k},t_{1,k},\cdots,t_{N,k})\}_{k}$ is bounded and any cluster point $p$ is in $\partial J(x)$ .

Proof.

Recall that for each $j\in\{1,\cdots,N\}$ , $\{v_{j,k}\}_{k}\subset\mathbb{R}^{n}$ and $\{t_{j,k}\}_{k}\subset(0,+\infty)$ are two sequences satisfying the assumptions in eq. 20. Denote $p_{k}:=\nabla_{x}S_{H}(x_{k},t_{1,k},\cdots,t_{N,k})$ . Then, $p_{k}$ is a maximizer of the maximization problem in eq. 14. Hence, for any $q$ in $\partial J(x)$ ,

[TABLE]

Since $q\in\partial J(x)$ , we have $x\in\partial J^{*}(q)$ , hence $J^{*}(p_{k})\geq J^{*}(q)+\langle x,p_{k}-q\rangle$ . Combining this inequality and the above one we can obtain

[TABLE]

Here, for the second inequality above, we used the definition of $x_{k}$ in eq. 20 and Cauchy-Schwarz inequality. Then, rearranging the terms and dividing by $t_{1,k}$ , we get

[TABLE]

If $\{p_{k}\}_{k}$ is not bounded, without loss of generality, we can assume $\|p_{k}\|$ increases to infinity. Since $H_{j}$ is 1-coercive for all $j$ , then for any $M>0$ , there exists $K$ such that $H_{j}(p_{k})\geq M\|p_{k}\|$ for any $k>K$ and any $j=1,\cdots,N$ . Then, from eq. 26, for any $k>K$ , we obtain

[TABLE]

The right-hand side is bounded. However, since $\|p_{k}\|$ goes to infinity, the term for $j=1$ on the left-hand side is unbounded, while the terms for $j>1$ are non-negative. As a result, the left-hand side can be arbitrarily large, which leads to a contradiction. Therefore, we can conclude that $\{p_{k}\}_{k}$ is bounded.

For the remaining part, let $p$ be a cluster point, then there exists a subsequence converging to $p$ , still denoted as $p_{k}$ . Since $S_{H}$ solves the multi-time HJ equation eq. 13 and $H_{j}$ is continuous for any $j$ , then we have

[TABLE]

By the continuity property [66, Prop.XI.4.1.1] of the subdifferential operator $\partial S_{H}$ of the convex lower semi-continuous function $S_{H}$ , we can conclude that

[TABLE]

which implies $p\in\partial J(x)$ . ∎

The variational behaviors of the momentum $\nabla_{x}S$ and the velocities $u_{j}/t_{j}$ are presented in the following proposition. To be specific, the cluster points of the momenta and the velocities solve two optimization problems, respectively, and the two problems are dual to each other. An illustration of this result is given in fig. 7.

Proposition 3.4.

Assume (H1)-(H2) and eq. 20 hold. Let $x\in\mathrm{dom}~{}J$ and $\partial J(x)\neq\emptyset$ . Then,

(i)

the directional derivative of $S_{H}$ corresponds to a maximization problem:

[TABLE]

Moreover, let $p$ be any cluster point of $\{\nabla_{x}S_{H}(x_{k},t_{1,k},\cdots,t_{N,k})\}_{k}$ , then,

[TABLE]

(ii)

the directional derivative of $S_{L}$ corresponds to the dual minimization problem:

[TABLE]

Moreover, if $\bar{w}_{j}$ is a cluster point of $\{u_{j}(x_{k},t_{1,k},\cdots,t_{N,k})/t_{j,k}\}_{k}$ for any $j$ satisfying $\alpha_{j,\infty}\neq 0$ , then

[TABLE]

In particular, if $H_{j}$ is strictly convex and $\alpha_{j,\infty}\neq 0$ for some $j$, then the maximizer in eq. 28 is unique, which implies the convergence of $\nabla_{x}S_{H}(x_{k},t_{1,k},\cdots,t_{N,k})$ to the unique maximizer. Similarly, for any $j$ such that $H_{j}$ is differentiable and $\alpha_{j,\infty}\neq 0$, we can conclude that $u_{j}(x_{k},t_{1,k},\cdots,t_{N,k})/t_{j,k}$ converges to the unique minimizer in eq. 30.

Remark 3.1.

It is straightforward to obtain $\lim_{k\to+\infty}\frac{S_{H}(x_{k},t_{1,k},\cdots,t_{N,k})-S_{H}(x,0,\cdots,0)}{\|(t_{1,k},\cdots,t_{N,k})\|_{2}}$ using the following computation

[TABLE]

where the last equality follows from the assumption that $\alpha_{j,\infty}=\lim_{k\to+\infty}t_{j,k}/t_{1,k}$ for any $j=1,\cdots,N$ .

Proof.

Recall that the $k$-th spatial variable $x_{k}$ is defined to be $x+\sum_{j=1}^{N}t_{j,k}v_{j,k}$, where $\{v_{j,k}\}_{k}\subset\mathbb{R}^{n}$ and $\{t_{j,k}\}_{k}\subset(0,+\infty)$ are two sequences satisfying the assumptions in eq. 20. Denote $\Delta S_{k}:=S_{H}(x_{k},t_{1,k},\cdots,t_{N,k})-S_{H}(x,0,\cdots,0)$.

Proof of (i): For any $q\in\partial J(x)$ , by Hopf formula eq. 14, we obtain

[TABLE]

Since $q\in\partial J(x)$ , we have $J^{*}(q)+J(x)=\langle q,x\rangle$ . Hence, together with the definition of $x_{k}$ in eq. 20, we get

[TABLE]

Therefore, we have

[TABLE]

where we recall that $\lim_{k\to+\infty}v_{j,k}=v_{j,\infty}$ and $\lim_{k\to+\infty}t_{j,k}/t_{1,k}=\alpha_{j,\infty}$ by eq. 20. Here, $q$ is an arbitrary element in $\partial J(x)$ , hence we obtain

[TABLE]

On the other hand, for any $k$ , consider the function $\phi_{k}:[0,+\infty)\to\mathbb{R}$ defined by $\phi_{k}(t):=S_{H}\left(x+\sum_{j=1}^{N}t\alpha_{j,k}v_{j,k},\ \alpha_{1,k}t,\cdots,\ \alpha_{N,k}t\right)$ , where $\alpha_{j,k}:=t_{j,k}/t_{1,k}$ . Since $S_{H}$ is a convex function and $\phi_{k}$ is its restriction on a line, then $\phi_{k}\in\Gamma_{0}(\mathbb{R})$ with $\mathrm{dom}~{}\phi_{k}=[0,+\infty)$ . Also, $\phi_{k}$ is differentiable in $(0,+\infty)$ since $S_{H}$ is differentiable. The derivative of $\phi_{k}$ at $t_{1,k}$ is given by the chain rule:

[TABLE]

Since $S_{H}$ satisfies the multi-time HJ equation eq. 13, we obtain

[TABLE]

From straightforward computation and the convexity of $\phi_{k}$ , we get

[TABLE]

where $p_{k}:=\nabla_{x}S_{H}(x_{k},t_{1,k},\cdots,t_{N,k})$ .

Let $p$ be a cluster point of $\{p_{k}\}$. Take a subsequence converging to $p$ and still denote it as $\{p_{k}\}$. Since $p\in\partial J(x)$ by Lemma 3.1 and $H_{j}$ is continuous for any $j$, we have

[TABLE]

Together with eq. 31, the equation eq. 27 is proved. Moreover, any cluster point $p$ is a maximizer.

Proof of (ii): Here, we adopt the notations $\bar{u}_{j,k}$ and $\bar{u}_{0,k}$ defined in the proof of Proposition 3.3 to represent the minimizers in the Lax formula. According to the Lax formula eq. 15 evaluated at the point $(x_{k},t_{1,k},\cdots,t_{N,k})$ and by the convexity of $J$, we deduce that

[TABLE]

for any $q\in\partial J(x)$ . Since $S_{L}=S_{H}$ , we have $S_{L}(x_{k},t_{1,k},\cdots,t_{N,k})-S_{L}(x,0,\cdots,0)=\Delta S_{k}$ . By the definition of $x_{k}$ and $\bar{u}_{0,k}$ , we can compute $\bar{u}_{0,k}-x=x_{k}-x-\sum_{j}\bar{u}_{j,k}=\sum_{j}(t_{j,k}v_{j,k}-\bar{u}_{j,k})$ , hence we have

[TABLE]

where $\alpha_{j,k}:=t_{j,k}/t_{1,k}$. According to Proposition 3.2, we have $\bar{u}_{j,k}/t_{j,k}\in\partial H_{j}(p_{k})$. Therefore we get

[TABLE]

Combining the above two equations we obtain

[TABLE]

From Proposition 3.3 (ii), $\|\bar{u}_{j,k}\|/t_{1,k}$ converges to zero if $\alpha_{j,\infty}=0$. Also, $\{p_{k}\}$ is bounded by Lemma 3.1, hence the first sum on the right-hand side of eq. 33 converges to zero as $k$ approaches infinity. On the other hand, for $j$ such that $\alpha_{j,\infty}\neq 0$, $\bar{u}_{j,k}/t_{j,k}$ is bounded by Proposition 3.3 (iii). Taking a subsequence, we can assume that $\bar{u}_{j,k}/t_{j,k}$ converges to some vector, denoted as $\bar{w}_{j}$. In conclusion, as $k$ approaches infinity in eq. 33, we have

[TABLE]

where the second inequality holds by the definition of Legendre transform eq. 11. From eq. 27, for any maximizer $p$ in eq. 28,

[TABLE]

Taking $q=p$ in eq. 34 and comparing it with eq. 35, we can conclude that the inequalities in eq. 34 become equalities when $q=p$ . As a result, when $\alpha_{j,\infty}\neq 0$ we have $\langle p,\bar{w}_{j}\rangle=H_{j}^{*}(\bar{w}_{j})+H_{j}(p)$ , which implies that $p\in\partial H_{j}^{*}(\bar{w}_{j})$ . Then, we deduce that

[TABLE]

On the other hand, for an arbitrary $q\in\partial J(x)$ , by eq. 34 and eq. 36, we have

[TABLE]

which implies that $\langle p-q,v_{j,\infty}-\bar{w}_{j}\rangle\geq 0$ for any $q\in\partial J(x)$, when $\alpha_{j,\infty}\neq 0$. By eq. 7 and eq. 10, we can deduce that $v_{j,\infty}-\bar{w}_{j}\in N_{\partial J(x)}(p)=\partial I_{\partial J(x)}(p)$. Proposition 2.5 gives the equality $\langle p,v_{j,\infty}-\bar{w}_{j}\rangle=I_{\partial J(x)}^{*}(v_{j,\infty}-\bar{w}_{j})$. Then, eq. 29 follows from this equality and eq. 36.

It remains to prove eq. 30. Consider any $j$ such that $\alpha_{j,\infty}\neq 0$. Define $f:\mathbb{R}^{n}\to\mathbb{R}$ by $f(w):=I_{\partial J(x)}^{*}(v_{j,\infty}-w)+H_{j}^{*}(w)$. Then it suffices to prove $0\in\partial f(\bar{w}_{j})$. So far, we have proved $p\in\partial H_{j}^{*}(\bar{w}_{j})$ and $v_{j,\infty}-\bar{w}_{j}\in\partial I_{\partial J(x)}(p)$, which implies $p\in\partial I_{\partial J(x)}^{*}(v_{j,\infty}-\bar{w}_{j})$. By straightforward computation and Proposition 2.3,

[TABLE]

Therefore, $\bar{w}_{j}$ is a minimizer of $f$ , which concludes the proof. ∎

The above proposition provides explicit formulas for the variations of $S$, $\nabla_{x}S$ and $\frac{u_{j}}{t_{j}}$, where $u_{j}$ denotes the $j$-th component of the minimizer of the decomposition model in the form of eq. 15. Specifically, the limits of these quantities are related to the two optimization problems given by eqs. 28 and 30. From the perspective of image processing, given an observed image $x_{k}$ which is a summation of a constant component $x$ and other components $t_{j,k}v_{j,k}$, the decomposition model eq. 15 gives $N+1$ components. Among these $N+1$ components, one component converges to the constant component $x$ and the other components $u_{j}$ vanish as the parameters $t_{j,k}$ approach zero, by Proposition 3.3. Then, Proposition 3.4(ii) states that the component $u_{j}$ converges to $0$ from a direction $\bar{w}_{j}$ [82, p. 197]. On the other hand, Proposition 3.4(i) provides a representation formula for the cluster point of the maximizers of the dual problem in the form of eq. 14.
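The two limiting problems can be compared numerically in a single-time example. The sketch below is an illustration under assumptions made only here: $n=N=1$, $J=|\cdot|$, $H(p)=p^{2}/2$, $x=0$ (so $\partial J(x)=[-1,1]$ and $I_{\partial J(x)}^{*}=|\cdot|$), and $v_{1,k}=v=3$. Following the proof above, the minimization problem is taken to be the minimization of $f(w)=I_{\partial J(x)}^{*}(v-w)+H^{*}(w)$; the maximization problem is assumed to be the maximization of $\langle q,v\rangle-H(q)$ over $q\in\partial J(x)$, which is how the single-time case of eq. 28 is read here. Both are solved by grid search and compared with the difference quotient of $S_{H}$.

```python
def hopf_value(x, t, q_grid):
    """Hopf formula for J = |.| and H(p) = p^2/2: max over |q| <= 1 of q*x - t*q^2/2."""
    return max(q * x - 0.5 * t * q * q for q in q_grid)


q_grid = [i / 1000.0 for i in range(-1000, 1001)]   # subdifferential of |.| at 0
w_grid = [i / 1000.0 for i in range(-5000, 5001)]   # candidate limiting velocities
v = 3.0

# Limiting maximization problem over q in [-1, 1] (cluster point of the momenta).
dual_value, p_limit = max((q * v - 0.5 * q * q, q) for q in q_grid)

# Limiting minimization problem (cluster point of the velocities u / t).
primal_value, w_limit = min((abs(v - w) + 0.5 * w * w, w) for w in w_grid)

# Difference quotients (S_H(x_k, t_k) - S_H(0, 0)) / t_k with x_k = t_k * v and S_H(0, 0) = 0.
quotients = [hopf_value(t * v, t, q_grid) / t for t in (0.1, 0.01, 0.001)]

print(dual_value, primal_value, quotients)
print(p_limit, w_limit)
```

In this instance both optimal values equal $2.5$ and the limiting momentum and velocity are both $1$, consistent with the relation $p\in\partial H^{*}(\bar{w})$ obtained in the proof.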

4. Uniqueness of the Convex Solutions to the Multi-time Hamilton-Jacobi Equations

In the previous section, we have discussed the relation of the optimization problems in the Hopf formula and Lax formula with the classical solution of the multi-time HJ equation. In fact, some results can be generalized to weaker assumptions in which case the solution provided by Hopf and Lax formulas is not classical. In this section, we prove that the only convex solution is given by the two formulas.

In the field of PDEs, a type of solution called the viscosity solution is considered for solving the HJ equation when no classical solution exists. The uniqueness of the viscosity solution has been widely studied under different assumptions [17, 19]. However, the functions in convex analysis and optimization may take the value $+\infty$, which is an unusual condition in the field of PDEs. Therefore, to maintain the connection between the HJ equations and convex optimization problems, we consider convex solutions that may take the value $+\infty$ on part of the domain, and we prove the uniqueness using techniques from convex analysis.

We start with the proof for the classical convex solution, in order to demonstrate the idea of utilizing the convexity assumptions. After that, we state the uniqueness of nonsmooth convex solution under more general assumptions in 4.1. When proving the uniqueness of the classical convex solution, we assume the properties (H1) and (H2) hold. Moreover, the solution $S$ satisfies:

(S1)

$S\in\Gamma_{0}\left(\mathbb{R}^{n}\times[0,+\infty)^{N}\right)\cap C^{1}(\mathbb{R}^{n}\times(0,+\infty)^{N})$ ;

(S2)

$S$ solves the multi-time Hamilton-Jacobi equation eq. 13.

As discussed in section 3, $S_{H}$ defined in the Hopf formula eq. 14 is a solution satisfying the assumptions (S1) and (S2). Hence, we just need to prove $S=S_{H}$ for any $S$ satisfying (S1)-(S2). First, we consider the single-time case when the time dimension $N=1$, and formulate its Legendre transform $S^{*}(p,E^{-})$ for $p\in\mathbb{R}^{n}$ and $E^{-}\in\mathbb{R}$ in the following lemma.

Lemma 4.1.

Assume (H1)-(H2) hold and $S$ satisfies (S1)-(S2). Let $N=1$ . Then there exists a convex function $\tilde{H}:\ \mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$ , such that $S^{*}(p,E^{-})=J^{*}(p)+I_{V}(p,E^{-})$ , where $V:=\{(p,E^{-}):\ E^{-}\leq-\tilde{H}(p)\}$ .
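As a sanity check of the form of $S^{*}$ claimed in this lemma, the following worked computation treats one explicit case. It is an illustration under assumptions made only here: $n=N=1$, $J(x)=x^{2}/2$ and $H(p)=p^{2}/2$, for which $J^{*}(p)=p^{2}/2$ and the Hopf formula gives $S(x,t)=x^{2}/(2(1+t))$ for $t\geq 0$, with $S=+\infty$ for $t<0$.

```latex
% Legendre transform of S(x,t) = x^2 / (2(1+t)) on t >= 0 (illustrative example).
% The inner supremum over x is attained at x = (1+t) p.
\begin{aligned}
S^{*}(p,E^{-})
 &= \sup_{x\in\mathbb{R},\ t\geq 0}\Big\{px+E^{-}t-\frac{x^{2}}{2(1+t)}\Big\} \\
 &= \sup_{t\geq 0}\Big\{E^{-}t+\frac{(1+t)\,p^{2}}{2}\Big\} \\
 &= \frac{p^{2}}{2}+\sup_{t\geq 0}\ t\Big(E^{-}+\frac{p^{2}}{2}\Big) \\
 &= J^{*}(p)+I\Big\{E^{-}\leq-\frac{p^{2}}{2}\Big\}.
\end{aligned}
```

In this example the function $\tilde{H}$ of the lemma coincides with the Hamiltonian $H$, and $S^{*}(p,\cdot)$ is constant on its domain $(-\infty,-H(p)]$.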

Proof.

In this proof, we only consider the single-time HJ equation. For the single-time case, $H$ is used to denote the Hamiltonian, instead of $H_{1}$ , for simplicity. First, consider the domain of $S^{*}$ . For each $p\in\mathbb{R}^{n}$ , define

[TABLE]

For the illustration of this definition, see fig. 8a. The function $\tilde{H}$ defined here is an extended-valued function taking values in $\bar{\mathbb{R}}$ . In the last step of this proof, we will show the convexity and specify the range of this function. From this definition, it is obvious that $\mathrm{dom}~{}S^{*}\subseteq V$ , where $V=\{(p,E^{-}):\ E^{-}\leq-\tilde{H}(p)\}$ , as defined in the statement of this lemma. Moreover, denote $V_{1}=\{(p,E^{-}):\ E^{-}<-\tilde{H}(p)\}$ , then we prove $V_{1}\subseteq\mathrm{dom}~{}S^{*}$ by using the monotonicity of $S^{*}(p,\cdot)$ . To be specific, let $p\in\mathbb{R}^{n}$ and $-\infty<\tilde{E}^{-}\leq E^{-}<+\infty$ , then, we have

[TABLE]

Hence, $S^{*}(p,E^{-})$ is non-decreasing with respect to $E^{-}$ . As a result, $(p,E^{-})\in\mathrm{dom}~{}S^{*}$ implies $\{p\}\times(-\infty,E^{-}]\subseteq\mathrm{dom}~{}S^{*}$ . Therefore we obtain $V_{1}\subseteq\mathrm{dom}~{}S^{*}\subseteq V$ .

In the next step, we prove $\mathrm{dom}~{}S^{*}=V$ .

Denote $U:=\{p\in\mathbb{R}^{n}:\ \tilde{H}(p)<+\infty\}$ (see fig. 8a). Here and after in this section, we use the bold character $\mathbf{0}$ to denote the zero vector in $\mathbb{R}^{n}$ . Since $U$ is the projection of $\mathrm{dom}~{}S^{*}$ along the direction $(\mathbf{0},1)$ , $U$ is a convex set. Let $p\in\mathrm{ri\ }U$ . Take $E^{-}<-\tilde{H}(p)$ , then $\partial S^{*}(p,E^{-})\neq\emptyset$ because $(p,E^{-})\in\mathrm{ri\ }\mathrm{dom}~{}S^{*}$ . Let $(x,t)\in\partial S^{*}(p,E^{-})$ , which implies $(p,E^{-})\in\partial S(x,t)$ . If $t>0$ , then $E^{-}=\frac{\partial S}{\partial t}(x,t)$ and $p=\nabla_{x}S(x,t)$ . Since $S$ satisfies the HJ equation eq. 13, $E^{-}+H(p)=0$ . In other words, if $(x,t)\in\partial S^{*}(p,E^{-})$ with $E^{-}\neq-H(p)$ , then we can conclude that $t=0$ . Therefore, for any $E^{-}<-\tilde{H}(p)$ and $E^{-}\neq-H(p)$ , by 2.6, the directional derivative of $S^{*}$ in the direction $(\mathbf{0},1)$ is:

[TABLE]

As a result, $S^{*}(p,\cdot)$ is a constant function in its domain. Denote this value as $f(p)$ . By the continuity of $S^{*}$ when restricting to the straight line $\{p\}\times\mathbb{R}$ , the value $S^{*}(p,-\tilde{H}(p))$ is also $f(p)$ if $\tilde{H}(p)$ is finite. Hence, $S^{*}(p,E^{-})=f(p)$ for any $p\in\mathrm{ri\ }U$ and $E^{-}\leq-\tilde{H}(p)$ .

Now, we consider the case when $p\in U\setminus\mathrm{ri\ }U$ . For the illustration, see fig. 8b. Let $E^{-}<-\tilde{H}(p)$ . Take $q\in\mathrm{ri\ }U$ and $\tilde{E}^{-}<-\tilde{H}(q)$ , then by 2.2,

[TABLE]

Hence, the value of $S^{*}(p,E^{-})$ does not depend on $E^{-}$ if $E^{-}<-\tilde{H}(p)$ . Denote this value as $f(p)$ . By continuity, $S^{*}(p,-\tilde{H}(p))=f(p)$ if $\tilde{H}(p)$ is finite. Therefore, we have proved that the domain of $S^{*}$ coincides with the set $V$ and $S^{*}(p,E^{-})=f(p)$ in the domain of $S^{*}$ .

Then, we prove $f=J^{*}$ when restricting to $\mathrm{dom}~{}f$ . By setting $f(p)=+\infty$ if $p\not\in U$ , we can regard $f$ as a function from $\mathbb{R}^{n}$ to $\mathbb{R}\cup\{+\infty\}$ . It is not hard to check the convexity of $f$ . To be specific, for any $p_{1},p_{2}\in\mathrm{dom}~{}f$ and $\alpha\in(0,1)$ , choose $E^{-}<-\tilde{H}(p_{1})$ and $\tilde{E}^{-}<-\tilde{H}(p_{2})$ (see fig. 8c), then we have

[TABLE]

Hence $f$ is a convex function taking values in $\mathbb{R}\cup\{+\infty\}$ . Also, for each $x\in\mathbb{R}^{n}$ , we have

[TABLE]

Therefore, $f^{**}=J^{*}$ , which implies $\mathrm{ri\ }\mathrm{dom}~{}f=\mathrm{ri\ }\mathrm{dom}~{}J^{*}$ and $f(p)=J^{*}(p)$ if $p\in\mathrm{ri\ }\mathrm{dom}~{}f$ . Moreover, according to 2.2 and eq. 39, we deduce that

[TABLE]

for any $p\in\mathrm{dom}~{}f\setminus\mathrm{ri\ }\mathrm{dom}~{}f$ and $q\in\mathrm{ri\ }\mathrm{dom}~{}f$ . As a result we have $f=J^{*}$ in the domain of definition. In conclusion, we get the following formula for $S^{*}$

[TABLE]

The final part is to prove that $\tilde{H}$ is a convex function taking values in $\mathbb{R}\cup\{+\infty\}$ .

First, we prove that $\tilde{H}$ cannot take the value $-\infty$ by contradiction. Suppose there exists $p\in\mathbb{R}^{n}$ such that $\tilde{H}(p)$ equals $-\infty$ . Then, by definition of $\tilde{H}$ we have $\{p\}\times\mathbb{R}\subseteq\mathrm{dom}~{}S^{*}$ . Together with the formula of $S^{*}$ in eq. 40, we derive

[TABLE]

Therefore, $(\mathbf{0},1,0)$ and $(\mathbf{0},-1,0)$ are in the asymptotic cone of $\mathrm{epi~{}}S^{*}$ by definition eq. 8. Then, by 2.1, for any $q\in U$ , we obtain

[TABLE]

which implies $\{q\}\times\mathbb{R}\subseteq\mathrm{dom}~{}S^{*}$ . Since $q$ is an arbitrary vector in $U$ , we deduce that $\mathrm{dom}~{}S^{*}=U\times\mathbb{R}$ . Moreover, according to eq. 40, the function $S^{*}$ is a constant on the line $\{q\}\times\mathbb{R}$ for any $q\in U$ , which implies that the directional derivative of $S^{*}$ in the direction $(\mathbf{0},1)$ is zero. In other words, we have

[TABLE]

On the other hand, consider any $y\in\mathbb{R}^{n}$ and $s>0$ such that $\partial S(y,s)$ is nonempty. Let $(p,E^{-})\in\partial S(y,s)$ . This implies $(y,s)\in\partial S^{*}(p,E^{-})$ . Hence, according to 2.6, we get

[TABLE]

which contradicts eq. 41. Therefore, $\tilde{H}$ cannot take the value $-\infty$ .

Finally, the convexity of $\tilde{H}$ follows from the convexity of $\mathrm{dom}~{}S^{*}$. In fact, $\mathrm{epi~{}}\tilde{H}=\{(p,-E^{-}):\ (p,E^{-})\in\mathrm{dom}~{}S^{*}\}$, which is a reflection of the convex set $\mathrm{dom}~{}S^{*}$ and is therefore also convex. Hence, $\tilde{H}$ is a convex function from $\mathbb{R}^{n}$ to $\mathbb{R}\cup\{+\infty\}$. ∎
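As a sanity check of the lemma, consider a hypothetical one-dimensional example that is not taken from the paper: $n=1$, $J(x)=x^{2}/2$ and $H(p)=p^{2}/2$. One can verify directly that $S(x,t)=x^{2}/(2(1+t))$ for $t\geq 0$ (with $S=+\infty$ for $t<0$) is a classical convex solution with initial data $J$. The computation below, a sketch under these assumptions, recovers the structure stated in the lemma with $\tilde{H}=H$.

```latex
\begin{aligned}
S^{*}(p,E^{-})
&= \sup_{x\in\mathbb{R},\ t\geq 0}\left\{px+E^{-}t-\frac{x^{2}}{2(1+t)}\right\}\\
&= \sup_{t\geq 0}\left\{\frac{(1+t)\,p^{2}}{2}+E^{-}t\right\}
   \qquad\text{(the inner supremum is attained at } x=(1+t)p\text{)}\\
&= \frac{p^{2}}{2}+\sup_{t\geq 0}\ t\left(E^{-}+\frac{p^{2}}{2}\right)\\
&= J^{*}(p)+I_{V}(p,E^{-}),
   \qquad V=\left\{(p,E^{-}):\ E^{-}\leq-\frac{p^{2}}{2}\right\}.
\end{aligned}
```

In this example $S^{*}(p,\cdot)$ is constant on its domain and the domain is exactly the reflected epigraph of $H$, as the lemma predicts.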

Based on this lemma, the following proposition states the uniqueness result. The lemma shows that the Legendre transform of $S$ has a form similar to that of $S_{H}^{*}$. The proposition is then proved by showing that the two functions $S^{*}$ and $S_{H}^{*}$ are equal.

Proposition 4.1.

The solution to the multi-time Hamilton-Jacobi equation is unique. Specifically, under the assumptions (H1) and (H2), if $S$ satisfies (S1)-(S2), then $S=S_{H}$ .

Proof.

In the proof of this proposition, we first consider the single-time case. Let $N=1$, and let $H$ be the Hamiltonian.

By Lemma 4.1, we have $S^{*}(p,E^{-})=J^{*}(p)+I_{V}(p,E^{-})$, where $V=\{(p,E^{-}):\ E^{-}\leq-\tilde{H}(p)\}$ and $\tilde{H}$ is a convex function whose domain is the projection of $\mathrm{dom}~{}S^{*}$ along $(\mathbf{0},1)$. Moreover, $\mathrm{ri\ }\mathrm{dom}~{}J^{*}=\mathrm{ri\ }\mathrm{dom}~{}\tilde{H}$ (note that the domains of $\tilde{H}$ and $f$ are the same).

First, we prove that $\tilde{H}(p)=H(p)$ for any $p\in\mathrm{ri\ }\mathrm{dom}~{}\tilde{H}$ by contradiction. Assume there exists $p\in\mathrm{ri\ }\mathrm{dom}~{}\tilde{H}$ such that $\tilde{H}(p)\neq H(p)$ . Let $E^{-}=-\tilde{H}(p)$ . Then, by 2.3 and eq. 10, we deduce that

[TABLE]

where the last equality holds because $V$ is the reflection of $\mathrm{epi~{}}\tilde{H}$ . Here $N_{V}(p,E^{-})$ denotes the normal cone of the set $V$ at $(p,E^{-})$ . Let $x_{0}\in\partial J^{*}(p)$ , $t>0$ and $v\in\partial\tilde{H}(p)$ . Denote $x=x_{0}+tv$ . Then, by eq. 42 we have $(x,t)\in\partial S^{*}(p,E^{-})$ , which implies $(p,E^{-})\in\partial S(x,t)$ . However, $E^{-}+H(p)=-\tilde{H}(p)+H(p)\neq 0$ , hence the HJ equation eq. 13 does not hold at $(x,t)$ , which is a contradiction. Therefore, $\tilde{H}=H$ when restricting to the relative interior of the domain of $\tilde{H}$ , which implies

[TABLE]

for any $p\in\mathrm{ri\ }\mathrm{dom}~{}\tilde{H}$ .

In fact, the values of any convex lower semi-continuous function on the relative boundary of its domain are fully determined by its values in the relative interior. It is not hard to check that

[TABLE]

Hence, we have proved that $S^{*}$ and $S_{H}^{*}$ agree in the relative interior of the domain. Therefore, $S^{*}=S_{H}^{*}$ in the whole domain, which implies $S=S_{H}$ and gives the uniqueness of the convex solution to the single-time HJ equation.

Next, we consider the multi-time case and assume $N>1$. It suffices to prove that $S$ and $S_{H}$ coincide for any $x\in\mathbb{R}^{n}$ and any $t_{1},\cdots,t_{N}>0$. Let $\alpha_{1},\cdots,\alpha_{N}$ be arbitrary positive real numbers and denote $\alpha:=(\alpha_{1},\cdots,\alpha_{N})$. Define $T(x,s):=S(x,s\alpha_{1},\cdots,s\alpha_{N})$ for any $x\in\mathbb{R}^{n}$ and $s\geq 0$. Then $T\in\Gamma_{0}(\mathbb{R}^{n+1})$. We can compute the derivative of $T$ with respect to $s$ for any $x\in\mathbb{R}^{n}$ and $s>0$ using the chain rule and the assumption that $S$ satisfies the multi-time HJ equation eq. 13 to obtain

[TABLE]

It is easy to check that $T$ satisfies the initial condition given by $J$ , i.e. $T(x,0)=J(x)$ for any $x\in\mathbb{R}^{n}$ . Hence, $T$ is a solution to the single-time HJ equation with Hamiltonian $H=\sum_{j=1}^{N}\alpha_{j}H_{j}$ , which is finite-valued, 1-coercive and strictly convex. Therefore, for the single-time HJ equation, the conditions (H1)-(H2) and (S1)-(S2) are satisfied. Then, the solution $T$ is unique and equal to the Hopf formula with respect to the Hamiltonian $H$ . Hence, for any $x\in\mathbb{R}^{n},s>0$ and any $\alpha_{1},\cdots,\alpha_{N}>0$ , we have

[TABLE]

Therefore, $S=S_{H}$ in the relative interior of the domain, which implies $S=S_{H}$ in the whole space, because of the lower semi-continuity of $S$ and $S_{H}$ . The uniqueness of the solution to the multi-time HJ equation follows. ∎
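The statement of the proposition can be probed numerically on a toy problem. The sketch below is not from the paper: it assumes $n=1$, $N=2$, $J(x)=x^{2}/2$, $H_{1}(p)=p^{2}/2$ and $H_{2}(p)=p^{2}$, and it assumes that the Hopf formula takes the standard form $S_{H}(x,t_{1},t_{2})=\sup_{p}\{px-J^{*}(p)-t_{1}H_{1}(p)-t_{2}H_{2}(p)\}$. The function names and grid parameters are illustrative choices.

```python
# Toy check (not from the paper): n = 1, N = 2, J(x) = x^2/2,
# H_1(p) = p^2/2 and H_2(p) = p^2, so that J*(p) = p^2/2.
# Assumed Hopf formula:
#   S_H(x, t1, t2) = sup_p { p*x - J*(p) - t1*H_1(p) - t2*H_2(p) }.

def hopf(x, t1, t2, p_max=10.0, steps=100001):
    """Approximate the Hopf formula by maximizing over a uniform grid in p."""
    best = float("-inf")
    for i in range(steps):
        p = -p_max + 2.0 * p_max * i / (steps - 1)
        val = p * x - 0.5 * p * p - t1 * 0.5 * p * p - t2 * p * p
        best = max(best, val)
    return best

def closed_form(x, t1, t2):
    """Closed-form value of the same supremum for this quadratic example."""
    return x * x / (2.0 * (1.0 + t1 + 2.0 * t2))

def hj_residuals(x, t1, t2, h=1e-5):
    """Central-difference residuals of dS/dt_j + H_j(dS/dx) for j = 1, 2."""
    S = closed_form
    p = (S(x + h, t1, t2) - S(x - h, t1, t2)) / (2 * h)
    e1 = (S(x, t1 + h, t2) - S(x, t1 - h, t2)) / (2 * h)
    e2 = (S(x, t1, t2 + h) - S(x, t1, t2 - h)) / (2 * h)
    return e1 + 0.5 * p * p, e2 + p * p
```

Comparing `hopf` with `closed_form` at a few points, and checking that both entries of `hj_residuals` are close to zero, gives a quick consistency check that the candidate solves both equations of the multi-time system in this example.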

The above arguments also apply under weaker assumptions and yield a generalized result, which is stated in the following corollary. In this generalized result, the solution $S$ may fail to be a classical solution. Hence the subgradients of $S$, instead of the gradients, are assumed to satisfy the HJ equation, which is a natural generalization of the classical notion when the solution is only convex and lower semi-continuous.

Corollary 4.1.

Let $J\in\Gamma_{0}(\mathbb{R}^{n})$ , and $H_{1},H_{2},\cdots,H_{N}$ be arbitrary extended-valued functions defined on $\mathbb{R}^{n}$ . Assume there exists a function $S\in\Gamma_{0}(\mathbb{R}^{n}\times[0,+\infty)^{N})$ satisfying:

(i)

If $p\in\mathbb{R}^{n}$ and $E_{1}^{-},\cdots,E_{N}^{-}\in\mathbb{R}$ satisfy $(p,E_{1}^{-},\cdots,E_{N}^{-})\in\partial S(x,t_{1},\cdots,t_{N})$ for some $x\in\mathbb{R}^{n}$ and $t_{1},\cdots,t_{N}>0$ , then $E_{j}^{-}+H_{j}(p)=0$ for any $j=1,\cdots,N$ .

(ii)

$S(x,0,\cdots,0)=J(x)$ for any $x\in\mathbb{R}^{n}$.

Then, the following statements hold:

1.

For the single-time case, i.e. $N=1$, let $H=H_{1}$ denote the Hamiltonian. If there exist $x\in\mathbb{R}^{n}$ and $t>0$ such that $S(x,t)\neq+\infty$, then $S$ is unique and $S=F^{*}$, where $F$ is defined by

[TABLE]

for any $p\in\mathbb{R}^{n}$ and $E^{-}\in\mathbb{R}$ . Moreover, the restriction of $H$ on $\mathrm{ri\ }\mathrm{dom}~{}J^{*}$ is finite-valued and convex.

2.

For the multi-time case, i.e. $N>1$ , if $\tilde{S}$ is another function satisfying the assumptions (i)-(ii) with $\mathrm{ri\ }\mathrm{dom}~{}\tilde{S}=\mathrm{ri\ }\mathrm{dom}~{}S$ , then $\tilde{S}=S$ . In other words, the solution is unique when the relative interior of the domain is given.

Proof.

The proof of this corollary is similar to the proof of Proposition 4.1, so we only give a brief sketch here. First, we adjust the proof of Lemma 4.1 by changing the gradients of $S$ to the subgradients of $S$. The argument still holds because we assume in (i) that the subgradients of $S$ satisfy the HJ equation. Then, we draw the same conclusion as in Lemma 4.1. In other words, with the function $\tilde{H}$ defined in eq. 37, we have

[TABLE]

Also, the part of the proof of Proposition 4.1 concerning $N=1$ still holds. So we derive that the two functions $\tilde{H}$ and $H$ coincide in the relative interior of $\mathrm{dom}~{}J^{*}$. Together with eq. 44, we derive eq. 43, and hence the first statement in this corollary follows.

For the case when $N>1$, it suffices to prove that $S$ and $\tilde{S}$ coincide in the relative interior of the domain. Let $(y,t_{1},\cdots,t_{N})$ be an arbitrary point in $\mathrm{ri\ }\mathrm{dom}~{}S$. It remains to prove that $S$ and $\tilde{S}$ are equal at the point $(y,t_{1},\cdots,t_{N})$. Notice that we have $t_{i}>0$ for any $i=1,\cdots,N$, so we can choose the positive number $\alpha_{i}$ in the proof of Proposition 4.1 to be $t_{i}$ for any $i$. As in the proof of Proposition 4.1, we define the functions $T$ and $\tilde{T}$ by

[TABLE]

for any $x\in\mathbb{R}^{n}$ and $s\geq 0$. Since there exists a point $(y,t_{1},\cdots,t_{N})$ in the relative interior of $\mathrm{dom}~{}S$, one can easily check that the assumptions in [66, Thm.XI.3.2.1] hold. Then, by [66, Thm.XI.3.2.1], the chain rule for the subgradients of $S$ holds. Similarly, the chain rule also holds for the subgradients of $\tilde{S}$. Therefore, the multi-time argument in the proof of Proposition 4.1 remains valid after changing the gradients to the subgradients. As a result, we conclude that both $T$ and $\tilde{T}$ solve the single-time HJ equation with the Hamiltonian $\sum_{j=1}^{N}\alpha_{j}H_{j}$. Then, by the first statement in this corollary, we have $T\equiv\tilde{T}$, which implies that $S$ and $\tilde{S}$ coincide at the point $(y,t_{1},\cdots,t_{N})$, and the proof is complete. ∎

5. A Regularization Method for the Degenerate Cases

In the previous two sections, we discussed the relation between some optimization problems and the multi-time HJ equations under the assumptions (H1) and (H2). In general, if those assumptions are not satisfied, some of the results may fail. For example, if no Hamiltonian is strictly convex, then the solution may be non-differentiable, which leads to the non-uniqueness of the maximizer $p$ (called the momentum) in the Hopf formula eq. 14. Also, the minimizer $u$ in the Lax formula eq. 15 may be non-unique if the Hamiltonians are not differentiable. However, these two situations are common for optimization problems such as the decomposition models. In fact, any norm or indicator function is neither strictly convex nor differentiable. As a result, it is important to select a meaningful momentum $p$ or minimizer $u$ in the solution set when it contains more than one element.
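The non-uniqueness of the momentum can be seen on a one-dimensional toy problem. The sketch below is not from the paper: it assumes $n=1$, $J=|\cdot|$, $H=|\cdot|$ and $t=1$, and it assumes the Hopf formula has the standard form $S(x)=\sup_{p}\{px-J^{*}(p)-H(p)\}$, where $J^{*}$ is the indicator function of $[-1,1]$. Exact rational arithmetic is used so that ties between maximizers are detected reliably; the helper name is hypothetical.

```python
from fractions import Fraction

# Toy example (not from the paper): n = 1, J = |.|, H = |.|, t = 1.
# J*(p) is the indicator of [-1, 1], and the assumed Hopf formula reads
#   S(x) = max over |p| <= 1 of  p*x - |p|  =  max(|x| - 1, 0).

def hopf_maximizers(x, steps=200):
    """Return S(x) and the sorted grid points p in [-1, 1] attaining the maximum."""
    grid = [Fraction(i, steps) for i in range(-steps, steps + 1)]
    values = [p * x - abs(p) for p in grid]
    best = max(values)
    return best, [p for p, v in zip(grid, values) if v == best]

# At the kink x = 1 every grid point in [0, 1] is a maximizer,
# whereas at x = 2 the maximizer p = 1 is unique.
value_at_kink, momenta_at_kink = hopf_maximizers(Fraction(1))
value_away, momenta_away = hopf_maximizers(Fraction(2))
```

At $x=1$ the whole interval $[0,1]$ maximizes the Hopf objective, which matches $\partial S(1)=[0,1]$ for $S(x)=\max(|x|-1,0)$, so a selection rule is needed there.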

In this section, we propose a regularization method to select a unique momentum $p$ and a unique minimizer $u$ simultaneously, and provide representation formulas for both selected quantities by using the results stated in the previous sections. Intuitively, to select a minimizer $u$, we modify the degenerate term by adding $\lambda H$ to it, where $\lambda$ is a positive parameter and $H$ is a differentiable function satisfying (H1). When $\lambda$ approaches zero, the minimizer of the modified problem converges to the unique element $\bar{u}$ of the solution set of the original problem that minimizes the function $H$. The procedure to select $p$ is the same, except that the degenerate term is inf-convolved with $\lambda H^{*}(\cdot/\lambda)$ instead of being augmented by $\lambda H$.

In the literature, the special case of selecting the momentum $p$ using inf-convolution with $\|\cdot\|^{2}/(2\lambda)$ is well-known as the Moreau-Yosida approximation, which is introduced, for instance, in [8, Thm.2, p.144] and [27, Thm.3.1, p.54]. Generally, a Moreau-Yosida based regularization method selects either a unique minimizer $u$ or a unique momentum $p$, but not both. Our contribution here is that we consider the primal problem and the dual problem simultaneously. In other words, one can select the momentum $p$ and the minimizer $u$ at the same time using our method. This analysis can be adapted easily to other decomposition models with more degenerate terms. Moreover, one can also use the same procedure with another function $H$, or even use two different functions in the two added terms. One alternative choice is $\|\cdot\|_{\alpha}^{\alpha}/\alpha$ for any $\alpha>1$, for example. In fact, if $H$ is chosen to be any non-negative, finite-valued, 1-coercive, differentiable and strictly convex function, the statements in this section still hold. To be specific, the proofs of Lemma 5.1, Lemma 5.3 and Proposition 5.1 hold after minor adjustments, and one can use subdifferential calculus to prove Lemma 5.2. In this paper, for simplicity, we mainly focus on quadratic regularization terms, which are usually preferred in practice because of the simplicity and efficiency of their numerical implementation.
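For a concrete instance of the Moreau-Yosida approximation mentioned above, the inf-convolution of the absolute value with $(\cdot)^{2}/(2\lambda)$ is the Huber function, which is differentiable even though $|\cdot|$ is not. The sketch below checks this on a grid in one dimension. It is an illustrative example rather than part of the paper, and the function names and grid parameters are assumptions.

```python
def moreau_envelope_abs(x, lam, radius=5.0, steps=100001):
    """Grid approximation of min over y of |y| + (x - y)^2 / (2*lam)."""
    best = float("inf")
    for i in range(steps):
        y = -radius + 2.0 * radius * i / (steps - 1)
        best = min(best, abs(y) + (x - y) ** 2 / (2.0 * lam))
    return best

def huber(x, lam):
    """Closed form of the same envelope: quadratic near 0, linear elsewhere."""
    if abs(x) <= lam:
        return x * x / (2.0 * lam)
    return abs(x) - lam / 2.0
```

The derivative of `huber` is $x/\lambda$ clipped to $[-1,1]$, so the envelope has a single gradient at every point, including $x=0$, where the subdifferential of $|\cdot|$ is the whole interval $[-1,1]$.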

Now, we focus on a specific decomposition model, and the regularization function $H$ is chosen to be $\|\cdot\|_{2}^{2}/2$ . Some other models can be analyzed using similar arguments. Let $\|\cdot\|$ and ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\cdot\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}$ be two arbitrary norms whose dual norms are denoted as $\|\cdot\|_{*}$ and ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\cdot\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{*}$ . In fact, all the results remain valid if $\|\cdot\|$ and ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\cdot\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}$ are two semi-norms, in which case the corresponding dual norms $\|\cdot\|_{*}$ and ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\cdot\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{*}$ are finite in some subspaces and equal to $+\infty$ otherwise. The set of minimizers is defined as follows

[TABLE]

We can regard the minimal value as a solution to the HJ equation given by the Lax formula with spatial variable $x\in\mathbb{R}^{n}$ and time variable $t>0$ and define

[TABLE]

Note that in the corresponding HJ equation, the initial function is $\|\cdot\|$ and the Hamiltonian is ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\cdot\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}$, hence the assumption (H1) is not satisfied. As a result, we need to apply the regularization method in this example. For simplicity, we also use $F_{1}$ and $F_{2}$ to denote these two norms, so that $F_{2}^{*}(y)=I\{{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|y\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{*}\leq t\}$. We assume $t=1$ and drop the variable $t$ in the remainder of this section because the variation of $t$ is not considered in this problem. Then, we can rewrite the problem as follows

[TABLE]

In fact, several useful models in the literature fit this form. We give two examples. In what follows, we use $\|\cdot\|_{TV}$, $\|\cdot\|_{E}$ and $\|\cdot\|_{G}$ to denote the discrete total variation semi-norm, the discrete $E$-norm and the discrete $G$-norm, respectively. First, in [9, 10], it is shown that Meyer’s model in the following form

[TABLE]

is equivalent to

[TABLE]

for some suitable positive parameter $\beta$. In this example, both $F_{1}$ and $F_{2}$ are the discrete total variation because the discrete $G$-norm is the dual norm of $\|\cdot\|_{TV}$. Similarly, another model of Meyer, stated as follows

[TABLE]

is equivalent to

[TABLE]

for some suitable positive parameter $\beta$ [11]. In this example, the functions $F_{1}$ and $F_{2}$ are the discrete total variation and the dual norm of the discrete $E$-norm, respectively.

As mentioned above, we apply two operators to the function $F_{1}$ and obtain its approximation

[TABLE]

where $\lambda,\mu>0$ are small regularization parameters. Here, we choose to modify the function $F_{1}$ , but one may instead apply the operators to the function $F_{2}$ and the analysis is similar. Then, the problem reads

[TABLE]

We expand the inf-convolution to get

[TABLE]

Here and later in this section, we omit the variable $x$ when there is no ambiguity.

By introducing the quadratic terms, the uniqueness of $(v_{\lambda,\mu},w_{\lambda,\mu})$ and the differentiability of $S_{\lambda,\mu}$ are guaranteed. When the parameters $\lambda$ and $\mu$ converge to zero at a comparable rate, a meaningful minimizer $u$ and momentum $p$ are selected. In fact, they are the elements with the minimal $l^{2}$ norms in the target sets $U(x)$ and $\partial S(x)$. The detailed statements are as follows.

Lemma 5.1.

For any $\lambda,\mu>0$, there is a unique minimizer $(v_{\lambda,\mu},w_{\lambda,\mu})$ to the problem eq. 48. Moreover, for any positive constant $K$, the sets $\{v_{\lambda,\mu}:\lambda,\mu\in(0,K)\}$ and $\{w_{\lambda,\mu}:\lambda,\mu\in(0,K)\}$ are bounded.

Proof.

It is easy to check that the objective function in eq. 48 is 1-coercive and strictly convex, because of the 1-coercivity and strict convexity of the quadratic terms. Therefore, there exists a unique minimizer $(v_{\lambda,\mu},w_{\lambda,\mu})$ .

Setting $w=x-v$ and $v\in U(x)$ in eq. 48 and comparing it with eq. 45, we obtain

[TABLE]

Denote $C:=S(x)+\min_{v\in U(x)}\frac{K}{2}\|v\|_{2}^{2}$, where $K$ is an arbitrary positive number as defined in the statement. Then $C$ is independent of $\lambda$ and $\mu$, and $S_{\lambda,\mu}(x)\leq C$ when $0<\lambda<K$. From this inequality and the definition of $S_{\lambda,\mu}(x)$ in eq. 48, we can derive a bound for $x-v_{\lambda,\mu}-w_{\lambda,\mu}$ that reads

[TABLE]

Therefore, $v_{\lambda,\mu}+w_{\lambda,\mu}$ is bounded by the constant $\|x\|_{2}+\sqrt{2CK}$ when we assume $\lambda,\mu\in(0,K)$ .

Then, from the constraint given by the indicator function $F_{2}^{*}$ in the minimization problem eq. 48, we have ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|w_{\lambda,\mu}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{*}\leq 1$ , which implies the boundedness of $w_{\lambda,\mu}$ because all the norms are equivalent in the finite-dimensional space $\mathbb{R}^{n}$ . As a result, $v_{\lambda,\mu}$ is also bounded whenever $\lambda,\mu\in(0,K)$ . Then the conclusion follows. ∎

Lemma 5.2.

Let $v_{\lambda,\mu}$ and $w_{\lambda,\mu}$ be defined by eq. 48. Then, we have $\lim_{\lambda,\mu\to 0^{+}}v_{\lambda,\mu}+w_{\lambda,\mu}=x.$ Any cluster point of $v_{\lambda,\mu}$ is also a cluster point of $u_{\lambda,\mu}$ and vice versa. Moreover, any cluster point of $u_{\lambda,\mu}$ and $v_{\lambda,\mu}$ is in $U(x)$ .

Proof.

The convergence of $v_{\lambda,\mu}+w_{\lambda,\mu}$ to $x$ follows from eq. 49. Since $u_{\lambda,\mu}=x-w_{\lambda,\mu}$ , any cluster point of $u_{\lambda,\mu}$ is also a cluster point of $v_{\lambda,\mu}$ and vice versa. It remains to show that any cluster point of $v_{\lambda,\mu}$ is in $U(x)$ .

By the definition of $(v_{\lambda,\mu},w_{\lambda,\mu})$ , we have

[TABLE]

where we first multiply the objective function by $\mu$ and then expand the quadratic term. Recall that any indicator function is invariant under multiplication with a positive constant, hence we obtain $I\{{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|w\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{*}\leq 1\}=\mu I\{{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|w\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{*}\leq 1\}$ and the second equality in eq. 50 follows. The last maximization problem in eq. 50 is in the form of Hopf formula. The corresponding multi-time HJ equation with time variables $\mu$ and $\nu=\lambda\mu$ is given by

[TABLE]

Here, $J$ is the l.s.c. convex function such that $J^{*}(v,w)=\frac{1}{2}\|v+w\|_{2}^{2}+F_{2}^{*}(w)$ . Although the assumption (H1) is not satisfied, by eq. 50 and 5.1, we know that the Hopf formula is well-defined in $\mathbb{R}^{n}\times\mathbb{R}^{n}\times[0,+\infty)\times[0,+\infty)$ . Moreover, the solution $\tilde{S}$ is the classical solution to the multi-time HJ equation eq. 51 and its spatial gradient equals $(v_{\lambda,\mu},w_{\lambda,\mu})$ . To be specific, we have

[TABLE]

Then, we want to apply the results in 3.4 (i) to prove that any cluster point of $v_{\lambda,\mu}$ is in $U(x)$ . In fact, under the basic assumptions that $H_{j},J\in\Gamma_{0}(\mathbb{R}^{n})$ and the Hopf formula is well-defined, the proof of 3.4 (i) only requires the following statements:

(a)

$\partial J(x,x)$ is non-empty;

(b)

the Hamiltonians are finite-valued;

(c)

$\tilde{S}$ is differentiable;

(d)

the spatial gradient $\nabla_{y,z}\tilde{S}(x,x,\mu,\lambda\mu)$ is bounded with all limit points in $\partial J(x,x)$ .

The statements (b) and (c) are obviously satisfied. It is straightforward to check $\partial J(x,x)\neq\emptyset$. Specifically, $(v,w)\in\partial J(x,x)$ iff $(x,x)\in\partial J^{*}(v,w)$. By simple computation, $\partial J^{*}(v,w)=(v+w,v+w+\partial F_{2}^{*}(w))$. Then we obtain

[TABLE]

Such $v$ and $w$ always exist, hence $\partial J(x,x)\neq\emptyset$ . As for the statement (d), the boundedness of $\nabla_{y,z}\tilde{S}(x,x,\mu,\lambda\mu)$ follows from eq. 52 and 5.1. By eq. 49, $v_{\lambda,\mu}+w_{\lambda,\mu}$ converges to $x$ . Also, ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|w_{\lambda,\mu}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{*}\leq 1$ is given by the constraint imposed by $F_{2}^{*}$ in the minimization problem eq. 48. Together with eq. 52, we can conclude that any limit point of $\nabla_{y,z}\tilde{S}(x,x,\mu,\lambda\mu)$ , denoted as $(v,w)$ , satisfies $v+w=x$ and ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|w\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{*}\leq 1$ . Hence, $(v,w)\in\partial J(x,x)$ by eq. 53 and the statement (d) is proved.

Therefore, the conclusion of 3.4 (i) still holds although the assumption (H1) is not satisfied. As a result, for any cluster point $(\bar{v},\bar{w})$ of $(v_{\lambda,\mu},w_{\lambda,\mu})$ ,

[TABLE]

where the last two equalities follow from eq. 53 and the definition of $U(x)$ in eq. 45. In conclusion, any cluster point $\bar{v}$ of $v_{\lambda,\mu}$ is in $U(x)$ . ∎

Lemma 5.3.

For any $\lambda,\mu>0$ , the function $S_{\lambda,\mu}$ defined in eq. 47 is differentiable. Let $x\in\mathbb{R}^{n}$ and define $p_{\lambda,\mu}:=\nabla S_{\lambda,\mu}(x)$ . Then for any positive constant $K$ , the set of gradients $\{p_{\lambda,\mu}:\lambda,\mu\in(0,K)\}$ is bounded. Moreover, as $\lambda$ and $\mu$ approach zero, any cluster point of $p_{\lambda,\mu}$ is in $\partial S(x)$ .

Proof.

Rewriting the formula of $S_{\lambda,\mu}$ in eq. 48, we get

[TABLE]

From straightforward computation, by 2.7 and the definition of $(v_{\lambda,\mu},w_{\lambda,\mu})$ in eq. 48, we obtain

[TABLE]

As a result, $\partial S_{\lambda,\mu}(x)$ contains at most one element. On the other hand, $S_{\lambda,\mu}$ is convex and finite-valued, which implies the subdifferential of $S_{\lambda,\mu}$ is non-empty. Hence, $S_{\lambda,\mu}$ is differentiable and its gradient is given by

[TABLE]

Let $K$ be an arbitrary positive number. Now, we prove that there exists a constant $C$ such that $\|p_{\lambda,\mu}\|_{2}\leq C$ whenever $\lambda,\mu\in(0,K)$ . By eqs. 54 and 55, $p_{\lambda,\mu}$ is in the set $\partial F_{1}(v_{\lambda,\mu})+\lambda v_{\lambda,\mu}$ . On the one hand, the subdifferential of the norm $F_{1}$ is always bounded. In other words, there exists a constant $C_{1}$ such that $\|s\|_{2}\leq C_{1}$ whenever $s\in\partial F_{1}(z)$ for some $z\in\mathbb{R}^{n}$ . Then, we deduce that the set $\partial F_{1}(v_{\lambda,\mu})$ is bounded by $C_{1}$ . On the other hand, according to 5.1, there exists a constant $C_{2}$ such that $\|v_{\lambda,\mu}\|_{2}\leq C_{2}$ whenever $\lambda,\mu\in(0,K)$ . Therefore, $\{p_{\lambda,\mu}:\lambda,\mu\in(0,K)\}$ is bounded by $C_{1}+C_{2}K$ .

Let $p$ be a cluster point of $\{p_{\lambda,\mu}\}$ . By taking a subsequence we can assume $\lambda_{k}$ and $\mu_{k}$ converge to zero and $p_{k}:=p_{\lambda_{k},\mu_{k}}$ converges to $p$ . By 5.1, $v_{k}:=v_{\lambda_{k},\mu_{k}}$ is bounded, hence we can assume $v_{k}$ converges to a point $u$ by taking a subsequence. Then, $w_{k}:=w_{\lambda_{k},\mu_{k}}$ converges to $x-u$ by 5.2. From eq. 54, we have

[TABLE]

Since the subdifferential operators $\partial F_{1}$ and $\partial F_{2}^{*}$ are continuous [66, Prop.XI.4.1.1], when $k$ goes to infinity, the above inclusion becomes

[TABLE]

On the other hand, by 2.7 and the definition of $S(x)$ and $U(x)$ in eq. 45, we have

[TABLE]

for any $\tilde{u}\in U(x)$ . Moreover, by 5.2, since $u$ is a cluster point of $v_{k}$ , we can conclude that $u\in U(x)$ . As a result, we can choose $\tilde{u}=u$ in eq. 57 and compare it with eq. 56 to conclude that $p\in\partial S(x)$ . ∎

Proposition 5.1.

Assume $\{\lambda_{k}\}\subset(0,+\infty)$ and $\{\mu_{k}\}\subset(0,+\infty)$ converge to zero and $\lim_{k\to+\infty}\frac{\lambda_{k}}{\mu_{k}}=c\in(0,+\infty)$ . Then, the minimizer $u_{k}:=u_{\lambda_{k},\mu_{k}}$ and the gradient $p_{k}:=\nabla S_{\lambda_{k},\mu_{k}}(x)$ converge to the $l^{2}$ projections of zero onto the sets $U(x)$ and $\partial S(x)$ , respectively. To be specific,

[TABLE]

Proof.

Define $H(\cdot):=\|\cdot\|_{2}^{2}/2$ . We will use the general symbol $H$ to replace the quadratic function because this proof holds for a general finite-valued, 1-coercive, differentiable and strictly convex function $H$ .

Note that the limit of $u_{k}$ is the same as the limit of $v_{k}$ , hence we just need to prove the result for $v_{k}$ and $p_{k}$ . Denote

[TABLE]

Since $v_{k}$ and $p_{k}$ are bounded, we can assume that $v_{k}$ converges to $u$ and $p_{k}$ converges to $p$ by taking a subsequence. Then it suffices to prove $u=\bar{u}$ , $p=\bar{p}$ .

By eq. 54 and eq. 55, we have

[TABLE]

By 2.5, we deduce that $w_{k}\in\partial F_{2}(p_{k})$ and $x-v_{k}-w_{k}=\mu_{k}\nabla H(p_{k})$ . Together with eq. 59, we obtain

[TABLE]

On the other hand, since $\bar{u}$ and $\bar{p}$ are the minimizer and momentum of the original problem eq. 45, we have

[TABLE]

Combining eq. 60 and eq. 61, we obtain

[TABLE]

Since the subdifferential operators $\partial F_{1}$ and $\partial F_{2}$ are monotone, by eq. 9, we obtain

[TABLE]

We sum up the two inequalities to get

[TABLE]

We divide the above inequality by $\mu_{k}$ and take the limit $k\to+\infty$ to obtain

[TABLE]

where the positive constant $c$ is defined in the statement of this proposition to be $c:=\lim_{k\to+\infty}\lambda_{k}/\mu_{k}$ . From 5.2 and 5.3, we know that $u\in U(x)$ and $p\in\partial S(x)$ , hence we have $H(u)\geq H(\bar{u})$ and $H(p)\geq H(\bar{p})$ by eq. 58. Taken together with eq. 62, we obtain

[TABLE]

As a result, the inequalities in eq. 63 become equalities, which implies $H(u)=H(\bar{u})$ and $H(p)=H(\bar{p})$ because $c$ is positive by assumption. Therefore, we conclude that $u=\bar{u}$ and $p=\bar{p}$, since the minimizers in eq. 58 are unique. ∎
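The selection of the minimal-norm minimizer can be observed on a small hypothetical example that is not taken from the paper. Take $n=2$, let $F_{1}$ be the $l^{\infty}$ norm, let the constraint be $\|x-u\|_{\infty}\leq 1$ and set $x=(3,0)$. The unregularized minimizers are then $\{(2,s):|s|\leq 1\}$, and the element of least $l^{2}$ norm is $(2,0)$. The sketch below is a simplification: it keeps only the $\lambda$ term and omits the $\mu$ inf-convolution, and it uses a brute-force grid search rather than any algorithm from the paper.

```python
# Toy example (not from the paper): n = 2, F_1 = l-infinity norm,
# constraint ||x - u||_inf <= 1, and x = (3, 0).
# Unregularized minimizers: {(2, s): |s| <= 1}; least l2 norm: (2, 0).

def regularized_minimizer(x, lam, steps=100):
    """Grid search for min ||u||_inf + (lam/2)*||u||_2^2 s.t. ||x - u||_inf <= 1."""
    best_val, best_u = float("inf"), None
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            # Every grid point lies in the box around x, so it is feasible.
            u = (x[0] + i / steps, x[1] + j / steps)
            val = max(abs(u[0]), abs(u[1])) + 0.5 * lam * (u[0] ** 2 + u[1] ** 2)
            if val < best_val:
                best_val, best_u = val, u
    return best_u

x = (3.0, 0.0)
selected = regularized_minimizer(x, 1e-3)   # small positive lambda
arbitrary = regularized_minimizer(x, 0.0)   # no regularization: ties remain
```

With $\lambda=0$ the search simply returns whichever of the tied minimizers it meets first, while any small $\lambda>0$ breaks the tie in favor of the point closest to the origin.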

In practice, if a model has non-unique minimizers, then some existing optimization algorithms may fail to converge. In that case, one may use this modification procedure and apply the optimization algorithm to the modified problem to obtain a sequence converging to the selected minimizer. Here, for simplicity, we only demonstrate the method on a specific optimization problem whose objective function contains two parts, namely one norm and one constraint. In fact, this method works for more general cases, such as other decomposition models with more degenerate parts. Now, we give a numerical illustration of the proposed regularization method on the celebrated TVL1 model [5, 6, 14, 42, 43, 52, 74, 75].

To be specific, the TVL1 model solves the following optimization problem

[TABLE]

where $\|\cdot\|_{TV}$ denotes the discrete total variation semi-norm defined in eq. 6. However, it is well-known that this minimization problem may have non-unique minimizers [42, 50]. For instance, let $\Omega$ be the domain of an image and $\Omega_{1}$ be any small rectangle in $\Omega$ such that $2|\Omega_{1}|<|\Omega|$ . Let $I$ be the set of indices whose corresponding pixels are in $\Omega_{1}$ . Let $m_{1},m_{2}$ be the numbers of pixels on the two adjacent sides of the small rectangle $\Omega_{1}$ . In other words, there are $m_{1}m_{2}$ pixels in $\Omega_{1}$ and $2(m_{1}+m_{2})$ pixels on the boundary of $\Omega_{1}$ . Let $a$ and $b$ be two different real numbers in $[0,1]$ and set the discretized image $x$ as follows

[TABLE]

Then, the minimizers of the TVL1 model eq. 64 with $\alpha=(m_{1}m_{2})/(2m_{1}+2m_{2})$ are not unique. Moreover, we have

[TABLE]

where $u_{1}$ and $u_{2}$ are defined by

[TABLE]
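The tie behind this non-uniqueness can be checked with exact arithmetic. The sketch below rests on several assumptions that are not confirmed by the text above: the objective is taken to be $\alpha\|u\|_{TV}+\|u-x\|_{1}$, the discrete total variation is taken to be anisotropic (sum of absolute differences between neighbouring pixels), and the image equals $a$ inside the rectangle and $b$ outside. The two candidates compared are the image itself and the constant background, which may or may not coincide with $u_{1}$ and $u_{2}$.

```python
from fractions import Fraction

def aniso_tv(u):
    """Anisotropic discrete TV: sum of absolute differences between neighbours."""
    rows, cols = len(u), len(u[0])
    horiz = sum(abs(u[i][j + 1] - u[i][j]) for i in range(rows) for j in range(cols - 1))
    vert = sum(abs(u[i + 1][j] - u[i][j]) for i in range(rows - 1) for j in range(cols))
    return horiz + vert

def tvl1_objective(u, x, alpha):
    """Assumed form of the objective: alpha * TV(u) + ||u - x||_1."""
    l1 = sum(abs(u[i][j] - x[i][j]) for i in range(len(x)) for j in range(len(x[0])))
    return alpha * aniso_tv(u) + l1

def rectangle_image(size, top, left, m1, m2, a, b):
    """Image equal to a on an m1-by-m2 rectangle and b elsewhere."""
    return [[a if top <= i < top + m1 and left <= j < left + m2 else b
             for j in range(size)] for i in range(size)]

size, m1, m2 = 8, 2, 4
a, b = Fraction(3, 4), Fraction(1, 4)
alpha = Fraction(m1 * m2, 2 * (m1 + m2))
x = rectangle_image(size, 3, 2, m1, m2, a, b)
flat = [[b] * size for _ in range(size)]

def blend(theta):
    """Convex combination theta * x + (1 - theta) * flat."""
    return [[theta * x[i][j] + (1 - theta) * flat[i][j] for j in range(size)]
            for i in range(size)]

# Objective values along the segment between the two candidates.
values = [tvl1_objective(blend(Fraction(k, 4)), x, alpha) for k in range(5)]
```

Under these assumptions all entries of `values` coincide, so the objective is constant along the whole segment between the two candidates. This shows the tie only; it does not by itself prove that the segment consists of minimizers.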

By applying the proposed regularization method, a unique minimizer is selected in this set of minimizers. To be specific, we solve the following problem

[TABLE]

Note that the above model is related to models incorporating the infimal convolution of $L^{1}$ and $L^{2}$ fidelity terms, which are used for image restoration under mixed Gaussian and salt-and-pepper noise, as proposed in [30, 31] for instance. Although this model is different from the example we give in eq. 45, one can adjust the arguments to prove the same statements for this model. In other words, when the two parameters $\lambda$ and $\mu$ converge to zero at a comparable rate, the $v$-component $v^{TVL1}_{\lambda,\mu}(x)$ converges to the element $\bar{u}^{TVL1}(x)$ defined by

[TABLE]

and the $w$-component converges to the residual $x-u_{1}$. Numerically, we use a splitting method and the algorithm in [39, 50, 67] to compute the minimizer of eq. 65 with $\lambda=\mu=0.01$. We test the regularization method on the four images shown in the first row of table 2; the corresponding $v$-components are shown in the second row.
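As a side illustration of the infimal convolution of $L^{1}$ and $L^{2}$ fidelity terms mentioned above, the scalar case has a well-known closed form: the infimal convolution of $|\cdot|$ and $\frac{1}{2\mu}(\cdot)^{2}$ is the Huber function. The sketch below compares a brute-force grid minimization with that closed form. The weighting $\frac{1}{2\mu}$ and the names `huber` and `inf_conv_l1_l2` are assumptions chosen for illustration; they are not taken from eq. 65 or from [30, 31].

```python
def huber(r, mu):
    """Closed form of inf_w { |w| + (r - w)^2 / (2 * mu) } for scalar r."""
    if abs(r) <= mu:
        return r * r / (2.0 * mu)
    return abs(r) - mu / 2.0


def inf_conv_l1_l2(r, mu, half_width=5.0, steps=40001):
    """Brute-force infimal convolution on a uniform grid of w values
    in [-half_width, half_width]."""
    best = float("inf")
    for k in range(steps):
        w = -half_width + 2.0 * half_width * k / (steps - 1)
        value = abs(w) + (r - w) ** 2 / (2.0 * mu)
        best = min(best, value)
    return best


mu = 0.5
for r in (-2.0, -0.3, 0.0, 0.4, 1.5):
    # The two columns should agree up to the grid resolution.
    print(r, inf_conv_l1_l2(r, mu), huber(r, mu))
```

The quadratic regime for small residuals and the linear regime for large ones reflect how such a mixed fidelity term treats Gaussian-like and impulsive perturbations differently.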

6. Conclusion

In this paper, we provide connections between multi-time Hamilton-Jacobi equations and some optimization problems, such as the decomposition models in image processing. To be specific, we show a representation formula for the minimizers $u_{j}$ and clarify the connection between the minimizers $u_{j}$ and the spatial gradient $p$ of the minimal values. Moreover, we study the variational behaviors of the momentum $p$ and the velocities $\frac{u_{j}}{t_{j}}$. It turns out that their limits solve two optimization problems which are dual to each other. In addition, we provide a new perspective from convex analysis to prove the uniqueness of the convex solution to the multi-time Hamilton-Jacobi equation, taking advantage of the convexity assumptions to overcome the difficulty that the functions can take the value $+\infty$. Finally, we demonstrate a regularization method to modify decomposition models which have non-unique minimizers.

In this work, we consider optimization problems which can be written in the form of the Lax formula eq. 15. Hence, we assume the observed data $x$ is the sum of different components $\{u_{j}\}$. We do not consider non-additive perturbation models such as [16, 55, 87]. However, our analysis covers a wide range of decomposition models with additive noise, and the results can be easily extended to vector-valued images such as color images.

Bibliography

[1] R. Acar and C. R. Vogel, Analysis of bounded variation penalty methods for ill-posed problems, Inverse Problems, 10 (1994), pp. 1217–1229.
[2] W. Allard, Total variation regularization for image denoising, I. geometric theory, SIAM Journal on Mathematical Analysis, 39 (2008), pp. 1150–1190.
[3] W. Allard, Total variation regularization for image denoising, II. examples, SIAM Journal on Imaging Sciences, 1 (2008), pp. 400–417.
[4] W. Allard, Total variation regularization for image denoising, III. examples, SIAM Journal on Imaging Sciences, 2 (2009), pp. 532–568.
[5] S. Alliney, A property of the minimum vectors of a regularizing functional defined by means of the absolute norm, IEEE Transactions on Signal Processing, 45 (1997), pp. 913–917.
[6] S. Alliney and S. A. Ruzinsky, An algorithm for the minimization of mixed $l_{1}$ and $l_{2}$ norms with application to bayesian estimation, IEEE Transactions on Signal Processing, 42 (1994), pp. 618–627.
[7] G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing, Springer-Verlag, 2002.
[8] J. P. Aubin and A. Cellina, Differential Inclusions: Set-Valued Maps and Viability Theory, Springer-Verlag, Berlin, Heidelberg, 1984.

[Table 2. Columns: Example 1, Example 2, Example 3, Example 4. First row: Original Image. Second row: $v$ Component.]
