On Decomposition Models in Imaging Sciences and Multi-time Hamilton-Jacobi Partial Differential Equations
Jérôme Darbon, Tingwei Meng

TL;DR
This paper explores the theoretical links between multi-time Hamilton-Jacobi PDEs and variational image decomposition models, revealing how solutions and minimizers relate and proposing methods for models with non-unique solutions.
Contribution
It establishes new theoretical connections between Hamilton-Jacobi PDEs and image decomposition, including uniqueness proofs and regularization techniques for non-unique minimizers.
Findings
Minimal values governed by multi-time Hamilton-Jacobi PDEs
Minimizers represented via Hamilton-Jacobi momentum
Regularization approach for non-unique minimizers
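| Notation | Meaning | Definition |
| --- | --- | --- |
| $\mathrm{dom}\, f$ | domain of $f$ | $\{x\in\mathbb{R}^n \colon f(x) < +\infty\}$ |
| $\mathrm{ri}\, C$ | relative interior of $C$ | the interior of $C$ with respect to the minimal hyperplane containing $C$ in $\mathbb{R}^n$ |
| $N_C(x)$ | normal cone of $C$ at $x$ | $\{p\in\mathbb{R}^n \colon \langle p, y-x\rangle \leq 0 \text{ for any } y\in C\}$ |
| $C_\infty$ | asymptotic cone of $C$ | $\{d\in\mathbb{R}^n \colon x+td\in C \text{ for any } x\in C,\ t>0\}$ |
| $\mathrm{epi}\, f$ | epigraph of $f$ | $\{(x,\alpha)\in\mathbb{R}^n\times\mathbb{R} \colon f(x)\leq\alpha\}$ |
| $\Gamma_0(\mathbb{R}^n)$ | a useful and standard class of convex functions | the set containing all proper, convex, l.s.c. functions from $\mathbb{R}^n$ to $\mathbb{R}\cup\{+\infty\}$ |
| $f'(x,d)$ | directional derivative of $f$ at $x$ along the direction $d$ | $\lim_{h\to 0^+} \frac{f(x+hd)-f(x)}{h}$ |
| $\partial f(x)$ | subdifferential of $f$ at $x$ | $\{p\in\mathbb{R}^n \colon f(y)\geq f(x)+\langle p, y-x\rangle \text{ for any } y\in\mathbb{R}^n\}$ |
| $I_C$ | the indicator function of $C$ | If $x\in C$, then define $I_C(x)=0$. Otherwise, define $I_C(x)=+\infty$. |
| $f^*$ | Legendre transform of $f$ | $f^*(p)=\sup_{x\in\mathbb{R}^n}\{\langle p,x\rangle - f(x)\}$ |
| $f\,\square\, g$ | inf-convolution of $f$ and $g$ | $(f\,\square\, g)(x)=\inf_{u\in\mathbb{R}^n}\{f(u)+g(x-u)\}$ |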
[Figure: original images and their decomposed components for Examples 1 to 4; images not recovered.]
Jérôme Darbon
Department of Applied Mathematics, Brown University, Providence, RI
and
Tingwei Meng
Department of Applied Mathematics, Brown University, Providence, RI
Abstract.
This paper provides new theoretical connections between multi-time Hamilton-Jacobi partial differential equations and variational image decomposition models in imaging sciences. We show that the minimal values of these optimization problems are governed by multi-time Hamilton-Jacobi partial differential equations. The minimizers of these optimization problems can be represented using the momentum in the corresponding Hamilton-Jacobi partial differential equation. Moreover, variational behaviors of both the minimizers and the momentum are investigated as the regularization parameters approach zero. In addition, we provide a new perspective from convex analysis to prove the uniqueness of convex solutions to Hamilton-Jacobi equations. Finally, we consider image decomposition models that do not have unique minimizers and we propose a regularization approach to perform the analysis using multi-time Hamilton-Jacobi partial differential equations.
The authors are listed in alphabetical order. This work was funded by NSF 1820821.
1. Introduction
In the late 20th century, the Hamilton-Jacobi (HJ) equation was widely studied in the field of partial differential equations (PDEs). To be specific, the solution $S(x,t)$, defined for $x\in\mathbb{R}^n$ and $t\geq 0$, satisfies the following Cauchy problem

$$\begin{cases} \dfrac{\partial S}{\partial t}(x,t) + H(x,t,\nabla_x S(x,t)) = 0, & x\in\mathbb{R}^n,\ t>0,\\ S(x,0) = J(x), & x\in\mathbb{R}^n, \end{cases}$$

where $H$ is the Hamiltonian and $J$ is the initial data. When the Hamiltonian only depends on the spatial gradient $\nabla_x S$, under some regularity and convexity assumptions, the solution is given by the Hopf formula or Lax formula [18, 68]

$$S(x,t) = (J^* + tH)^*(x) = \sup_{p\in\mathbb{R}^n}\{\langle p,x\rangle - J^*(p) - tH(p)\} = \min_{u\in\mathbb{R}^n}\left\{J(x-u) + tH^*\left(\frac{u}{t}\right)\right\},$$

where $J^*$ and $H^*$ are the Legendre transforms of the functions $J$ and $H$, respectively. From the physics point of view, the HJ PDE describes the movement of a particle in a physical model whose energy function is given by the Hamiltonian $H$. To be specific, the variables $x$ and $t$ are the current position and time of the particle. The characteristic line of the PDE gives the trajectory of the particle. The momentum is given by the spatial gradient $\nabla_x S(x,t)$, which coincides with the maximizer in the Hopf formula. The velocity is given by $u/t$, where $u$ is the minimizer in the Lax formula.
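The agreement between the Hopf and Lax formulas can be checked numerically on a small example. The sketch below is an illustration only and is not taken from the paper: it assumes the one-dimensional choices $J(x)=|x|$ and $H(p)=p^2/2$ (so that $J^*$ is the indicator of $[-1,1]$ and $H^*(v)=v^2/2$), writes the Lax formula as $\min_u\{J(x-u)+tH^*(u/t)\}$, and evaluates both formulas by brute force on grids. All function names are hypothetical.

```python
import numpy as np

# Assumed toy data (not from the paper): J(x) = |x|, H(p) = p**2 / 2 in 1D.
P_GRID = np.linspace(-1.0, 1.0, 2001)    # dom J^* = [-1, 1]
U_GRID = np.linspace(-5.0, 5.0, 10001)   # search range for the Lax minimizer

def hopf(x, t):
    """Hopf formula: sup_p { p*x - J^*(p) - t*H(p) }, with J^* = 0 on [-1, 1]."""
    vals = P_GRID * x - 0.5 * t * P_GRID**2
    k = np.argmax(vals)
    return vals[k], P_GRID[k]            # value and momentum (maximizer)

def lax(x, t):
    """Lax formula: min_u { J(x - u) + t*H^*(u/t) }, with H^*(v) = v**2 / 2."""
    vals = np.abs(x - U_GRID) + U_GRID**2 / (2.0 * t)
    k = np.argmin(vals)
    return vals[k], U_GRID[k]            # value and minimizer u

if __name__ == "__main__":
    for x, t in [(0.3, 0.5), (2.0, 0.5)]:
        s_hopf, p = hopf(x, t)
        s_lax, u = lax(x, t)
        # The velocity u/t should match grad H(p) = p for this Hamiltonian.
        print(f"x={x}, t={t}: Hopf={s_hopf:.6f}, Lax={s_lax:.6f}, p={p:.3f}, u/t={u / t:.3f}")
```

For this particular pair the common value is the Huber function of $x$, and the velocity $u/t$ equals the momentum because $\nabla H(p)=p$.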
We refer the readers to the review paper [48] for thorough details and [49, 69] for connections between convex analysis and HJ equations. An extension of this PDE is to consider the time variable in a higher dimensional space , in which case the PDE system is called the multi-time Hamilton-Jacobi equation, first discussed by Rochet from an economic point of view [80]. Later, Lions and Rochet [71] considered the multi-time HJ equations when the Hamiltonians are convex functions which only depend on the momentum. They proposed the generalized Hopf formula by writing it as the composition of several semigroups of the corresponding single-time HJ operators. Following their work, several existence and uniqueness results [20, 32, 73, 78, 88] were provided in more general cases, for example, when the Hamiltonians have spatial or time dependence.
It is well known that the HJ equation has a deep relationship with optimal control [26] and differential games [57, 84]. Later, Darbon [49] provided a representation formula for the minimizers of a specific kind of optimization problem, which relates the minimizers to the spatial gradients of the solutions to the HJ equations. As we will see below, many models in imaging sciences can be viewed from a perspective of HJ PDEs. Following that work, we generalize the results to multi-time HJ equations and a larger set of optimization problems, including the decomposition models in image processing.
In the past few decades, many decomposition models have been proposed in image processing. These models are applied to different practical problems, such as inpainting [23, 56], image classification [12], and road detection [62]. Here, we give a brief overview of convex variational models in this area. There are many models that cannot be fully listed here, for which we refer the readers to [44, 61].
The basic idea of image decomposition is to regard an image as a summation of several components , and solve the following minimization problem:
[TABLE]
Here, each function is designed to characterize the corresponding component . One may tune the parameters to put emphasis on different components. There are many celebrated decomposition models in the literature of imaging sciences. In the introduction we mention the continuous versions of the models, while later in the main part of this paper we will work with their discrete versions. The first widely used decomposition model is the Rudin-Osher-Fatemi (ROF) model, proposed in [83], which applies the total variation (TV) semi-norm and to recognize the geometry and noise in an image, respectively. In the continuous setting, for any function and , the TV semi-norm of is defined by
[TABLE]
Here and after in the introduction, the derivatives and divergence are in the distribution sense. The space is the space containing all functions of bounded variation, defined by
[TABLE]
Under these settings, the ROF model solves the following problem
[TABLE]
The mathematical analysis for the ROF model is provided in [1, 2, 3, 4, 7, 28, 29, 33, 34, 35, 36, 38, 40, 41, 46, 47, 51, 63, 64, 76, 79, 89, 91]. Later, Meyer [72] pointed out the disadvantage of in capturing oscillating patterns. In order to overcome this disadvantage, he suggested using the norm in either of the three spaces to replace it, where these three spaces are defined as follows. We use the notations of Meyer to describe these spaces [72]. First, define the space of functions of bounded mean oscillation () by
[TABLE]
and the homogeneous Besov space by
[TABLE]
Let be the dual space of . Then, define by , and . To be specific, the space and norm are defined as follows
[TABLE]
The space is similarly defined by replacing the space in the above definition with the space. The corresponding models proposed by Meyer are stated as follows
[TABLE]
For mathematical analysis of these models, we refer the readers to [59, 62, 70]. In [59], the space is also generalized to any homogeneous Besov space , where and . However, Meyer’s models are hard to solve numerically. There are mainly two approaches to numerically solve the model with norm. The first approach is approximating in the definition of by [90]. Osher et al. [77] proposed an equivalent formulation called OSV when . In a word, OSV uses the square of norm instead of norm. To be specific, the OSV model solves
[TABLE]
The other approach called model is proposed by Aujol et al. [9, 10], replacing the norm with the indicator function of balls in the space . In other words, it solves the following problem
[TABLE]
where denotes the indicator function whose definition will be given in section 2. It is shown that this model gives the solution to Meyer’s model eq. 2 with when the parameter is appropriately chosen. In practice, they use a Moreau-Yosida type approximation and solve the following problem instead
[TABLE]
This regularized model converges to eq. 3 as the parameter approaches zero. Moreover, it is easy to implement using Chambolle’s projection method [37]. Similarly, in [11], the indicator function of the ball is used to replace the norm, which provides a similar numerical implementation approach to the Meyer’s model eq. 2 with .
In the above models, an image is decomposed into a geometrical part and an oscillating part. However, for a noisy image, the oscillating part may contain both the texture in the original image and the noise. To split these two parts, a model is proposed in [11], which constrains the norm of the texture part and the norm of the noisy part. Later, Gilles [60] modified the model with a coefficient assigned to each pixel to smoothly indicate whether it is in texture or noise. He also modified the model by requiring the norm of the noise to be much smaller than the norm of the texture. In [15, 53, 54], the authors extended some of the abovementioned models, which are originally proposed for gray-scale images, to color images. Besides, there are many other functions used in image decomposition. For example, the norm [5, 14, 42, 75] is used to promote sparsity or remove salt and pepper noise. In [13, 14], the quadratic form , where is a linear symmetric positive operator, is used for adaptive kernel selection of the texture component. Note that this quadratic form generalizes the term in ROF and the term in OSV.
The previous work [49] clarifies the relationship between single-time HJ equations and decomposition models with two terms (i.e. in eq. 1), such as the ROF model, Meyer’s models and some of their variations. However, as mentioned above, there are many other models handling three or more components. Also, in practice, one may modify a model by adding a quadratic term for numerical consideration, such as in eq. 4. This kind of modification is applied to most of the above models. As a result, the objective function in the numerical implementation actually contains three or more terms. On the other hand, new models can be constructed by regarding the functions mentioned above as building blocks and combining them together. For instance, the morphological component analysis [58, 85, 86] combines ROF model and minimization for the coefficients with respect to two sets of dictionaries chosen for the representation of texture and geometry. Another example is [45], which adds a higher order term to the models introduced above, in order to reduce the staircase effect. Actually, the higher order terms in image processing are widely studied in the literature. Two important models are the TV-TV2 infimal convolution model [41] and the Total Generalized Variation (TGV) model [25]. In fact, after discretization, the higher order linear operators are discretized using some matrices. In other words, the results in this paper can be applied to the discrete models with higher order terms by regarding them as matrix multiplication. In conclusion, it is valuable to generalize the previous work [49] and provide a framework to analyze the models involving more than two components. Also, our proposed framework is suitable for a large class of discrete decomposition models in imaging sciences, even including some models containing higher order terms.
Now, we briefly introduce the intuition and the basic setup for our framework and demonstrate the idea using some experimental results of the discrete model. In general, for a discrete decomposition model eq. 1, an image is regarded as a vector , where is the number of pixels. If we can relate each , , to a Hamiltonian and to an initial function, then the minimal value, regarded as a function of the input data and the parameters , relates to the solution of the corresponding multi-time HJ equation. Here, the parameters are regarded as time variables.
For example, the discrete model solves the following optimization problem:
[TABLE]
The desired quantities are the minimizers, denoted as and . Here, the discrete total variation semi-norm is defined as follows
[TABLE]
In this paper, we identify the space containing all matrices with rows and columns with the Euclidean space where . The discrete total variation defined above is the anisotropic version, which will be used in this paper. Its Legendre transform is the indicator function of the unit ball in the dual space. To be specific, let be the dual norm of , which is given by
[TABLE]
Then, we have for any where denotes the indicator function. Notice that any indicator function is invariant under multiplication with a positive constant, then we have . Hence, the above optimization problem is equivalent to
[TABLE]
We shall see that such a representation for will allow us to show that satisfies the following multi-time HJ equation
[TABLE]
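The statement that the Legendre transform of the anisotropic discrete total variation is the indicator function of a dual unit ball can be illustrated numerically. The sketch below rests on assumptions that are not taken from the paper: it uses forward differences without wrap-around for the discrete gradient $D$ (the paper's boundary convention may differ), and it represents elements of the dual unit ball as $p = D^{\top} q$ with $\|q\|_\infty \leq 1$. For such $p$, the quantity $\langle p, u\rangle - TV(u)$ is never positive, and it vanishes when $q = \mathrm{sign}(Du)$. Function names are hypothetical.

```python
import numpy as np

def forward_diff(u):
    """Discrete gradient D u: vertical and horizontal forward differences (no wrap-around)."""
    return u[1:, :] - u[:-1, :], u[:, 1:] - u[:, :-1]

def tv_aniso(u):
    """Anisotropic discrete total variation: the l1 norm of D u."""
    dx, dy = forward_diff(u)
    return np.abs(dx).sum() + np.abs(dy).sum()

def forward_diff_adjoint(qx, qy, shape):
    """Adjoint D^T q, so that <D^T q, u> = <q, D u> for every image u."""
    p = np.zeros(shape)
    p[1:, :] += qx
    p[:-1, :] -= qx
    p[:, 1:] += qy
    p[:, :-1] -= qy
    return p

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.standard_normal((6, 5))
    qx = rng.uniform(-1.0, 1.0, size=(5, 5))
    qy = rng.uniform(-1.0, 1.0, size=(6, 4))
    p = forward_diff_adjoint(qx, qy, u.shape)
    # <p, u> - TV(u) <= 0 whenever the sup-norm of q is at most 1.
    print(np.sum(p * u) - tv_aniso(u))
    # Equality holds for q = sign(D u).
    dx, dy = forward_diff(u)
    p_tight = forward_diff_adjoint(np.sign(dx), np.sign(dy), u.shape)
    print(np.sum(p_tight * u) - tv_aniso(u))
```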
In figs. 1, 2, 3, 4, 5, and 6, the minimizers and the minimal values for the corresponding input images are shown. To compute the minimizers, we apply a splitting algorithm to convert the optimization problem (5) to two subproblems involving computing the proximal point of and computing the projection to a ball of Meyer’s norm. The second subproblem is the dual problem to the first one. As a result, for both subproblems, we can apply the algorithm in [39, 50, 67] to obtain the exact minimizers.
In the first example, the test image is shown in fig. 1a. We consider the following parameters . The corresponding minimizers and are shown in figs. 1b and 1c. When , are fixed, the minimal values can be regarded as a function of , whose graph is plotted in fig. 2b. Similarly, the graph of is plotted in fig. 2c. To illustrate the variation of with respect to , we choose another image with corresponding suitable parameters , , and plot the function values with . In this example, is chosen to be a rotation of , and the parameters remain the same: , . The graph of is plotted in fig. 2a. We also show an example of the mixed image for and the corresponding minimizers in figs. 1d, 1e, and 1f. In addition, the model (with parameters ) is applied to a noisy image shown in fig. 3a, whose minimizers are shown in figs. 3b and 3c.
The test image “Barbara” is used in the second example. The original image and the corresponding minimizers in the model with parameters are shown in fig. 4. To demonstrate the variations of the minimal values, we choose two parts of the image, shown in figs. 5a and 5d, and repeat the experiment in the first example. Setting , , , and , the corresponding minimizers are shown in figs. 5b, 5c, 5e, and 5f. The mixed image () and minimizers are shown in figs. 5g, 5h, and 5i, and the dependence of on is shown in figs. 6a, 6b, and 6c.
It can be seen from figs. 2 and 6 that is a convex function with respect to the input image and the parameters. This can be proved with a similar argument as in the proof of 3.1. In this paper, more properties about and the minimizers are revealed.
Our contribution. The contribution of this paper is the theoretical results connecting the multi-time HJ equation and some optimization models such as decomposition models in imaging sciences. There are three parts in this paper. In the first part, we consider the decomposition models and the corresponding dual problems, and investigate the properties of their optimizers and optimal values. To be specific, for some optimization problems, the minimal value coincides with the solution to a corresponding multi-time HJ equation. This relationship in the case of single-time HJ equations has been studied in [49]. We generalize the representation formula for the minimizer and the variational analysis results of and in [49] to the case of multi-time HJ equations. Moreover, we present a new variational analysis of the scaled minimizer . In the variational analysis, we consider a sequence , whose elements are perturbed variables near the point and the perturbation becomes smaller when is larger. We show that the limits of the corresponding spatial gradients and the scaled minimizers solve two optimization problems which are dual to each other. In the second part, we prove the uniqueness of the convex solution to the multi-time HJ equation under some specific assumptions. In the field of PDEs, the uniqueness of the viscosity solution has been widely studied, for which we refer the readers to [48] and the references listed there. Here, our contribution is to provide a new perspective from convex analysis and use the duality technique to prove the uniqueness of the convex solution. At last, we propose a regularization method for the decomposition problems which may have non-unique minimizers or non-differentiable minimal values. The regularization method is used to select a unique minimizer and a unique gradient of the minimal function where and are some positive parameters. In fact, the gradient coincides with the maximizer in the corresponding dual problem. 
This regularization method can be regarded as a generalization of the Moreau-Yosida approximation, which is introduced, for example, in [8, 27]. Instead of only considering the primal problem as in the Moreau-Yosida approximation, our contribution here is to consider both the primal problem and the dual problem at the same time. Then, we apply the variational analysis result in the first part to prove the convergence of and . We show that they converge to the -projection of zero onto the corresponding sets of the original problems, when the regularization parameters and approach zero in a comparable rate.
**Organization of the paper.** The paper is organized as follows. Section 2 gives a brief review of the convex optimization theorems which are used in the later proofs. The main results are stated in sections 3, 4, and 5. In section 3, the connection between some decomposition models and the multi-time HJ equation is shown. Proposition 3.2 provides the representation formula for the minimizers of some decomposition models. Also, we investigate the variational behaviors of the minimal value $S$, the momentum $\nabla_x S$, and the velocities $u_j/t_j$ in 3.4. Section 4 is devoted to the proof of the uniqueness of the convex solution to the multi-time HJ equation. In section 5, we present a regularization method for the degenerate cases which do not satisfy the assumptions in section 3. The method is demonstrated using a specific example, but the analysis can be easily applied to other models. Finally, some conclusions are drawn in section 6.
2. Mathematical Background
In this section, several basic definitions and theorems in convex analysis are reviewed. All the results and notations can be found in [65, 66]. We also refer the readers to [22, 24, 81].
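First, a set $C$ in $\mathbb{R}^n$ is convex if $\alpha x + (1-\alpha) y \in C$ whenever $x, y\in C$ and $\alpha\in[0,1]$. The relative interior of $C$, denoted as $\mathrm{ri}\, C$, is the interior of $C$ with respect to the minimal hyperplane containing $C$ in $\mathbb{R}^n$. For any convex set $C$, the normal cone of $C$ at $x\in C$, denoted by $N_C(x)$, can be characterized by

$$N_C(x) = \{p\in\mathbb{R}^n : \langle p, y-x\rangle \leq 0 \text{ for any } y\in C\}.$$

Here, we use the angle bracket $\langle\cdot,\cdot\rangle$ to denote the inner product operator in any Euclidean space $\mathbb{R}^n$. For any closed convex set $C$ and any point $x\in C$, one can define the asymptotic cone of $C$, denoted as $C_\infty(x)$, by

$$C_\infty(x) = \{d\in\mathbb{R}^n : x+td\in C \text{ for any } t>0\}.$$

In fact, the asymptotic cone is independent of $x$, as stated in the following result.
**Proposition 2.1.**
[65, Prop. III.2.2.1]
*Let $C$ be a closed convex set and $x_1, x_2\in C$. Then $C_\infty(x_1) = C_\infty(x_2)$. In other words, for any $d\in C_\infty(x_1)$, $x_2+td\in C$ for any $t>0$.*
A function $f\colon\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ is said to be convex if for any $x, y\in\mathbb{R}^n$ and any $\alpha\in[0,1]$,

$$f(\alpha x + (1-\alpha) y) \leq \alpha f(x) + (1-\alpha) f(y).$$

The function $f$ is called proper if it is not identically equal to $+\infty$. The domain of $f$, denoted by $\mathrm{dom}\, f$, is defined to be the set where $f$ does not take the value $+\infty$. The epigraph of $f$, denoted as $\mathrm{epi}\, f$, is defined by:

$$\mathrm{epi}\, f = \{(x,\alpha)\in\mathbb{R}^n\times\mathbb{R} : f(x)\leq\alpha\}.$$

Then, $f$ is convex (proper, or lower semi-continuous, respectively) if and only if $\mathrm{epi}\, f$ is convex (non-empty, or closed, respectively). We denote by $\Gamma_0(\mathbb{R}^n)$ the set of proper, convex and lower semi-continuous (l.s.c.) functions from $\mathbb{R}^n$ to $\mathbb{R}\cup\{+\infty\}$. In this section, we only consider the functions in $\Gamma_0(\mathbb{R}^n)$. These functions have good continuity properties, which are stated below.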
Proposition 2.2**.**
[65*, Lem.IV.3.1.1 and Chap.I.3.1 - 3.2]**
Let . If , then is continuous at in . If , then for any ,*
[TABLE]
For any $f\in\Gamma_0(\mathbb{R}^n)$ and $x\in\mathrm{dom}\, f$, the directional derivative at $x$ along any direction $d\in\mathbb{R}^n$, denoted as $f'(x,d)$, is well-defined in $\mathbb{R}\cup\{\pm\infty\}$. When $f$ is differentiable at $x$, $f'(x,\cdot)$ is a linear function. In general, when $f$ is not differentiable, $f'(x,\cdot)$ is only sublinear, in which case we can consider the linear functions dominated by it. Each normal vector of such linear functions gives a subgradient of $f$ at $x$, whose formal definition is given below. Also, the rigorous statement about the relation we described above between the directional derivatives and subgradients is given in Proposition 2.6.
A vector $p\in\mathbb{R}^n$ is called a subgradient of $f$ at $x$ if it satisfies

$$f(y) \geq f(x) + \langle p, y-x\rangle \quad \text{for any } y\in\mathbb{R}^n.$$

The collection of all such subgradients is called the subdifferential of $f$ at $x$, denoted as $\partial f(x)$. It is easy to check that $0\in\partial f(x)$ if and only if $x$ is a minimizer of $f$. As a result, one can check whether $x$ is a minimizer by computing the subdifferential.
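The optimality test $0\in\partial f(x)$ can be made concrete on a one-dimensional example. The sketch below is an illustration under assumptions that do not come from the paper: it takes $f(y) = \lambda|y| + \tfrac{1}{2}(y-a)^2$ with $\lambda>0$, whose subdifferential at $y$ is the interval $\lambda\,\partial|y| + (y-a)$, and it checks that the well-known soft-thresholding point passes the test while a nearby point does not. Function names are hypothetical.

```python
def subdifferential(x, a, lam):
    """Interval [lo, hi] equal to the subdifferential of
    f(y) = lam*|y| + 0.5*(y - a)**2 at y = x, assuming lam > 0."""
    if x > 0:
        s_lo = s_hi = 1.0
    elif x < 0:
        s_lo = s_hi = -1.0
    else:
        s_lo, s_hi = -1.0, 1.0       # subdifferential of |.| at 0 is [-1, 1]
    return lam * s_lo + (x - a), lam * s_hi + (x - a)

def is_minimizer(x, a, lam, tol=1e-12):
    """x minimizes f if and only if 0 belongs to the subdifferential of f at x."""
    lo, hi = subdifferential(x, a, lam)
    return lo - tol <= 0.0 <= hi + tol

def soft_threshold(a, lam):
    """Closed-form minimizer of f for this specific example."""
    sign = 1.0 if a >= 0 else -1.0
    return sign * max(abs(a) - lam, 0.0)

if __name__ == "__main__":
    for a in (2.0, 0.3, -2.0):
        x_star = soft_threshold(a, 0.5)
        print(a, x_star, is_minimizer(x_star, a, 0.5), is_minimizer(x_star + 0.1, a, 0.5))
```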
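As is well known, the subdifferential operator is a (maximal) monotone operator. To be specific,

$$\langle p-q, x-y\rangle \geq 0 \quad \text{for any } p\in\partial f(x) \text{ and } q\in\partial f(y).$$

Moreover, in most cases, the subdifferential operator commutes with summation.
**Proposition 2.3.**
[66, Cor. XI.3.1.2]
*Let $f, g\in\Gamma_0(\mathbb{R}^n)$. Assume $\mathrm{ri}\,\mathrm{dom}\, f\cap\mathrm{ri}\,\mathrm{dom}\, g\neq\emptyset$. Then $\partial(f+g)(x) = \partial f(x) + \partial g(x)$ for any $x\in\mathrm{dom}\, f\cap\mathrm{dom}\, g$.*
Here, we give one simple example. For any convex set $C$, the indicator function $I_C$ is defined by

$$I_C(x) = \begin{cases} 0, & \text{if } x\in C,\\ +\infty, & \text{otherwise.}\end{cases}$$

In this paper, we also use the notation $I\{\cdot\}$ to denote the indicator function if the set is given in the form of some constraints. By definition, the indicator function remains the same after multiplying by a positive constant, i.e. we have $\alpha I_C = I_C$ for any $\alpha>0$. One can compute the subdifferential of the indicator function and obtain

$$\partial I_C(x) = N_C(x) \quad \text{for any } x\in C.$$

Next, we introduce one important transform in convex analysis called the Legendre transform. For any function $f\in\Gamma_0(\mathbb{R}^n)$, the Legendre transform of $f$, denoted as $f^*$, is defined by

$$f^*(p) = \sup_{x\in\mathbb{R}^n}\{\langle p,x\rangle - f(x)\}. \tag{11}$$

The Legendre transform gives a duality relationship between $f$ and $f^*$. In other words, if $f\in\Gamma_0(\mathbb{R}^n)$, then $f^*\in\Gamma_0(\mathbb{R}^n)$ and $f^{**} = f$. Similarly, along with this duality relationship, some properties are dual to others, as stated in the following proposition. (Here and after, a function $f$ is called 1-coercive if $\lim_{\|x\|\to+\infty} f(x)/\|x\| = +\infty$.)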
Proposition 2.4**.**
[66*, Chap.X.4.1]**
Let . Then is finite-valued if and only if is 1-coercive. Also, is differentiable if and only if is strictly convex.*
In particular, the subgradients can be characterized by the maximizers in eq. 11.
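**Proposition 2.5.**
[66, Cor. X.1.4.4]
*Let $f\in\Gamma_0(\mathbb{R}^n)$ and $x, p\in\mathbb{R}^n$. Then $p\in\partial f(x)$ if and only if $x\in\partial f^*(p)$, if and only if $f(x) + f^*(p) = \langle p,x\rangle$.*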
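The characterization of subgradients through the equality $f(x)+f^*(p)=\langle p,x\rangle$ can be tested numerically. The sketch below is only an illustration with an assumed example that is not from the paper: $f(x)=|x|+x^2/2$ in one dimension, for which $\partial f(1)=\{2\}$ and $\partial f(0)=[-1,1]$. The Legendre transform is approximated by a maximum over a grid, and the gap $f(x)+f^*(p)-px$ is evaluated for pairs inside and outside the subdifferential. Function names are hypothetical.

```python
import numpy as np

X_GRID = np.linspace(-10.0, 10.0, 20001)

def f(x):
    """Assumed example: f(x) = |x| + x**2 / 2."""
    return np.abs(x) + 0.5 * x**2

def f_conj(p):
    """Grid approximation of the Legendre transform f^*(p) = sup_x { p*x - f(x) }."""
    return np.max(p * X_GRID - f(X_GRID))

def fenchel_gap(x, p):
    """f(x) + f^*(p) - p*x: nonnegative, and zero when p is a subgradient of f at x."""
    return f(x) + f_conj(p) - p * x

if __name__ == "__main__":
    print(fenchel_gap(1.0, 2.0))   # p = 2 is the derivative of f at x = 1
    print(fenchel_gap(0.0, 0.5))   # p = 0.5 lies in [-1, 1], the subdifferential at 0
    print(fenchel_gap(1.0, 1.0))   # p = 1 is not a subgradient at x = 1
```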
The concepts we introduced above, including directional derivatives, subgradients and Legendre transform, can be linked all together by the following proposition.
Proposition 2.6**.**
[66*, Example X.2.4.3]**
Let and such that is nonempty, then . Moreover, if , then , hence .*
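Apart from the Legendre transform, there is another operator to construct convex functions, called inf-convolution. Given two functions $f, g\in\Gamma_0(\mathbb{R}^n)$, assume there exists an affine function $l$ such that $f(x)\geq l(x)$ and $g(x)\geq l(x)$ for any $x\in\mathbb{R}^n$. Then, the inf-convolution between $f$ and $g$, denoted as $f\,\square\, g$, is a convex function taking values in $\mathbb{R}\cup\{+\infty\}$. The definition of the inf-convolution is given by

$$(f\,\square\, g)(x) = \inf_{u\in\mathbb{R}^n}\{f(u) + g(x-u)\}. \tag{12}$$

In the following proposition, the relation between the Legendre transform and inf-convolution is stated. Actually, the Hopf formula and Lax formula introduced in the next section are formulated using the Legendre transform and the inf-convolution operator, respectively. As a result, these two operators play a significant role in our analysis in this paper.
**Proposition 2.7.**
[66, Thm. X.2.3.2 and Thm. XI.3.4.1]
*Let $f, g\in\Gamma_0(\mathbb{R}^n)$. Assume the intersection of $\mathrm{ri}\,\mathrm{dom}\, f^*$ and $\mathrm{ri}\,\mathrm{dom}\, g^*$ is non-empty. Then $f\,\square\, g\in\Gamma_0(\mathbb{R}^n)$ and $f\,\square\, g = (f^*+g^*)^*$. Moreover, for any $x\in\mathrm{dom}(f\,\square\, g)$, the optimization problem eq. 12 has at least one minimizer, and $\partial(f\,\square\, g)(x) = \partial f(u)\cap\partial g(x-u)$ for any minimizer $u$.*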
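The link between inf-convolution and the Legendre transform can be checked on a simple pair of functions. The sketch below is an illustration with assumed data that does not come from the paper: $f(u)=u^2/(2a)$ and $g(v)=v^2/(2b)$ in one dimension, for which the inf-convolution is $x^2/(2(a+b))$, the minimizer is $u=ax/(a+b)$, and $f^*(p)+g^*(p)=(a+b)p^2/2$. Both operations are approximated by brute force on a grid, and all names are hypothetical.

```python
import numpy as np

A, B = 0.5, 2.0                          # assumed curvature parameters
GRID = np.linspace(-10.0, 10.0, 2001)    # shared grid for x and u

def f(u):
    return u**2 / (2.0 * A)

def g(v):
    return v**2 / (2.0 * B)

def inf_conv_on_grid():
    """Values of the inf-convolution of f and g at every x in GRID, with the minimizers u."""
    vals = f(GRID[None, :]) + g(GRID[:, None] - GRID[None, :])   # rows: x, columns: u
    return vals.min(axis=1), GRID[vals.argmin(axis=1)]

def conjugate_on_grid(h_vals, p):
    """Grid approximation of the Legendre transform of a function sampled on GRID."""
    return np.max(p * GRID - h_vals)

if __name__ == "__main__":
    h, u_star = inf_conv_on_grid()
    i = np.argmin(np.abs(GRID - 1.5))
    print(h[i], 1.5**2 / (2.0 * (A + B)))            # inf-convolution value at x = 1.5
    print(u_star[i] / A, (GRID[i] - u_star[i]) / B)  # f'(u) and g'(x - u) coincide
    p = 0.4
    print(conjugate_on_grid(h, p), A * p**2 / 2.0 + B * p**2 / 2.0)
```

The second printed line illustrates the subdifferential identity of Proposition 2.7 in the smooth case: the derivatives of $f$ at $u$ and of $g$ at $x-u$ agree at the minimizer.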
3. Properties of the Solutions to the Multi-time Hamilton-Jacobi Equations
In this section, we provide a representation formula for the minimizers in the Lax formula and highlight the relation of the minimizers and the momentum in the multi-time HJ equation. Also, we investigate the variational behaviors of both the solution to the multi-time HJ equation and the corresponding momentum when time variables approach zero. Moreover, we also present a new result stating the variational behaviors of the velocities, which has not been developed before, even for the single-time case. Similar to the duality relation of the Hopf and Lax formulas, the cluster points of the minimizers and momentum solve two optimization problems, which are also dual to each other. An illustration is given in the upper part of fig. 7.
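We consider the solution $S$ to the following multi-time HJ equation

$$\begin{cases} \dfrac{\partial S}{\partial t_j}(x,t_1,\dots,t_N) + H_j(\nabla_x S(x,t_1,\dots,t_N)) = 0, & j\in\{1,\dots,N\},\ x\in\mathbb{R}^n,\ t_1,\dots,t_N>0,\\ S(x,0,\dots,0) = J(x), & x\in\mathbb{R}^n. \end{cases} \tag{13}$$

Here, we only consider the multi-time HJ equations whose Hamiltonians only depend on the momentum $\nabla_x S$. Several conditions are imposed on the Hamiltonians $H_j$ and the initial data $J$ in this section. To be specific, we assume
- (H1) $H_j\colon\mathbb{R}^n\to\mathbb{R}$ is convex and 1-coercive for any $j\in\{1,\dots,N\}$. Moreover, at least one of them is strictly convex;
- (H2) $J\in\Gamma_0(\mathbb{R}^n)$.
From the assumption (H1), by Proposition 2.4, it is known that $H_j^*$ is also finite-valued, convex and 1-coercive for any $j$. Moreover, at least one $H_j^*$ is differentiable.
It is well known that in this case the unique classical solution is given by the Hopf formula [71, 88] stated as follows

$$S_H(x,t_1,\dots,t_N) = \Big(J^* + \sum_{j=1}^N t_j H_j\Big)^*(x) = \sup_{p\in\mathbb{R}^n}\Big\{\langle p,x\rangle - J^*(p) - \sum_{j=1}^N t_j H_j(p)\Big\} \tag{14}$$

and the Lax formula [88] stated as follows

$$S_L(x,t_1,\dots,t_N) = \min_{u_1,\dots,u_N\in\mathbb{R}^n}\Big\{J\Big(x-\sum_{j=1}^N u_j\Big) + \sum_{j=1}^N t_j H_j^*\Big(\frac{u_j}{t_j}\Big)\Big\} \tag{15}$$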
for any and . We extend and to the whole domain by simply setting the function values to whenever the function value is not defined. There are some physical interpretations of the HJ PDEs and the optimizers in the above two formulas. Given suitable Hamiltonians and a suitable initial condition , the HJ PDE eq. 13 describes the movement of a particle. Roughly speaking, in a time interval with length , a particle moves along the characteristic line of the th equation in the PDE system. The velocity in this time interval equals where denotes the minimizer in the Lax formula eq. 15. On the other hand, the maximizer in the Hopf formula eq. 14 gives the momentum of the particle, which coincides with the spatial gradient . We refer the reader to [21] for details about HJ PDEs and variational principles in physics.
Under the assumptions (H1) and (H2), , and the value is finite if there exists some . In addition, the minimizers in the Lax formula eq. 15 exist whenever the minimal value is finite. This result can be proved using 2.7. Also, by 2.5, it is not hard to check and satisfies HJ equation eq. 13. Moreover, the spatial gradient is the unique maximizer in the Hopf formula eq. 14. To conclude, the Hopf and Lax formulas express the classical solution to the multi-time HJ equation as two optimization problems. The Hopf formula provides a physical interpretation and has the momentum as the maximizer, while its dual problem in the Lax formula is in the same form as some decomposition models in imaging sciences.
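The claims above (the Hopf and Lax formulas agree, and their common value satisfies the multi-time HJ equation) can be checked numerically on a small instance. The sketch below is an illustration under assumptions that do not come from the paper: it takes $n=1$, $N=2$, $J(x)=x^2/2$, $H_1(p)=p^2/2$ and $H_2(p)=p^4/4$, so that $J^*(p)=p^2/2$, $H_1^*(v)=v^2/2$ and $H_2^*(v)=\tfrac{3}{4}|v|^{4/3}$. The Hopf maximizer is found by bisection, the Lax minimum by a brute-force grid search, and the PDE residuals by central finite differences. All function names are hypothetical.

```python
import numpy as np

def momentum(x, t1, t2):
    """Hopf maximizer p: root of p + t1*p + t2*p**3 = x (the left side is increasing in p)."""
    lo, hi = -abs(x) - 1.0, abs(x) + 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + t1 * mid + t2 * mid**3 < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def s_hopf(x, t1, t2):
    """Hopf formula: sup_p { p*x - J^*(p) - t1*H1(p) - t2*H2(p) }."""
    p = momentum(x, t1, t2)
    return p * x - 0.5 * p**2 - t1 * 0.5 * p**2 - t2 * 0.25 * p**4

def s_lax(x, t1, t2, n=801, radius=2.0):
    """Lax formula by grid search over (u1, u2) in [-radius, radius]^2."""
    u = np.linspace(-radius, radius, n)
    u1, u2 = np.meshgrid(u, u, indexing="ij")
    vals = 0.5 * (x - u1 - u2)**2 + u1**2 / (2.0 * t1) + t2 * 0.75 * np.abs(u2 / t2)**(4.0 / 3.0)
    return vals.min()

def pde_residuals(x, t1, t2, h=1e-5):
    """Central-difference residuals of dS/dt_j + H_j(dS/dx) for j = 1, 2."""
    ds_dx = (s_hopf(x + h, t1, t2) - s_hopf(x - h, t1, t2)) / (2.0 * h)
    ds_dt1 = (s_hopf(x, t1 + h, t2) - s_hopf(x, t1 - h, t2)) / (2.0 * h)
    ds_dt2 = (s_hopf(x, t1, t2 + h) - s_hopf(x, t1, t2 - h)) / (2.0 * h)
    return ds_dt1 + 0.5 * ds_dx**2, ds_dt2 + 0.25 * ds_dx**4

if __name__ == "__main__":
    x, t1, t2 = 1.0, 0.5, 0.3
    print(s_hopf(x, t1, t2), s_lax(x, t1, t2))
    print(pde_residuals(x, t1, t2))
```

The grid search only bounds the Lax value from above, so the two printed values agree up to the grid resolution rather than to machine precision.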
The following proposition states that the solution is actually a convex function, hence the techniques in convex analysis can be applied to analyze the solution. The results hold even under weaker assumptions. Actually, a part of the proposition can be further generalized to the case when and for any .
Proposition 3.1**.**
Let and for any . Then, , whose Legendre transform is given by
[TABLE]
for any and . Here, denotes the indicator function. Moreover, if the assumptions (H1)-(H2) are satisfied, then is finite for any and which are not all zero.
Proof.
First, we prove that is the Legendre transform of , where is defined by
[TABLE]
for any and any . It is easy to check .
By definition, for any and ,
[TABLE]
First, we consider the case when there exists such that . Take . For any , take , which is a finite value. From the above equation,
[TABLE]
Hence if for some .
Then, consider the case when . Let , from eq. 16, we obtain
[TABLE]
Therefore, , which implies is a convex lower semi-continuous function and . Moreover, if there exists some such that and for any , then, by assumption (H1), we deduce that is 1-coercive, which, by 2.4, implies its Legendre transform (with respect to x) is finite-valued. ∎
By investigating on the boundary of the domain, the solution to a lower time dimensional equation is embedded in the solution to the higher time dimensional equation, in the sense that the restriction of on the subspace for any index set is the solution to the corresponding lower time dimensional HJ equation with Hamiltonians .
The following proposition states a representation formula for the minimizers in the Lax formula. In the decomposition model eq. 15, a given image is decomposed into different components including and the residual . However, sometimes the primal minimization problem is difficult to solve, then the following proposition can be applied to compute using the momentum . In fact, the momentum is the maximizer of the dual problem in the Hopf formula eq. 14. In other words, the following proposition gives the relation of the optimizers in the primal decomposition problem and the dual problem.
Proposition 3.2**.**
Suppose the assumptions (H1)-(H2) hold. Let and assume the time variables are not all zero. Denote to be any minimizer of the minimization problem in eq. 15 with parameters and . Here, each can be regarded as a function of . Then, for any ,
[TABLE]
Specifically, if a stronger assumption is imposed, say, all the Hamiltonians are differentiable, then the minimizer is unique and satisfies
[TABLE]
Proof.
Since for each , by 2.7 and induction, the minimizers exist if , and
[TABLE]
From the assumption (H1), there exists some such that is differentiable, hence the intersection above contains at most one element. On the other hand, is non-empty in the interior of the domain of , which is the whole space because is finite-valued when the time variables are not all zero. Therefore, the above intersection contains exactly one element. In other words, is differentiable with respect to for any which are not all zero and . Moreover, by eq. 19, , which implies for any . ∎
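The relation between the minimizers of the Lax formula and the momentum can be verified on a quadratic example where both problems reduce to linear systems. The sketch below is an illustration under assumptions that do not come from the paper: it takes $J(x)=\tfrac12\|x\|^2$ and $H_j(p)=\tfrac12 p^{\top}A_j p$ with symmetric positive definite matrices $A_j$, and it reads the representation formula for differentiable Hamiltonians as $u_j = t_j\nabla H_j(\nabla_x S)$, which is consistent with the velocity $u_j/t_j$ described earlier. The momentum solves $(I+\sum_j t_j A_j)p=x$, while the Lax minimizers are obtained independently from the first-order optimality conditions of the primal problem. All names are hypothetical.

```python
import numpy as np

def random_spd(n, rng):
    """Random symmetric positive definite matrix (assumed test data)."""
    m = rng.standard_normal((n, n))
    return m @ m.T + n * np.eye(n)

def momentum(x, ts, mats):
    """Hopf maximizer for J = 0.5*||.||^2 and H_j(p) = 0.5 * p^T A_j p."""
    k = np.eye(x.size) + sum(t * a for t, a in zip(ts, mats))
    return np.linalg.solve(k, x)

def lax_minimizers(x, ts, mats):
    """Solve the optimality system of the Lax problem:
    sum_k u_k + A_j^{-1} u_j / t_j = x for every j."""
    n, big_n = x.size, len(ts)
    system = np.zeros((n * big_n, n * big_n))
    for i in range(big_n):
        for j in range(big_n):
            block = np.eye(n)
            if i == j:
                block = block + np.linalg.inv(mats[i]) / ts[i]
            system[i * n:(i + 1) * n, j * n:(j + 1) * n] = block
    rhs = np.tile(x, big_n)
    return np.linalg.solve(system, rhs).reshape(big_n, n)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mats = [random_spd(3, rng), random_spd(3, rng)]
    ts = [0.4, 1.3]
    x = rng.standard_normal(3)
    p = momentum(x, ts, mats)
    u = lax_minimizers(x, ts, mats)
    for j in range(2):
        # u_j should equal t_j * grad H_j(p) = t_j * A_j p
        print(np.max(np.abs(u[j] - ts[j] * mats[j] @ p)))
```

In this quadratic setting the residual $x-\sum_j u_j$ also equals the momentum $p$, since $\nabla J$ is the identity.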
In the remaining part of this section, we investigate the multi-time HJ equation eq. 13 and the minimization problem eq. 15 in a variational point of view. To be specific, let and for any and such that they satisfy and for any . Let and for any . We are interested in the convergence behavior of the momentum and the minimizers evaluated at . We will demonstrate one application in section 5.
Among all the sequences , by taking subsequences, we can assume there is a sequence with the lowest convergence rate. According to the symmetry of the time variables, without loss of generality, we can assume is the slowest sequence converging to zero compared to for any , i.e., we assume that has a finite limit denoted as for any . In summary, the following notations and assumptions are adopted:
[TABLE]
In the decomposition models, is given by a sequence of observed images. In each there is a constant component denoted by and several other components denoted by for . In the remaining part of this section, we investigate the behavior of the minimizers of the decomposition model in eq. 15 when the components converge to zero and the parameters in the model vanish.
First, we show the convergence of to zero , which is stated in (i) in the following proposition. In other words, the decomposition model recovers the constant component when the other components and the parameters in the model converge to zero. Then, (ii) and (iii) in the following proposition are technical results about the convergence rate, which will be used in later proofs.
Proposition 3.3.
Assume (H1)-(H2) and eq. 20 hold. Let be any minimizer of the minimization problem in eq. 15. Let . Then,
- (i)
For any ,
[TABLE]
- (ii)
If and , then
[TABLE]
- (iii)
If and , then the sequence is bounded.
Proof.
Denote for any , and . Define . Recall that for each , and are two sequences satisfying and , respectively. And the spatial variable is defined to be .
Proof of (i): By Lax formula eq. 15,
[TABLE]
Since is a convex function, there exists such that . Let . Then, using the convexity of and the Cauchy-Schwarz inequality, we get
[TABLE]
Combining eq. 22 and eq. 23, we get
[TABLE]
For any , since is not bounded, without loss of generality, by taking subsequences, we can assume increases to infinity. Since is 1-coercive, for any , there exists such that for any , . Together with eq. 24, we get
[TABLE]
Since and are bounded, and is continuous in for any , the right-hand side is bounded. However, can be arbitrarily large, so the boundedness of the left-hand side (deduced from the boundedness of the right-hand side) implies for any . If , then is bounded by the definition of , hence also converges to zero.
Proof of (ii): We can apply the same argument as above and set , because . From eq. 25, using the definition of in eq. 20 and triangle inequality, we have
[TABLE]
Dividing both sides by , we can obtain
[TABLE]
With the same argument as in the proof of (i), we deduce that the right hand side is bounded, while can be arbitrarily large. Therefore, converges to zero for any . If and , then is bounded by the definition of and converges to zero by the definition of , hence also converges to zero.
Proof of (iii): It suffices to prove the contrapositive statement. To be specific, let , i.e. is unbounded, it suffices to prove . In the proof of (ii), we know that converges to zero if . Then, the unboundedness of implies that converges to [math], hence and (iii) is proved. ∎
Similarly, we also consider the maximizers in the dual problem eq. 14 with the observed data and the parameters . The following lemma states the boundedness of the maximizers which will be used in the later proofs.
Lemma 3.1.
Under the assumptions (H1)-(H2) and eq. 20, for any such that , the sequence is bounded and any cluster point is in .
Proof.
Recall that for each , and are two sequences satisfying the assumptions in eq. 20. Denote . Then, is a maximizer of the maximization problem in eq. 14. Hence, for any in ,
[TABLE]
Since , we have , hence . Combining this inequality and the above one we can obtain
[TABLE]
Here, for the second inequality above, we used the definition of in eq. 20 and Cauchy-Schwarz inequality. Then, rearranging the terms and dividing by , we get
[TABLE]
If is not bounded, without loss of generality, we can assume increases to infinity. Since is 1-coercive for all , then for any , there exists such that for any and any . Then, from eq. 26, for any , we obtain
[TABLE]
The right-hand side is bounded. However, since goes to infinity, the term for on the left-hand side is unbounded, while the terms for are non-negative. As a result, the left-hand side can be arbitrarily large, which leads to a contradiction. Therefore, we can conclude that is bounded.
For the remaining part, let be a cluster point, then there exists a subsequence converging to , still denoted as . Since solves the multi-time HJ equation eq. 13 and is continuous for any , then we have
[TABLE]
By the continuity property [66, Prop.XI.4.1.1] of the subdifferential operator of the convex lower semi-continuous function , we can conclude that
[TABLE]
which implies . ∎
The variational behaviors of the momentum and the velocities are presented in the following proposition. To be specific, the cluster points of the momenta and the velocities solve two optimization problems, respectively, and the two problems are dual to each other. An illustration of this result is given in fig. 7.
Proposition 3.4.
Assume (H1)-(H2) and eq. 20 hold. Let and . Then,
- (i)
the directional derivative of corresponds to a maximization problem:
[TABLE]
Moreover, let be any cluster point of , then,
[TABLE]
- (ii)
the directional derivative of corresponds to the dual minimization problem:
[TABLE]
Moreover, if is a cluster point of for any satisfying , then
[TABLE]
In particular, if is strictly convex and for some , then the maximizer in eq. 28 is unique, which implies the convergence of to the unique maximizer. Similarly, for any such that is differentiable and , we can conclude that converges to the unique minimizer in eq. 30.
Remark 3.1.
It is straightforward to obtain using the following computation
[TABLE]
where the last equality follows from the assumption that for any .
Proof.
Recall that the spatial variable is defined to be , where and are two sequences satisfying the assumptions in eq. 20. Denote .
Proof of (i): For any , by Hopf formula eq. 14, we obtain
[TABLE]
Since , we have . Hence, together with the definition of in eq. 20, we get
[TABLE]
Therefore, we have
[TABLE]
where we recall that and by eq. 20. Here, is an arbitrary element in , hence we obtain
[TABLE]
On the other hand, for any , consider the function defined by , where . Since is a convex function and is its restriction on a line, then with . Also, is differentiable in since is differentiable. The derivative of at is given by the chain rule:
[TABLE]
Since satisfies the multi-time HJ equation eq. 13, we obtain
[TABLE]
From straightforward computation and the convexity of , we get
[TABLE]
where .
Let be a cluster point of . Take a subsequence converging to and still denote it as . Since by 3.1 and is continuous for any , we have
[TABLE]
Together with eq. 31, the equation eq. 27 is proved. Moreover, any cluster point is a maximizer.
Proof of (ii): Here, we adopt the notations and defined in the proof of 3.3 to represent the minimizers in the Lax formula. According to the Lax formula eq. 15 evaluated at the point and by the convexity of we deduce that
[TABLE]
for any . Since , we have . By the definition of and , we can compute , hence we have
[TABLE]
where . According to 3.2 we have . Therefore we get
[TABLE]
Combining the above two equations we obtain
[TABLE]
From 3.3 (ii), converges to zero if . Also, are bounded by 3.1, hence the first sum in the right hand side of eq. 33 converges to zero as approaches infinity. On the other hand, for such that , is bounded by 3.3 (iii). Taking a subsequence, we can assume that converges to some vector, denoted as . In conclusion, as approaches infinity in eq. 33, we have
[TABLE]
where the second inequality holds by the definition of Legendre transform eq. 11. From eq. 27, for any maximizer in eq. 28,
[TABLE]
Taking in eq. 34 and comparing it with eq. 35, we can conclude that the inequalities in eq. 34 become equalities when . As a result, when we have , which implies that . Then, we deduce that
[TABLE]
On the other hand, for an arbitrary , by eq. 34 and eq. 36, we have
[TABLE]
which implies that for any , when . By eq. 7 and eq. 10, we can deduce that . 2.5 gives the equality . Then, eq. 29 follows from this equality and eq. 36.
It remains to prove eq. 30. Consider any such that . Define by . Then it suffices to prove . So far, we have proved and , which implies . By straightforward computation and 2.3,
[TABLE]
Therefore, is a minimizer of , which concludes the proof. ∎
The above proposition provides the explicit formulas for the variations of , and where denotes the -th component of the minimizer of the decomposition model in the form of eq. 15. Specifically, the limits of these quantities are related to the two optimization problems given by eqs. 28 and 30. From the perspective of image processing, given an observed image which is a summation of a constant component and other components , the decomposition model eq. 15 gives components. In these components, one component converges to the constant component and the other components vanish as the parameters approach zero, by 3.3. Then, 3.4(ii) states that the component converges to [math] from a direction [82, p. 197]. On the other hand, 3.4(i) provides a representation formula for the cluster point of the maximizers of the dual problem in the form of eq. 14.
4. Uniqueness of the Convex Solutions to the Multi-time Hamilton-Jacobi Equations
In the previous section, we have discussed the relation of the optimization problems in the Hopf formula and Lax formula with the classical solution of the multi-time HJ equation. In fact, some results can be generalized to weaker assumptions in which case the solution provided by Hopf and Lax formulas is not classical. In this section, we prove that the only convex solution is given by the two formulas.
In the field of PDEs, a type of solution called the viscosity solution is considered for solving the HJ equation when no classical solution exists. The uniqueness of the viscosity solution has been widely studied under different assumptions [17, 19]. However, the functions in convex analysis and optimization may take the value , which is unusual in the field of PDEs. Therefore, to maintain the connection between the HJ equations and convex optimization problems, we consider convex solutions that may be infinite on part of the domain, and we prove uniqueness using techniques from convex analysis.
We start with the proof for the classical convex solution, in order to demonstrate the idea of utilizing the convexity assumptions. After that, we state the uniqueness of nonsmooth convex solution under more general assumptions in 4.1. When proving the uniqueness of the classical convex solution, we assume the properties (H1) and (H2) hold. Moreover, the solution satisfies:
- (S1)
;
- (S2)
solves the multi-time Hamilton-Jacobi equation eq. 13.
As discussed in section 3, defined in the Hopf formula eq. 14 is a solution satisfying the assumptions (S1) and (S2). Hence, we just need to prove for any satisfying (S1)-(S2). First, we consider the single-time case when the time dimension , and formulate its Legendre transform for and in the following lemma.
Lemma 4.1.
Assume (H1)-(H2) hold and satisfies (S1)-(S2). Let . Then there exists a convex function , such that , where .
Proof.
In this proof, we only consider the single-time HJ equation. For the single-time case, is used to denote the Hamiltonian, instead of , for simplicity. First, consider the domain of . For each , define
[TABLE]
For the illustration of this definition, see fig. 8a. The function defined here is an extended-valued function taking values in . In the last step of this proof, we will show the convexity and specify the range of this function. From this definition, it is obvious that , where , as defined in the statement of this lemma. Moreover, denote , then we prove by using the monotonicity of . To be specific, let and , then, we have
[TABLE]
Hence, is non-decreasing with respect to . As a result, implies . Therefore we obtain .
In the next step, we prove .
Denote (see fig. 8a). Here and in the remainder of this section, we use the bold character to denote the zero vector in . Since is the projection of along the direction , is a convex set. Let . Take , then because . Let , which implies . If , then and . Since satisfies the HJ equation eq. 13, . In other words, if with , then we can conclude that . Therefore, for any and , by 2.6, the directional derivative of in the direction is:
[TABLE]
As a result, is a constant function in its domain. Denote this value as . By the continuity of when restricting to the straight line , the value is also if is finite. Hence, for any and .
Now, we consider the case when . For the illustration, see fig. 8b. Let . Take and , then by 2.2,
[TABLE]
Hence, the value of does not depend on if . Denote this value as . By continuity, if is finite. Therefore, we have proved that the domain of coincides with the set and in the domain of .
Then, we prove when restricting to . By setting if , we can regard as a function from to . It is not hard to check the convexity of . To be specific, for any and , choose and (see fig. 8c), then we have
[TABLE]
Hence is a convex function taking values in . Also, for each , we have
[TABLE]
Therefore, , which implies and if . Moreover, according to 2.2 and eq. 39, we deduce that
[TABLE]
for any and . As a result we have in the domain of definition. In conclusion, we get the following formula for
[TABLE]
The final part is to prove that is a convex function taking values in .
First, we prove that cannot take the value by contradiction. Suppose there exists such that equals . Then, by definition of we have . Together with the formula of in eq. 40, we derive
[TABLE]
Therefore, and are in the asymptotic cone of by definition eq. 8. Then, by 2.1, for any , we obtain
[TABLE]
which implies . Since is an arbitrary vector in , we deduce that . Moreover, according to eq. 40, the function is a constant on the line for any , which implies that the directional derivative of in the direction is zero. In other words, we have
[TABLE]
On the other hand, consider any and such that is nonempty. Let . This implies . Hence, according to 2.6, we get
[TABLE]
which contradicts eq. 41. Therefore, cannot take the value .
Finally, the convexity of follows from the convexity of . In fact, , which is a reflection of the convex set , hence it is also convex. Therefore, is a convex function from to . ∎
Based on this lemma, the following proposition states the uniqueness result. It can be easily seen in the above lemma that the Legendre transform of has a form similar to that of . Actually, the following proposition is proved by equating the two functions and .
Proposition 4.1.
The solution to the multi-time Hamilton-Jacobi equation is unique. Specifically, under the assumptions (H1) and (H2), if satisfies (S1)-(S2), then .
Proof.
In the proof of this proposition, we first consider the case of single-time. Let , and be the Hamiltonian.
From 4.1, it is proved that , where and is a convex function whose domain is the projection of along . Moreover, (note that the domains of and are the same).
First, we prove that for any by contradiction. Assume there exists such that . Let . Then, by 2.3 and eq. 10, we deduce that
[TABLE]
where the last equality holds because is the reflection of . Here denotes the normal cone of the set at . Let , and . Denote . Then, by eq. 42 we have , which implies . However, , hence the HJ equation eq. 13 does not hold at , which is a contradiction. Therefore, when restricting to the relative interior of the domain of , which implies
[TABLE]
for any .
Actually, the values of any convex lower semi-continuous function on the relative boundary of its domain are fully determined by its values in the relative interior. It is not hard to check that
[TABLE]
Hence, we have proved that and agree in the relative interior of the domain. Therefore, in the whole domain, which implies and gives the uniqueness of the convex solution to the single-time HJ equation.
Then, we can consider the case of multi-time. Now, we assume . It suffices to prove and coincide for any and any . Let be arbitrary positive real numbers and denote . Define for any and . Then . We can compute the gradient of with respect to for any and using chain rule and the assumption that satisfies the multi-time HJ equation eq. 13 to obtain
[TABLE]
It is easy to check that satisfies the initial condition given by , i.e. for any . Hence, is a solution to the single-time HJ equation with Hamiltonian , which is finite-valued, 1-coercive and strictly convex. Therefore, for the single-time HJ equation, the conditions (H1)-(H2) and (S1)-(S2) are satisfied. Then, the solution is unique and equal to the Hopf formula with respect to the Hamiltonian . Hence, for any and any , we have
[TABLE]
Therefore, in the relative interior of the domain, which implies in the whole space, because of the lower semi-continuity of and . The uniqueness of the solution to the multi-time HJ equation follows. ∎
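The multi-time equation and the reduction to a single-time equation along a ray of time variables can be checked numerically on a small example. The Python sketch below is an illustration under stated assumptions, not the construction used in the paper: it takes $J(x)=x^2/2$, $H_1(p)=p^2/2$ and $H_2(p)=p^4/4$ in one space dimension (all three choices, the function names, and the finite-difference step are hypothetical). The Hopf formula then maximizes $px-(1+t_1)p^2/2-t_2p^4/4$, whose maximizer solves $x=(1+t_1)p+t_2p^3$. The sketch evaluates the residuals of $\partial_{t_j}S+H_j(\partial_xS)=0$ by central differences, and the residual of the single-time equation with Hamiltonian $\theta_1H_1+\theta_2H_2$ along the ray $(t_1,t_2)=(s\theta_1,s\theta_2)$.

```python
def momentum(x, t1, t2, iters=200):
    """Solve x = (1 + t1) * p + t2 * p**3 for p by bisection."""
    lo, hi = -abs(x), abs(x)  # the root lies in this bracket when t1, t2 >= 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (1.0 + t1) * mid + t2 * mid ** 3 < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hopf_value(x, t1, t2):
    """Toy Hopf formula: max_p { p*x - J*(p) - t1*H1(p) - t2*H2(p) }."""
    p = momentum(x, t1, t2)
    return p * x - 0.5 * (1.0 + t1) * p ** 2 - 0.25 * t2 * p ** 4


def central_diff(f, z, h=1e-5):
    """Second-order central finite difference of f at z."""
    return (f(z + h) - f(z - h)) / (2.0 * h)


def pde_residuals(x, t1, t2):
    """Residuals of the two time equations and of the identity dS/dx = p."""
    p = momentum(x, t1, t2)
    r1 = central_diff(lambda s: hopf_value(x, s, t2), t1) + 0.5 * p ** 2
    r2 = central_diff(lambda s: hopf_value(x, t1, s), t2) + 0.25 * p ** 4
    rx = central_diff(lambda z: hopf_value(z, t1, t2), x) - p
    return r1, r2, rx


def ray_residual(x, s, theta1, theta2):
    """Residual of the single-time equation along (t1, t2) = s * (theta1, theta2)."""
    p = momentum(x, s * theta1, s * theta2)
    ds = central_diff(lambda r: hopf_value(x, r * theta1, r * theta2), s)
    return ds + theta1 * 0.5 * p ** 2 + theta2 * 0.25 * p ** 4


if __name__ == "__main__":
    print(pde_residuals(1.3, 0.7, 0.4))
    print(ray_residual(1.3, 0.9, 0.5, 2.0))
```

All residuals should be close to zero up to finite-difference error, which is consistent with the Hopf formula solving each time equation and with the restriction to a ray solving a single-time equation for the weighted sum of the Hamiltonians.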
One can actually apply the above arguments to weaker assumptions and obtain a generalized result, which is stated in the following corollary. In this generalized result, it is possible that the solution is not a classical solution, hence the subgradients of , instead of the gradients, are assumed to satisfy the HJ equation, which is a natural generalization of the classical solution when we want to consider the solution which is convex and lower semi-continuous.
Corollary 4.1.
Let , and be arbitrary extended-valued functions defined on . Assume there exists a function satisfying:
- (i)
If and satisfy for some and , then for any .
- (ii)
for any .
Then, the following statements hold:
- 1.
For the single-time case, i.e. , denote to be the Hamiltonian. If there exists , such that , then is unique and , where is defined by
[TABLE]
for any and . Moreover, the restriction of on is finite-valued and convex.
- 2.
For the multi-time case, i.e. , if is another function satisfying the assumptions (i)-(ii) with , then . In other words, the solution is unique when the relative interior of the domain is given.
Proof.
The proof of this corollary is similar to the proof of 4.1, so we just give a brief sketch here. First, we adjust the proof of 4.1 by changing the gradients of to the subgradients of . The argument still holds because we assume in (i) that the subgradients of satisfy the HJ equation. Then, we draw the same conclusion as in 4.1. In other words, with the function defined in eq. 37, we have
[TABLE]
Also, the part of in the proof of 4.1 still holds. So we derive that the two functions and coincide in the relative interior of . Together with eq. 44, we derive eq. 43, and hence the first statement in this corollary follows.
For the case when , it suffices to prove that and coincide in the relative interior of the domain. Let be an arbitrary point in . It remains to prove that and are equal at the point . Notice that we have for any , then we can choose the positive number in the proof of 4.1 to be for any . As in the proof of 4.1, we define the functions and by
[TABLE]
for any and . Since there exists a point in the relative interior of , one can easily check that the assumptions in [66, Thm.XI.3.2.1] hold. Then, by [66, Thm.XI.3.2.1], the chain rule for the subgradients of holds. Similarly, the chain rule also holds for the subgradients of . Therefore, the argument in the proof of 4.1 in the multi-time case remains valid by changing the gradients to the subgradients. As a result, we conclude that both and solve the single-time HJ equation with the Hamiltonian . Then, by the first statement in this corollary, we have , which implies that and coincide at the point , and the proof is complete. ∎
5. A Regularization Method for the Degenerate Cases
In the previous two sections, we discussed the relation between some optimization problems and the multi-time HJ equations under the assumptions (H1) and (H2). In general, if those assumptions are not satisfied, some results may fail. For example, if there is no strictly convex Hamiltonian, then the solution may be non-differentiable, which leads to the non-uniqueness of the maximizer (called momentum) in the Hopf formula eq. 14. Also, the minimizer in the Lax formula eq. 15 may be non-unique if the Hamiltonians are not differentiable. However, these two situations are common in optimization problems such as the decomposition models. In fact, any norm or indicator function is neither strictly convex nor differentiable. As a result, it is important to select a meaningful momentum or minimizer in the solution set when it contains more than one element.
In this section, we propose a regularization method to select a unique momentum and a unique minimizer simultaneously, and provide the representation formulas for both selected quantities by using the results stated in the previous sections. Intuitively, to select a minimizer , we modify the degenerate term by adding to it, where is a positive parameter and is a differentiable function satisfying (H1). When approaches zero, the minimizer of the modified problem will converge to the unique minimizer in the solution set of the original problem which minimizes the function . The procedure to select is the same, except that the inf-convolution of the degenerate term with is performed instead of the addition of .
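The selection principle described above can be seen on a scalar toy problem. The following Python sketch is only an illustration under assumptions that are not in the paper: the objective $\sum_i|u-d_i|$ with two data points (whose minimizers form a whole interval), the quadratic regularizer $\frac{\varepsilon}{2}u^2$, the function names, and the ternary search are all hypothetical choices. Adding the quadratic term makes the objective strictly convex, and its unique minimizer is the element of the original solution set with the smallest absolute value.

```python
def objective(u, data, eps):
    """Sum of absolute deviations plus the quadratic term (eps / 2) * u**2."""
    return sum(abs(u - d) for d in data) + 0.5 * eps * u * u


def ternary_min(f, lo, hi, iters=200):
    """Approximate minimizer of a strictly convex function f on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)


def select_minimizer(data, eps, lo=-10.0, hi=10.0):
    """Unique minimizer of the regularized objective (requires eps > 0)."""
    return ternary_min(lambda u: objective(u, data, eps), lo, hi)


if __name__ == "__main__":
    # Without regularization, every u in [1, 3] is a minimizer for data (1, 3).
    print([objective(u, (1.0, 3.0), 0.0) for u in (1.0, 2.0, 3.0)])
    for eps in (1.0, 0.1, 0.01):
        # Expected selections: about 1 for data (1, 3), about 0 for data (-1, 3).
        print(eps, select_minimizer((1.0, 3.0), eps),
              select_minimizer((-1.0, 3.0), eps))
```

In this toy case the selected point does not depend on the size of the regularization parameter once it is small enough, which mirrors the limiting statement in the text.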
In the literature, the special case of selecting the momentum using inf-convolution with is well known as the Moreau-Yosida approximation, which is introduced, for instance, in [8, Thm.2, p.144] and [27, Thm.3.1, p.54]. A Moreau-Yosida based regularization method usually selects only a unique minimizer or only a momentum, but not both. Our contribution here is that we consider the primal problem and the dual problem simultaneously. In other words, one can select the momentum and the minimizer at the same time using our method. This analysis can be adapted easily to other decomposition models with more degenerate terms. Moreover, one can also use the same procedure with another function or even use two different functions in the two added terms. One alternative choice is for any , for example. In fact, if is chosen to be any non-negative, finite-valued, 1-coercive, differentiable and strictly convex function, the statements in this section still hold. To be specific, the proofs of 5.1, 5.3 and 5.1 hold after minor adjustments, and one can use subdifferential calculus to prove 5.2. In this paper, for simplicity, we mainly focus on the quadratic regularization terms, which are usually preferred in practice because of the simplicity and efficiency of numerical implementation.
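As a reminder of what the Moreau-Yosida approximation does, the following Python sketch computes the inf-convolution of the absolute value with a scaled quadratic in one dimension. The choice of the absolute value, the parameter name `mu`, and the function names are assumptions for illustration. The envelope $\min_u\{|u|+(x-u)^2/(2\mu)\}$ is differentiable even though $|\cdot|$ is not, its gradient is $(x-\mathrm{prox}(x))/\mu$, and it approaches $|x|$ as $\mu$ tends to zero.

```python
def prox_abs(x, mu):
    """Minimizer of |u| + (x - u)**2 / (2 * mu), i.e. soft-thresholding of x by mu."""
    if x > mu:
        return x - mu
    if x < -mu:
        return x + mu
    return 0.0


def moreau_env_abs(x, mu):
    """Moreau-Yosida envelope of |.| at x, evaluated through its minimizer."""
    u = prox_abs(x, mu)
    return abs(u) + (x - u) ** 2 / (2.0 * mu)


def moreau_grad_abs(x, mu):
    """Gradient of the envelope; it is single-valued for every x, including x = 0."""
    return (x - prox_abs(x, mu)) / mu


if __name__ == "__main__":
    mu = 0.5
    for x in (-1.3, 0.0, 0.1, 0.9, 2.0):
        print(x, moreau_env_abs(x, mu), moreau_grad_abs(x, mu))
    # The envelope approaches |x| as mu decreases.
    print([moreau_env_abs(1.3, m) for m in (1.0, 0.1, 0.001)])
```

The single-valued gradient is what allows a unique momentum to be picked out after smoothing, while the original non-smooth function only provides a set of subgradients at the kink.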
Now, we focus on a specific decomposition model, and the regularization function is chosen to be . Some other models can be analyzed using similar arguments. Let and be two arbitrary norms whose dual norms are denoted as and . In fact, all the results remain valid if and are two semi-norms, in which case the corresponding dual norms and are finite in some subspaces and equal to otherwise. The set of minimizers is defined as follows
[TABLE]
We can regard the minimal value as a solution to the HJ equation given by the Lax formula with spatial variable and time variable and define
[TABLE]
Note that in the corresponding HJ equation, the initial function is and the Hamiltonian is , hence the assumption (H1) is not satisfied. As a result, we need to apply the regularization method in this example. For simplicity we also use , to denote these two norms, then $F_{2}^{*}(y)=I\{|||y|||_{*}\leq t\}$. We assume and drop the variable in the remainder of this section because the variation of is not considered in this problem. Then, we can rewrite the problem as the following
[TABLE]
In fact, several useful models in the literature fit this form. We give two examples. In what follows, we use , and to denote the discrete total variation semi-norm, the discrete norm and the discrete norm, respectively. First, in [9, 10], it is shown that Meyer’s model in the following form
[TABLE]
is equivalent to
[TABLE]
for some suitable positive parameter . In this example, both and are the discrete total variation because the discrete norm is the dual norm of . Similarly, another form of Meyer’s model, stated as follows
[TABLE]
is equivalent to
[TABLE]
for some suitable positive parameter [11]. In this example, the functions and are the discrete total variation and the dual norm of the discrete norm, respectively.
As mentioned above, we apply two operators to the function and obtain its approximation
[TABLE]
where are small regularization parameters. Here, we choose to modify the function , but one may instead apply the operators to the function and the analysis is similar. Then, the problem reads
[TABLE]
We expand the inf-convolution to get
[TABLE]
Here and later in this section, we omit the variable when there is no ambiguity.
By introducing the quadratic terms, the uniqueness of and the differentiability of are guaranteed. When the parameters and converge to zero at a comparable rate, a meaningful minimizer and momentum are selected. In fact, they are the elements with the minimal norms in the target sets and . The detailed statements are listed as follows.
Lemma 5.1.
For any , there is a unique minimizer to the problem eq. 48. Moreover, for any positive constant , the sets and are bounded.
Proof.
It is easy to check that the objective function in eq. 48 is 1-coercive and strictly convex, because of the 1-coercivity and strict convexity of the quadratic terms. Therefore, there exists a unique minimizer .
Setting and in eq. 48 and comparing it with eq. 45, we obtain
[TABLE]
Denote $C:=S(x)+\min_{v\in U(x)}\frac{K}{2}\|v\|_{2}^{2}$, where is an arbitrary positive number as defined in the statement. Then is independent of and , and when . From this inequality and the definition of in eq. 48, we can derive a bound for that reads
[TABLE]
Therefore, is bounded by the constant when we assume .
Then, from the constraint given by the indicator function in the minimization problem eq. 48, we have , which implies the boundedness of because all the norms are equivalent in the finite-dimensional space . As a result, is also bounded whenever . Then the conclusion follows. ∎
Lemma 5.2.
Let and be defined by eq. 48. Then, we have Any cluster point of is also a cluster point of and vice versa. Moreover, any cluster point of and is in .
Proof.
The convergence of to follows from eq. 49. Since , any cluster point of is also a cluster point of and vice versa. It remains to show that any cluster point of is in .
By the definition of , we have
[TABLE]
where we first multiply the objective function by and then expand the quadratic term. Recall that any indicator function is invariant under multiplication with a positive constant, hence we obtain and the second equality in eq. 50 follows. The last maximization problem in eq. 50 is in the form of Hopf formula. The corresponding multi-time HJ equation with time variables and is given by
[TABLE]
Here, is the l.s.c. convex function such that . Although the assumption (H1) is not satisfied, by eq. 50 and 5.1, we know that the Hopf formula is well-defined in . Moreover, the solution is the classical solution to the multi-time HJ equation eq. 51 and its spatial gradient equals . To be specific, we have
[TABLE]
Then, we want to apply the results in 3.4 (i) to prove that any cluster point of is in . In fact, under the basic assumptions that and the Hopf formula is well-defined, the proof of 3.4 (i) only requires the following statements:
- (a)
is non-empty;
- (b)
the Hamiltonians are finite-valued;
- (c)
is differentiable;
- (d)
the spatial gradient is bounded with all limit points in .
The statements (b) and (c) are obviously satisfied. It is straightforward to check . Specifically, if and only if . By simple computation, . Then we obtain
[TABLE]
Such and always exist, hence . As for the statement (d), the boundedness of follows from eq. 52 and 5.1. By eq. 49, converges to . Also, is given by the constraint imposed by in the minimization problem eq. 48. Together with eq. 52, we can conclude that any limit point of , denoted as , satisfies and . Hence, by eq. 53 and the statement (d) is proved.
Therefore, the conclusion of 3.4 (i) still holds although the assumption (H1) is not satisfied. As a result, for any cluster point of ,
[TABLE]
where the last two equalities follow from eq. 53 and the definition of in eq. 45. In conclusion, any cluster point of is in . ∎
Lemma 5.3.
For any , the function defined in eq. 47 is differentiable. Let and define . Then for any positive constant , the set of gradients is bounded. Moreover, as and approach zero, any cluster point of is in .
Proof.
Rewriting the formula of in eq. 48, we get
[TABLE]
From straightforward computation, by 2.7 and the definition of in eq. 48, we obtain
[TABLE]
As a result, contains at most one element. On the other hand, is convex and finite-valued, which implies the subdifferential of is non-empty. Hence, is differentiable and its gradient is given by
[TABLE]
Let be an arbitrary positive number. Now, we prove that there exists a constant such that whenever . By eqs. 54 and 55, is in the set . On the one hand, the subdifferential of the norm is always bounded. In other words, there exists a constant such that whenever for some . Then, we deduce that the set is bounded by . On the other hand, according to 5.1, there exists a constant such that whenever . Therefore, is bounded by .
Let be a cluster point of . By taking a subsequence we can assume and converge to zero and converges to . By 5.1, is bounded, hence we can assume converges to a point by taking a subsequence. Then, converges to by 5.2. From eq. 54, we have
[TABLE]
Since the subdifferential operators and are continuous [66, Prop.XI.4.1.1], when goes to infinity, the above inclusion becomes
[TABLE]
On the other hand, by 2.7 and the definition of and in eq. 45, we have
[TABLE]
for any . Moreover, by 5.2, since is a cluster point of , we can conclude that . As a result, we can choose in eq. 57 and compare it with eq. 56 to conclude that . ∎
Proposition 5.1.
Assume and converge to zero and . Then, the minimizer and the gradient converge to the projections of zero onto the sets and , respectively. To be specific,
[TABLE]
Proof.
Define . We will use the general symbol to replace the quadratic function because this proof holds for a general finite-valued, 1-coercive, differentiable and strictly convex function .
Note that the limit of is the same as the limit of , hence we just need to prove the result for and . Denote
[TABLE]
Since and are bounded, we can assume that converges to and converges to by taking a subsequence. Then it suffices to prove , .
[TABLE]
By 2.5, we deduce that and . Together with eq. 59, we obtain
[TABLE]
On the other hand, since and are the minimizer and momentum of the original problem eq. 45, we have
[TABLE]
Combining eq. 60 and eq. 61, we obtain
[TABLE]
Since the subdifferential operators and are monotone, by eq. 9, we obtain
[TABLE]
We sum up the two inequalities to get
[TABLE]
We divide the above inequality by and take the limit to obtain
[TABLE]
where the positive constant is defined in the statement of this proposition to be . From 5.2 and 5.3, we know that and , hence we have and by eq. 58. Taken together with eq. 62, we obtain
[TABLE]
As a result, the inequalities in eq. 63 become equalities, which implies and because is positive by assumption. Therefore, we conclude that and , since the minimizers in eq. 62 are unique. ∎
In practice, if a model has non-unique minimizers, then some existing optimization algorithms may fail to converge, in which case one may consider this modification procedure and apply the optimization algorithm to the modified problem to obtain a sequence converging to the selected minimizer. Here, for simplicity, we only demonstrate the method on a specific optimization problem whose objective function contains two parts including one norm and one constraint. In fact, this method works for more general cases, such as some other decomposition models with more degenerate parts. Now, we give a numerical illustration for this proposed regularization method on the celebrated TVL1 model [5, 6, 14, 42, 43, 52, 74, 75].
To be specific, the TVL1 model solves the following optimization problem
[TABLE]
where denotes the discrete total variation semi-norm defined in eq. 6. However, it is well known that this minimization problem may have non-unique minimizers [42, 50]. For instance, let be the domain of an image and be any small rectangle in such that . Let be the set of indices whose corresponding pixels are in . Let be the numbers of pixels on the two adjacent sides of the small rectangle . In other words, there are pixels in and pixels on the boundary of . Let and be two different real numbers in and set the discretized image as follows
[TABLE]
Then, the minimizers of the TVL1 model eq. 64 with are not unique. Moreover, we have
[TABLE]
where and are defined by
[TABLE]
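The non-uniqueness in this example can be checked numerically on a small image. The sketch below is illustrative only and rests on several assumptions that are not confirmed by the text: the total variation is taken to be anisotropic (sum of absolute forward differences), the objective is taken in the form TV(u) + lam * ||u - x||_1, and the critical weight is taken to be the perimeter-to-area ratio 2(m+n)/(mn) of the rectangle. The names `aniso_tv`, `tvl1_energy`, `rectangle_image`, `m`, `n`, `a`, `b` and `lam` are hypothetical. The code only verifies that the energy is constant along the segment joining the observed image and the constant background image; it does not prove that these images are global minimizers.

```python
import numpy as np

def aniso_tv(u):
    """Anisotropic discrete TV (assumed form): sum of absolute forward differences."""
    return np.abs(np.diff(u, axis=0)).sum() + np.abs(np.diff(u, axis=1)).sum()

def tvl1_energy(u, x, lam):
    """Assumed TVL1 objective: TV(u) + lam * ||u - x||_1."""
    return aniso_tv(u) + lam * np.abs(u - x).sum()

def rectangle_image(shape, top_left, m, n, a, b):
    """Image equal to a on an m-by-n rectangle and b elsewhere."""
    x = np.full(shape, b, dtype=float)
    r, c = top_left
    x[r:r + m, c:c + n] = a
    return x

# Hypothetical test data: a 2-by-4 rectangle strictly inside an 8-by-10 image.
m, n = 2, 4
a, b = 0.75, 0.25
x = rectangle_image((8, 10), (3, 3), m, n, a, b)

# Assumed critical weight: perimeter of the rectangle divided by its area.
lam = 2 * (m + n) / (m * n)

# Evaluate the energy along the segment between the background and x.
background = np.full_like(x, b)
ts = [0.0, 0.25, 0.5, 1.0]
energies = [tvl1_energy(background + t * (x - background), x, lam) for t in ts]
print(energies)
```

Under these assumptions the printed energies should coincide, because the total variation grows linearly in `t` at the same rate at which the weighted fidelity term decreases.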
By applying the proposed regularization method, a unique minimizer is selected in this set of minimizers. To be specific, we solve the following problem
[TABLE]
Note that the above model is related to models incorporating infimal convolution of and fidelity terms, which are used for mixed Gaussian and Salt & Pepper noise image restoration, as proposed in [30, 31] for instance. Although this model is different from the example we give in eq. 45, one can adjust the arguments to prove the same statements for this model. In other words, when the two parameters and converge to zero at a comparable rate, the -component converges to the element defined by
[TABLE]
and the -component converges to the residual . Numerically, we use a splitting method and the algorithm in [39, 50, 67] to compute the minimizer in eq. 65 when . We test the regularization method on the four images shown in the first row in table 2, and the corresponding -components are shown in the second row.
6. Conclusion
In this paper, we provide connections between multi-time Hamilton-Jacobi equations and some optimization problems such as the decomposition models in image processing. To be specific, we show a representation formula for the minimizers and clarify the connection between the minimizers and the spatial gradient of the minimal values. Moreover, we also study the variational behaviors of the momentum and the velocities . It turns out that their limits solve two optimization problems which are dual to each other. In addition, we provide a new perspective from convex analysis to prove the uniqueness of the convex solution to the multi-time Hamilton-Jacobi equation, taking advantage of the convexity assumptions to overcome the difficulty that the functions can take the value $+\infty$. Finally, we demonstrate a regularization method to modify decomposition models which have non-unique minimizers.
In this work, we consider optimization problems which can be written in the form of the Lax formula eq. 15. Hence, we assume the observed data is the sum of different components . We do not consider non-additive perturbation models such as [16, 55, 87]. However, our analysis covers a wide range of decomposition models with additive noise, and the results can be easily extended to vector-valued images such as color images.
- [1] R. Acar and C. R. Vogel, Analysis of bounded variation penalty methods for ill-posed problems, Inverse Problems, 10 (1994), pp. 1217–1229.
- [2] W. Allard, Total variation regularization for image denoising, I. geometric theory, SIAM Journal on Mathematical Analysis, 39 (2008), pp. 1150–1190.
- [3] W. Allard, Total variation regularization for image denoising, II. examples, SIAM Journal on Imaging Sciences, 1 (2008), pp. 400–417.
- [4] W. Allard, Total variation regularization for image denoising, III. examples, SIAM Journal on Imaging Sciences, 2 (2009), pp. 532–568.
- [5] S. Alliney, A property of the minimum vectors of a regularizing functional defined by means of the absolute norm, IEEE Transactions on Signal Processing, 45 (1997), pp. 913–917.
- [6] S. Alliney and S. A. Ruzinsky, An algorithm for the minimization of mixed $l_1$ and $l_2$ norms with application to Bayesian estimation, IEEE Transactions on Signal Processing, 42 (1994), pp. 618–627.
- [7] G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing, Springer-Verlag, 2002.
- [8] J. P. Aubin and A. Cellina, Differential Inclusions: Set-Valued Maps and Viability Theory, Springer-Verlag, Berlin, Heidelberg, 1984.
