Upper-Bounding the Regularization Constant for Convex Sparse Signal Reconstruction
Renliang Gu, Aleksandar Dogand\v{z}i\'c

TL;DR
This paper derives upper bounds on the regularization constant in convex sparse signal reconstruction, ensuring the regularization term does not dominate the data fidelity term, with practical algorithms for computing these bounds.
Contribution
It provides necessary and sufficient conditions for the irrelevance of regularization and develops an optimization framework and ADMM algorithm to compute upper bounds.
Findings
Derived bounds match empirical results in simulations.
Established conditions for the irrelevance of regularization.
Proposed an efficient ADMM-based method for bound computation.
Abstract
Consider reconstructing a signal by minimizing a weighted sum of a convex differentiable negative log-likelihood (NLL) (data-fidelity) term and a convex regularization term that imposes a convex-set constraint on and enforces its sparsity using -norm analysis regularization. We compute upper bounds on the regularization tuning constant beyond which the regularization term overwhelmingly dominates the NLL term so that the set of minimum points of the objective function does not change. Necessary and sufficient conditions for irrelevance of sparse signal regularization and a condition for the existence of finite upper bounds are established. We formulate an optimization problem for finding these bounds when the regularization term can be globally minimized by a feasible and also develop an alternating direction method of multipliers (ADMM) type method for their…
| , DWT | , DWT | , TV | , TV | |||||
|---|---|---|---|---|---|---|---|---|
| SNR/dB | theoretical | empirical | theoretical | empirical | theoretical | empirical | theoretical | empirical |
| 8.87 | 8.87 | 9.43 | 9.43 | 101.55 | 101.54 | same as , TV | ||
| 8.91 | 8.91 | 9.47 | 9.47 | 100.21 | 100.21 | |||
| 9.03 | 9.03 | 9.59 | 9.59 | 96.47 | 96.47 | |||
| 9.43 | 9.43 | 9.98 | 9.98 | 87.49 | 87.49 | |||
| 11.88 | 11.89 | 14.03 | 14.02 | 152.07 | 152.07 | |||
| 27.77 | 27.78 | 43.28 | 43.28 | 361.56 | 361.56 | |||
| 88.78 | 88.82 | 139.67 | 139.66 | 1024.04 | 1024.04 | |||
| 77.29 | 77.31 | 123.91 | 123.90 | 683.43 | 683.43 | 909.50 | 909.48 | |
| DWT | Anisotropic TV | Isotropic TV | ||||
|---|---|---|---|---|---|---|
| theoretical | empirical | theoretical | empirical | theoretical | empirical | |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSparse and Compressive Sensing Techniques · Photoacoustic and Ultrasonic Imaging · Medical Image Segmentation Techniques
Upper-Bounding the Regularization Constant for Convex Sparse Signal
Reconstruction
The authors are with the Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011 USA (e-mail: {renliang,ald}@iastate.edu). This work was supported by the National Science Foundation under Grant CCF-1421480. Renliang Gu and Aleksandar Dogandžić
Abstract
Consider reconstructing a signal by minimizing a weighted sum of a convex differentiable negative log-likelihood (NLL) (data-fidelity) term and a convex regularization term that imposes a convex-set constraint on and enforces its sparsity using -norm analysis regularization. We compute upper bounds on the regularization tuning constant beyond which the regularization term overwhelmingly dominates the NLL term so that the set of minimum points of the objective function does not change. Necessary and sufficient conditions for irrelevance of sparse signal regularization and a condition for the existence of finite upper bounds are established. We formulate an optimization problem for finding these bounds when the regularization term can be globally minimized by a feasible and also develop an alternating direction method of multipliers (ADMM) type method for their computation. Simulation examples show that the derived and empirical bounds match.
I Introduction
Selection of the regularization tuning constant in convex Tikhonov-type [1] penalized negative log-likelihood (NLL) minimization
[TABLE]
is a challenging problem critical for obtaining accurate estimates of the signal [2, Ch. 7]. Too little regularization leads to unstable reconstructions with large noise and artifacts due to, for example, aliasing. With too much regularization, the reconstructions are too smooth and often degenerate to constant signals. Finding bounds on the regularization constant or finding conditions for the irrelevance of signal regularization has received little attention. In this paper, we determine upper bounds on beyond which the regularization term overwhelmingly dominates the NLL term in (1) so that the minima of the objective function do not change. For a linear measurement model with white Gaussian noise and -norm regularization, a closed-form expression for such a bound is determined in [3, eq. (4)]; see also Example 4. The obtained bounds can be used to design continuation procedures [4, 5] that gradually decrease from a large starting point down to the desired value, which improves the numerical stability and convergence speed of the resulting minimization algorithm by taking advantage of the fact that penalized NLL schemes converge faster for smoother problems with larger [6]. In some scenarios, users can monitor the reconstructions as decreases and terminate when the result is satisfactory.
Consider a convex NLL and a regularization term
[TABLE]
that imposes a convex-set constraint on , , and sparsity of an appropriate linearly transformed , where is a known sparsifying dictionary matrix. Assume that the NLL is differentiable and lower bounded within the closed convex set , and satisfies
[TABLE]
which ensures that is computable for all . Define the convex sets of solutions to , , and :111The use of “” in the definitions of and in (4b) and (4c) makes it easier to identify both as convex sets.
[TABLE]
where the existence of is ensured by the assumption that is lower bounded in .
We review the notation: “∗”, “T”, “H”, “+”, , , , “”, “”, , , and denote complex conjugation, transpose, Hermitian transpose, Moore-Penrose matrix inverse, -norm over the complex vector space defined by for , absolute value, Kronecker product, elementwise versions of “” and “”, the identity matrix of size and the vectors of ones and zeros, respectively (replaced by , and when the dimensions can be inferred). , , and denote the indicator function, projection onto , and the elementwise exponential function: .
Denote by and the null space and range (column space) of a matrix . These vector spaces are real or complex depending on whether is a real- or complex-valued matrix. For a set of complex vectors of size , define \operatorname{Re}S\triangleq\bigl{\{}\boldsymbol{s}\in\mathamsbb{R}^{p}\mid\boldsymbol{s}+\mathrm{j}\boldsymbol{t}\in S\text{ for some \boldsymbol{t}\in\mathamsbb{R}^{p}}\bigr{\}} and S\cap\mathamsbb{R}^{p}\triangleq\bigl{\{}\boldsymbol{s}\in\mathamsbb{R}^{p}\mid{\boldsymbol{s}+\mathrm{j}\boldsymbol{0}\in S}\bigr{\}}, where . For ,
[TABLE]
are the real null space and range of and , respectively, where
[TABLE]
If in (6) has full row rank, we can define
[TABLE]
which reduces to for real-valued . The following are equivalent: , , and , where
[TABLE]
We can decompose as
[TABLE]
where and with and ; \underline{Z}=\bigl{[}\operatorname{Re}Z\;\operatorname{Im}Z\bigr{]}\in\mathamsbb{R}^{d\times 2p^{\prime}}, consistent with the notation in (6). Here, denotes the real range of the real-valued matrix . Clearly, is of interest; otherwise . Observe that (see (7))
[TABLE]
The subdifferential of the indicator function is the normal cone to at [7, Sec. 5.4] and, by the definition of a cone, satisfies
[TABLE]
Define
[TABLE]
and its elementwise extension for vector arguments , which can be interpreted as twice the Wirtinger subdifferential of with respect to [8]. Note that , and, when is a real vector, is the subdifferential of with respect to [9, Sec. 11.3.4].
Lemma 1
For and , the subdifferential of with respect to is
[TABLE]
Proof:
(13) follows from
[TABLE]
where is the th column of . We obtain (14) by replacing the linear transform matrix in [10, Prop. 2.1] with \bigl{[}\operatorname{Re}{\boldsymbol{\psi}_{j}}\,\,\operatorname{Im}{\boldsymbol{\psi}_{j}}\bigr{]}^{T}. ∎
We now use Lemma 1 to formulate the necessary and sufficient conditions for :
[TABLE]
respectively.
When the signal vector corresponds to an image , its isotropic and anisotropic total-variation (TV) regularizations correspond to [11, Sec. 2.1]
[TABLE]
respectively, where and are the vertical and horizontal difference matrices (similar to those in [12, Sec. 15.3.3]), and
[TABLE]
obtained by appending an all-zero row from below to the upper-trapezoidal matrix with first row \bigl{[}1,-1,0,\dotsc,0\bigr{]}; note that . Here, and
[TABLE]
for both the isotropic and anisotropic TV regularizations.
The scenario where
[TABLE]
holds is of practical interest: then and globally minimize the regularization term: . If (19) holds and , then , where
[TABLE]
If, in addition to (19),
- •
, then ;
- •
, then and are constant signals of the form .
In Section II, we define and explain an upper bound on useful regularization constants and establish conditions under which signal sparsity regularization is irrelevant and finite does not exist. We then present an optimization problem for finding when (19) holds (Section III), develop a general numerical method for computing bounds (Section IV), present numerical examples (Section V), and make concluding remarks (Section VI).
II Upper Bound Definition and Properties
Define
[TABLE]
If for all , then finite does not exist, which we denote by .
We now show that, if , then the the set of minimum points of the objective function does not change.
Remark 1
- (a)
For any , if and only if . 2. (b)
Assuming for some , for .
Proof:
We first prove (a). Necessity follows by the existence of ; see (4c). We argue sufficiency by contradiction. Consider any ; i.e., minimizes both and . If , there exists a with that, by the definition of , also minimizes . Therefore, , which contradicts the assumption . Therefore, . If there exists a such that , then which, since both and are in , implies that and contradicts the definition of . Therefore, .
We now prove (b). By (a), , which confirms (b) for . Consider now , a , and any . Then,
[TABLE]
By summing the two inequalities in (22) and rearranging, we obtain . Since , is also in ; i.e., , which implies by (a). ∎
As increases, moves gradually towards and, according to the definition (21), and do not intersect when . Once , the intersection of the two sets is , and, by Remark 1(b), for all .
II-A Irrelevant Signal Sparsity Regularization
Remark 2
The following claims are equivalent:
- (a)
; i.e., there exists an such that
[TABLE] 2. (b)
; and 3. (c)
; i.e., .
Proof:
(c) follows from (a) because . (b) follows from (c) by applying Remark 1(a) to obtain , which implies (b). Finally, (b) implies (a). ∎
Having for at least one implies (23) and is therefore a stronger condition than (23).
Example 1
Consider and C=\bigl{\{}{\boldsymbol{x}}\in\mathamsbb{R}^{2}\mid\lVert{\boldsymbol{x}}-\boldsymbol{1}_{2\times 1}\rVert_{2}\leq 1\bigr{\}}. (Here, could correspond to the Gaussian measurement model with measurements equal to zero.) Since is a circle within , the objective functions for the identity () and 1D TV sparsifying transforms are
[TABLE]
*respectively, where and {\boldsymbol{x}}^{\diamond}=\bigl{(}1-{\sqrt{2}}/{2}\bigr{)}\boldsymbol{1}. Here, and , which confirms that (23) holds. *
II-B Condition for Infinite and Guarantees for Finite
Remark 3
If there exists such that
[TABLE]
then . When (19) holds, the reverse is also true with a stronger claim: implies (25) for all .
Proof:
First, we prove sufficiency by contradiction. If a finite exists, then for all . Therefore, (15a) holds with being any , which contradicts (25).
In the case where (19) holds, we prove the necessity by contradiction. If (25) does not hold for all , there exist and such that
[TABLE]
Since (19) holds, and ; see (20). When , and \operatorname{Re}(\Psi\boldsymbol{w})\in u\operatorname{Re}\bigl{(}\Psi G(\Psi^{H}{\boldsymbol{x}}^{\diamond})\bigr{)}. Therefore, (15a) holds at for all , which contradicts . ∎
Example 2
Consider , , and C=\bigl{\{}{\boldsymbol{x}}\in\mathamsbb{R}^{2}\mid\lVert{\boldsymbol{x}}-\boldsymbol{1}_{2\times 1}\rVert_{2}\leq 1\bigr{\}}. (Here, could correspond to the measurement model with measurement equal to zero.) Since is a circle within , the objective function is
[TABLE]
with , , and
[TABLE]
which implies , consistent with the observation that . Here, (19) is not satisfied: (25) is only a sufficient condition for and does not hold in this example.
Example 3
Consider , 1D TV sparsifying transform with , and C=\bigl{\{}{\boldsymbol{x}}\in\mathamsbb{R}^{2}\mid\bigl{\|}{\boldsymbol{x}}-\bigl{[}2,\;0\bigr{]}^{T}\bigr{\|}_{2}^{2}\leq 2\bigr{\}}. Since is a circle with , the objective function is
[TABLE]
*with \mathcal{X}_{u}=\bigl{\{}\bigl{[}2-(1+{4}/{u})/q(u),\;1/q(u)\bigr{]}^{T}\bigr{\}}, , and , which implies . Since (19) holds in this example, (25) is necessary and sufficient for . Since and , (25) holds. *
II-B1 Two cases of finite
If and (19) holds, then must be finite: in this case, condition (25) in Remark 3 cannot hold, which is easy to confirm by substituting into (25).
must also be finite if
[TABLE]
Indeed, (30) implies (19) and that for ,
[TABLE]
and hence (25) cannot hold upon substituting (31a) and (31b). Here, (31b) follows from , the condition for optimality of the optimization problem that defines , by using the fact that when .
If (30) holds then, by Remark 2, if and only if .
III Bounds When (19) Holds
We now present an optimization problem for finding when (19) holds.
Theorem 1
Assume that (19) holds and that the convex NLL is differentiable within . Consider the following optimization problem:
[TABLE]
with
[TABLE]
Then, for all and in (21).
Here, if and only if the constraints in (32c) and (32c) cannot be satisfied for any , which is equivalent to satisfying (25) in Remark 3.
Proof:
Observe that for all and
[TABLE]
due to (19) and (10a), respectively.
We first prove that if . Consider any and denote by a pair that solves the minimization problem (P0). Since , there exists an such that . Using (34), we obtain
[TABLE]
which implies according to (15a).
Second, we prove that if for any , then . We employ proof by contradiction. Suppose ; then, there exists an . According to (15a), there exist an and an such that . Using (34), we have
[TABLE]
Note that
[TABLE]
Inserting (LABEL:eq:id) into (36) and using (10a) and the fact that has full column rank leads to ; thus
[TABLE]
Now, rearrange and use the fact that (see (20)) to obtain
[TABLE]
which contradicts (32), where is the minimum.
Finally, we prove by contradiction that is invariant within if has more than one element. Assume that there exist and such that . We obtain contradictory results: and because and , respectively. Therefore, is invarant to .
The constraints on in (32c) and (32c) are equivalent to stating that (25) does not hold for any ; see also (10b). If an does not exist that satisfies these constraints, (25) holds and according to Remark 3. ∎
We make a few observations: (P0) is a linear programming problem with linear constraints and can be solved using CVX [13] and Matlab’s optimization toolbox upon identifying and in (32c) and (32c), respectively. Theorem 1 requires differentiability of the NLL only at . If is real, then is real as well, the optimal in (P0) has zero imaginary component and the corresponding simplified version of Theorem 1 follows and requires optimization in (P0) with respect to real-valued .
If is real and , then we can select , which leads to and cancellation of the variable in (32c) and simplification of (P0).
We now specialize Theorem 1 to two cases with finite .
Corollary 1** ()**
If and if (19) holds, then in (21) can be computed as
[TABLE]
Proof:
Theorem 1 applies, , and must be finite. Setting in (32) leads to (40). ∎
If , then and the condition reduces to .
Corollary 2** ()**
If (30) holds, then in (21) can be computed as
[TABLE]
with any .
Proof:
Thanks to (30), (19) and (31a)–(31b) are satisfied, Theorem 1 applies, must be finite, and (by (31a)). By using these facts, we simplify (32) to obtain (41). ∎
If and , then both Corollaries 1 and 2 apply and the upper bound can be obtained by setting and in (40) or by setting and in (41).
Example 4
Consider a real invertible .
- (a)
If , Corollary 1 applies and (40) becomes
[TABLE] 2. (b)
If , Corollaries 1 and 2 apply and the bound simplifies to
[TABLE]
For and a linear measurement model with white Gaussian noise, (42b) reduces to the expressions in **[3, eq. (4)]** and **[5, Sec. III]**, used in **[5]** to design its continuation scheme; **[3]** and **[5]** also assume .
Example 5** (One-dimensional TV regularization)**
Consider 1D TV regularization with obtained by setting in (16a); note that . Consider a constant signal . Then Theorem 1 applies and yields
[TABLE]
The bounds obtained by solving (P0) are often simple but restricted to the scenario where (19) holds. In the following section, we remove assumption (19) and develop a general numerical method for finding in (21).
IV ADMM Algorithm for Computing
We focus on the nontrivial scenario where (23) does not hold and assume . We also assume that an is available, which will be sufficient to obtain the in (21). We use the duality of norms [14, App. A.1.6]:
[TABLE]
to rewrite the minimization of (1) as the following min-max problem (see also (20)):
[TABLE]
Since the objective function in (45) is convex with respect to and concave with respect to , the optimal is at the saddle point of (45) and satisfies
[TABLE]
Now, select as the smallest for which (46a)–(46b) hold with :
[TABLE]
where is the solution to the following constrained linear programming problem:
[TABLE]
obtained from (46a)–(46b) with and replaced by and . Here,
[TABLE]
is the normalized gradient (for numerical stability) of the NLL at ; because (23) does not hold. Due to (15b), is a feasible point that satisfies the constraints (49a), which implies that . When (25) holds, has to be zero, implying .
To solve (P1) and find , we apply an iterative algorithm based on alternating direction method of multipliers (ADMM) [15, 16]
[TABLE]
where is a tuning parameter for the ADMM iteration and we solve (51a) using the Broyden-Fletcher-Goldfarb-Shanno optimization algorithm with box constraints [17] and projected Nesterov’s proximal-gradient (PNPG) algorithm [18] for real and complex , respectively. We initialize the iteration (51) with , , , and , where is adaptively adjusted thereafter using the scheme in [15, Sec. 3.4.1].
In special cases, (51) simplifies. If (19) holds, then and the constraint in (51a) simplifies to ; see (20). If , and or , (51a) has the following analytical solution:
[TABLE]
When (30) holds, (51c) reduces to for all , thanks to (31a).
When is real, the constraints imposed by become linear and (P1) becomes a linear programming problem with linear constraints.
V Numerical Examples
Matlab implementations of the presented examples are available at https://github.com/isucsp/imgRecSrc/uBoundEx. In all numerical examples, the empirical upper bounds were obtained by a grid search over with obtained using the PNPG method [18].
V-A Signal reconstruction for Gaussian linear model
We adopt the linear measurement model with white Gaussian noise and scaled NLL , where the elements of the sensing matrix are independent, identically distributed (i.i.d.) and drawn from the uniform distribution on a unit sphere. We reconstruct the nonnegative “skyline” signal in [18, Sec. LABEL:report-sec:linear1dex] from noisy linear measurements using the discrete wavelet transform (DWT) and 1D TV regularizations, where the DWT matrix is orthogonal (), constructed using the Daubechies-4 wavelet with three decomposition levels. Define the signal-to-noise ratio (SNR) as
[TABLE]
where is the variance of the Gaussian noise added to to create the noisy measurement vector .
For and with DWT regularization, and Example 4 applies and yields the upper bounds (42a) and (42b), respectively.
For TV regularization, we apply the result in Example 5. For and , we have and , respectively, where
[TABLE]
If , which holds when or when and , then the bound is given by (43b). For and if , then and (43a) applies. In this case, if for , which occurs only when for all .
Table I shows the theoretical and empirical bounds for DWT and TV regularizations and and ; we decrease the SNR from with independent noise realizations for different SNRs. The theoretical bounds in Sections III and IV coincide. For DWT regularization, is the same for both convex sets and thus the upper bound for is always smaller than its counterpart for , thanks to being optimized over variable in (42a). For TV regularization, when , the upper bounds coincide for both because, in this case, is the same for both and . In the last row of Table I we show the case where ; then, differs for the two convex sets , and the upper bound for is smaller than its counterpart for , thanks to being optimized over variable in (43a): compare (43a) with (43b).
V-B PET image reconstruction from Poisson measurements
Consider positron emission tomography (PET) reconstruction of the concentration map in [18, Fig. LABEL:report-fig:pet], which represents simulated radiotracer activity in a human chest, from independent noisy Poisson-distributed measurements with means . The choices of parameters in the PET system setup and concentration map have been taken from the Image Reconstruction Toolbox (IRT) [19, emission/em_test_setup.m]. Here,
[TABLE]
is the known sensing matrix; is the density map needed to model the attenuation of the gamma rays [20]; is the known intercept term accounting for background radiation, scattering effect, and accidental coincidence;222The elements of the intercept term have been set to a constant equal to of the sample mean of : . is a known vector that models the detector efficiency variation; and is a known scaling constant, which we use to control the expected total number of detected photons due to electron-positron annihilation, , an SNR measure. We collect the photons from equally spaced directions over , with radial samples at each direction. Here, we adopt the parallel strip-integral matrix [21, Ch. 25.2] and use its implementation in the IRT [19].
We now consider the nonnegative convex set , which ensures that (3) holds, and 2D isotropic and anisotropic TV and DWT regularizations, where the 2D DWT matrix is constructed using the Daubechies-6 wavelet with six decomposition levels.
For TV regularizations, , where , computed using the bisection method that finds the zero of , which is an increasing function of . Here, no search for is needed when , because in this case .
We computed the theoretical bounds using the ADMM-type algorithm in Section IV.
Table II shows the theoretical and empirical bounds for DWT and TV regularizations and the SNR varying from to , with independent measurement realizations for different SNRs.
Denote the isotropic and anisotropic 2D TV bounds by and , respectively. Then, it is easy to show that when (19) holds, , which follows by using the inequalities and is confirmed in Table II.
VI Concluding Remarks
Future work will include obtaining simple expressions for upper bounds for isotropic 2D TV regularization, based on Theorem 1.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Andrey N. Tikhonov and Vasiliy Y. Arsenin “Solutions of Ill-Posed Problems” Washington, DC: Winston, 1977
- 2[2] Curtis R. Vogel “Computational Methods for Inverse Problems” Philadelphia, PA: SIAM, 2002
- 3[3] S. J. Kim, K. Koh, M. Lustig, S. Boyd and D. Gorinevsky “An Interior-Point Method for Large-Scale ℓ 1 subscript ℓ 1 \ell_{1} -Regularized Least Squares” In IEEE J. Sel. Topics Signal Process. 1.4 , 2007, pp. 606–617
- 4[4] E. Hale, W. Yin and Y. Zhang “Fixed-Point Continuation for ℓ 1 subscript ℓ 1 \ell_{1} -Minimization: Methodology and Convergence” In SIAM J. Optim. 19.3 , 2008, pp. 1107–1130
- 5[5] Stephen J Wright, Robert D Nowak and Mário A T Figueiredo “Sparse Reconstruction by Separable approximation” In IEEE Trans. Signal Process. 57.7 IEEE, 2009, pp. 2479–2493
- 6[6] E. Allgower and K. Georg “Introduction to Numerical Continuation Methods” Philadelphia, PA: SIAM, 2003
- 7[7] Dimitri P. Bertsekas “Convex Optimization Theory” Belmont, MA: Athena Scientific, 2009
- 8[8] P. Bouboulis, K. Slavakis and S. Theodoridis “Adaptive Learning in Complex Reproducing Kernel Hilbert Spaces Employing Wirtinger’s Subgradients” In IEEE Trans. Neural Netw. Learn. Syst. 23.3 , 2012, pp. 425–438
