Upper-Bounding the Regularization Constant for Convex Sparse Signal Reconstruction

Renliang Gu; Aleksandar Dogandžić

arXiv:1702.07930·stat.CO·February 28, 2017


TL;DR

This paper derives upper bounds on the regularization constant in convex sparse signal reconstruction, ensuring the regularization term does not dominate the data fidelity term, with practical algorithms for computing these bounds.

Contribution

It provides necessary and sufficient conditions for the irrelevance of regularization and develops an optimization framework and ADMM algorithm to compute upper bounds.

Findings

1. Derived bounds match empirical results in simulations.
2. Established conditions for the irrelevance of regularization.
3. Proposed an efficient ADMM-based method for bound computation.


Tables

TABLE I: Theoretical and empirical bounds $U$ for the linear Gaussian model. Entries marked “same” equal the corresponding TV bounds for $C=\mathbb{R}_{+}^{p}$.

SNR (dB)	DWT, $C=\mathbb{R}_{+}^{p}$, theoretical	DWT, $C=\mathbb{R}_{+}^{p}$, empirical	DWT, $C=\mathbb{R}^{p}$, theoretical	DWT, $C=\mathbb{R}^{p}$, empirical	TV, $C=\mathbb{R}_{+}^{p}$, theoretical	TV, $C=\mathbb{R}_{+}^{p}$, empirical	TV, $C=\mathbb{R}^{p}$, theoretical	TV, $C=\mathbb{R}^{p}$, empirical
$30$	8.87	8.87	9.43	9.43	101.55	101.54	same	same
$20$	8.91	8.91	9.47	9.47	100.21	100.21	same	same
$10$	9.03	9.03	9.59	9.59	96.47	96.47	same	same
$0$	9.43	9.43	9.98	9.98	87.49	87.49	same	same
$-10$	11.88	11.89	14.03	14.02	152.07	152.07	same	same
$-20$	27.77	27.78	43.28	43.28	361.56	361.56	same	same
$-30$	88.78	88.82	139.67	139.66	1024.04	1024.04	same	same
$-30$	77.29	77.31	123.91	123.90	683.43	683.43	909.50	909.48

TABLE II: Theoretical and empirical bounds $U$ for the PET example.

$\boldsymbol{1}^{T}\Phi\boldsymbol{x}_{\text{true}}$	DWT, theoretical	DWT, empirical	Anisotropic TV, theoretical	Anisotropic TV, empirical	Isotropic TV, theoretical	Isotropic TV, empirical
$10^{1}$	$9.660\times10^{-1}$	$9.662\times10^{-1}$	$7.550\times10^{-2}$	$7.544\times10^{-2}$	$7.971\times10^{-2}$	$7.937\times10^{-2}$
$10^{3}$	$1.155\times10^{2}$	$1.156\times10^{2}$	$4.154\times10^{0}$	$4.153\times10^{0}$	$4.888\times10^{0}$	$4.877\times10^{0}$
$10^{5}$	$1.153\times10^{4}$	$1.153\times10^{4}$	$3.951\times10^{2}$	$3.950\times10^{2}$	$4.666\times10^{2}$	$4.656\times10^{2}$
$10^{7}$	$1.145\times10^{6}$	$1.145\times10^{6}$	$3.947\times10^{4}$	$3.946\times10^{4}$	$4.661\times10^{4}$	$4.651\times10^{4}$
$10^{9}$	$1.153\times10^{8}$	$1.154\times10^{8}$	$3.950\times10^{6}$	$3.949\times10^{6}$	$4.665\times10^{6}$	$4.654\times10^{6}$

Equations

$$f_{u}(\boldsymbol{x})=\mathcal{L}(\boldsymbol{x})+u\,r(\boldsymbol{x})$$

$$r(\boldsymbol{x})=\mathbb{I}_{C}(\boldsymbol{x})+\lVert\Psi^{H}\boldsymbol{x}\rVert_{1}$$

$$\operatorname{dom}\mathcal{L}(\boldsymbol{x})\supseteq C$$


$$\mathcal{N}(A^{H})\cap\mathbb{R}^{M}=\mathcal{N}(\underline{A}^{T}),$$

$$\underline{A}\triangleq\bigl[\operatorname{Re}A\;\operatorname{Im}A\bigr]\in\mathbb{R}^{M\times 2N}.$$

$$A^{\ddagger}\triangleq A^{H}[\operatorname{Re}(AA^{H})]^{-1}$$

$$d\triangleq\dim(\operatorname{Re}(\mathcal{R}(\Psi)))\leq\min(p,2p').$$

$$\Psi=FZ$$

$$\operatorname{Re}(\Psi Z^{\ddagger})$$

$$\mathcal{R}(F)$$

$$N_{C}(\boldsymbol{x})=a\,N_{C}(\boldsymbol{x}),\quad a>0,$$

$$G(s)\triangleq\begin{cases}\{s/\lvert s\rvert\}, & s\neq 0,\\ \{w\in\mathbb{C}\mid\lvert w\rvert\leq 1\}, & s=0,\end{cases}$$

$$\partial_{\boldsymbol{x}}\lVert\Psi^{H}\boldsymbol{x}\rVert_{1}=\operatorname{Re}\bigl(\Psi G(\Psi^{H}\boldsymbol{x})\bigr).$$

$$\partial_{\boldsymbol{x}}\lvert\boldsymbol{\psi}_{j}^{H}\boldsymbol{x}\rvert=\operatorname{Re}\bigl(\boldsymbol{\psi}_{j}G(\boldsymbol{\psi}_{j}^{H}\boldsymbol{x})\bigr)$$


$$D(L)\triangleq\begin{bmatrix}1 & -1 & & & \\ & 1 & -1 & & \\ & & \ddots & \ddots & \\ & & & 1 & -1\\ & & & & 0\end{bmatrix}\in\mathbb{R}^{L\times L}$$

$$\mathcal{N}(\Psi^{H})=\mathcal{R}(\boldsymbol{1})$$

$$\mathcal{N}(\Psi^{H})\cap C\neq\emptyset$$

$$H\triangleq\bigl\{\boldsymbol{w}\in\mathbb{C}^{p'\times 1}\mid\lVert\boldsymbol{w}\rVert_{\infty}\leq 1\bigr\}.$$

$$U\triangleq\inf\bigl\{u\geq 0\mid\mathcal{X}_{u}\cap Q\neq\emptyset\bigr\}.$$

$$\mathcal{L}(\boldsymbol{x})+U\,r(\boldsymbol{x}),\qquad\mathcal{L}(\boldsymbol{y})+u\,r(\boldsymbol{y})$$

$$\boldsymbol{0}\in\nabla\mathcal{L}(\boldsymbol{x}^{\diamond})+N_{C}(\boldsymbol{x}^{\diamond});$$


$$[\nabla\mathcal{L}(\boldsymbol{x}^{\diamond})+N_{C}(\boldsymbol{x}^{\diamond})]\cap\operatorname{Re}(\mathcal{R}(\Psi))=\emptyset.$$

$$\boldsymbol{0}=\nabla\mathcal{L}(\boldsymbol{x}^{\diamond})+\operatorname{Re}(\Psi\boldsymbol{w})+\boldsymbol{t}.$$

$$f_{u}(\boldsymbol{x})=(1+u)x_{1}+u\,x_{2}+\mathbb{I}_{C}(\boldsymbol{x})$$


Full text

Upper-Bounding the Regularization Constant for Convex Sparse Signal Reconstruction

Renliang Gu and Aleksandar Dogandžić

The authors are with the Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011 USA (e-mail: {renliang,ald}@iastate.edu). This work was supported by the National Science Foundation under Grant CCF-1421480.

Abstract

Consider reconstructing a signal ${\boldsymbol{x}}$ by minimizing a weighted sum of a convex differentiable negative log-likelihood (NLL) (data-fidelity) term and a convex regularization term that imposes a convex-set constraint on ${\boldsymbol{x}}$ and enforces its sparsity using $\boldsymbol{\ell}_{1}$ -norm analysis regularization. We compute upper bounds on the regularization tuning constant beyond which the regularization term overwhelmingly dominates the NLL term so that the set of minimum points of the objective function does not change. Necessary and sufficient conditions for irrelevance of sparse signal regularization and a condition for the existence of finite upper bounds are established. We formulate an optimization problem for finding these bounds when the regularization term can be globally minimized by a feasible ${\boldsymbol{x}}$ and also develop an alternating direction method of multipliers (ADMM) type method for their computation. Simulation examples show that the derived and empirical bounds match.

I Introduction

Selection of the regularization tuning constant $u>0$ in convex Tikhonov-type [1] penalized negative log-likelihood (NLL) minimization

$$f_{u}(\boldsymbol{x})=\mathcal{L}(\boldsymbol{x})+u\,r(\boldsymbol{x})\tag{1}$$

is a challenging problem critical for obtaining accurate estimates of the signal ${\boldsymbol{x}}$ [2, Ch. 7]. Too little regularization leads to unstable reconstructions with large noise and artifacts due to, for example, aliasing. With too much regularization, the reconstructions are too smooth and often degenerate to constant signals. Finding bounds on the regularization constant $u$ or finding conditions for the irrelevance of signal regularization has received little attention. In this paper, we determine upper bounds on $u$ beyond which the regularization term $r({\boldsymbol{x}})$ overwhelmingly dominates the NLL term $\mathcal{L}({\boldsymbol{x}})$ in (1) so that the minima of the objective function $f_{u}({\boldsymbol{x}})$ do not change. For a linear measurement model with white Gaussian noise and $\ell_{1}$ -norm regularization, a closed-form expression for such a bound is determined in [3, eq. (4)]; see also Example 4. The obtained bounds can be used to design continuation procedures [4, 5] that gradually decrease $u$ from a large starting point down to the desired value, which improves the numerical stability and convergence speed of the resulting minimization algorithm by taking advantage of the fact that penalized NLL schemes converge faster for smoother problems with larger $u$ [6]. In some scenarios, users can monitor the reconstructions as $u$ decreases and terminate when the result is satisfactory.
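As a minimal sketch of the continuation idea, assume the special case $\mathcal{L}(\boldsymbol{x})=\tfrac12\lVert\boldsymbol{y}-\Phi\boldsymbol{x}\rVert_2^2$ with $\Psi=I$ and $C=\mathbb{R}^{p}$, where the upper bound is the familiar $U=\lVert\Phi^{T}\boldsymbol{y}\rVert_\infty$ (for $u\geq U$ the minimizer is $\boldsymbol{x}=\boldsymbol{0}$). The Python/NumPy code below decreases $u$ geometrically from $U$ to a target value and warm-starts a plain proximal-gradient (ISTA) solver at each stage. The matrix `Phi`, the data `y`, the number of stages, and the iteration counts are illustrative assumptions, not the schemes of [4, 5].

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft thresholding, the proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(Phi, y, u, x0, n_iter=2000):
    """Proximal-gradient (ISTA) iterations for 0.5*||y - Phi x||_2^2 + u*||x||_1."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = x0.copy()
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)
        x = soft_threshold(x - step * grad, step * u)
    return x

def continuation(Phi, y, u_target, n_stages=8, n_iter=2000):
    """Decrease u geometrically from U = ||Phi^T y||_inf down to u_target,
    warm-starting each stage at the previous stage's solution."""
    U = np.max(np.abs(Phi.T @ y))   # upper bound in this special case (Psi = I, C = R^p)
    x = np.zeros(Phi.shape[1])      # x = 0 minimizes the objective for every u >= U
    for u in np.geomspace(U, u_target, n_stages):
        x = ista(Phi, y, u, x, n_iter)
    return x

# Synthetic (hypothetical) problem instance.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 10))
x_true = np.r_[3.0, -2.0, np.zeros(8)]
y = Phi @ x_true + 0.1 * rng.standard_normal(20)
u_target = 0.05 * np.max(np.abs(Phi.T @ y))
x_hat = continuation(Phi, y, u_target)
```

Starting exactly at $U$ means the first stage begins at its own solution, which is one practical use of a tight upper bound.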

Consider a convex NLL $\mathcal{L}({\boldsymbol{x}})$ and a regularization term

$$r(\boldsymbol{x})=\mathbb{I}_{C}(\boldsymbol{x})+\lVert\Psi^{H}\boldsymbol{x}\rVert_{1}\tag{2}$$

that imposes a convex-set constraint on ${\boldsymbol{x}}$, ${\boldsymbol{x}}\in C\subseteq\mathbb{R}^{p}$, and sparsity of the linearly transformed signal $\Psi^{H}{\boldsymbol{x}}$, where $\Psi\in\mathbb{C}^{p\times p^{\prime}}$ is a known sparsifying dictionary matrix. Assume that the NLL $\mathcal{L}({\boldsymbol{x}})$ is differentiable and lower bounded within the closed convex set $C$, and satisfies

$$\operatorname{dom}\mathcal{L}(\boldsymbol{x})\supseteq C,\tag{3}$$

which ensures that $\mathcal{L}({\boldsymbol{x}})$ is computable for all ${\boldsymbol{x}}\in C$. Define the convex sets of solutions to $\min_{\boldsymbol{x}}f_{u}({\boldsymbol{x}})$, $\min_{\boldsymbol{x}}r({\boldsymbol{x}})$, and $\min_{{\boldsymbol{x}}\in Q}\mathcal{L}({\boldsymbol{x}})$:¹

¹ The use of “$\leq$” in the definitions of $Q$ and $\mathcal{X}^{\diamond}$ in (4b) and (4c) makes it easier to identify both as convex sets.

[TABLE]

where the existence of $\mathcal{X}^{\diamond}$ is ensured by the assumption that $\mathcal{L}({\boldsymbol{x}})$ is lower bounded in $C$ .

We review the notation: “∗”, “T”, “H”, “+”, $\lVert\cdot\rVert_{p}$, $\lvert\cdot\rvert$, $\otimes$, “$\succeq$”, “$\preceq$”, $I_{N}$, $\boldsymbol{1}_{N\times 1}$, and $\boldsymbol{0}_{N\times 1}$ denote complex conjugation, transpose, Hermitian transpose, Moore-Penrose matrix inverse, the $\ell_{p}$-norm over the complex vector space $\mathbb{C}^{N}$ defined by $\|\boldsymbol{z}\|_{p}^{p}=\sum_{i=1}^{N}|z_{i}|^{p}$ for $\boldsymbol{z}=(z_{i})\in\mathbb{C}^{N}$, absolute value, Kronecker product, elementwise versions of “$\geq$” and “$\leq$”, the identity matrix of size $N$, and the $N\times 1$ vectors of ones and zeros, respectively (replaced by $I$, $\boldsymbol{1}$, and $\boldsymbol{0}$ when the dimensions can be inferred). Further, $\mathbb{I}_{C}({\boldsymbol{a}})=\begin{cases}0,&{\boldsymbol{a}}\in C\\ +\infty,&\text{otherwise}\end{cases}$, $P_{C}({\boldsymbol{a}})=\arg\min_{{\boldsymbol{x}}\in C}\lVert{\boldsymbol{x}}-{\boldsymbol{a}}\rVert_{2}^{2}$, and $\exp_{\circ}{\boldsymbol{a}}$ denote the indicator function, the projection onto $C$, and the elementwise exponential function $[\exp_{\circ}{\boldsymbol{a}}]_{i}=\exp a_{i}$, respectively.

Denote by $\mathcal{N}(A)$ and $\mathcal{R}(A)$ the null space and range (column space) of a matrix $A$. These vector spaces are real or complex depending on whether $A$ is a real- or complex-valued matrix. For a set $S$ of complex vectors of size $p$, define $\operatorname{Re}S\triangleq\bigl\{\boldsymbol{s}\in\mathbb{R}^{p}\mid\boldsymbol{s}+\mathrm{j}\boldsymbol{t}\in S\text{ for some }\boldsymbol{t}\in\mathbb{R}^{p}\bigr\}$ and $S\cap\mathbb{R}^{p}\triangleq\bigl\{\boldsymbol{s}\in\mathbb{R}^{p}\mid\boldsymbol{s}+\mathrm{j}\boldsymbol{0}\in S\bigr\}$, where $\mathrm{j}=\sqrt{-1}$. For $A\in\mathbb{C}^{M\times N}$,

[TABLE]

are the real null space and range of $\underline{A}^{T}$ and $\underline{A}$ , respectively, where

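$$\underline{A}\triangleq\bigl[\operatorname{Re}A\;\operatorname{Im}A\bigr]\in\mathbb{R}^{M\times 2N}.\tag{6}$$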

If $\underline{A}$ in (6) has full row rank, we can define

$$A^{\ddagger}\triangleq A^{H}[\operatorname{Re}(AA^{H})]^{-1},\tag{7}$$

which reduces to $A^{+}$ for real-valued $A$. The following are equivalent: $\operatorname{Re}(\mathcal{R}(\Psi))=\mathbb{R}^{p}$, $\mathcal{N}(\Psi^{H})\cap\mathbb{R}^{p}=\{\boldsymbol{0}\}$, and $d=p$, where

$$d\triangleq\dim(\operatorname{Re}(\mathcal{R}(\Psi)))\leq\min(p,2p').\tag{8}$$
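The definitions of $\underline{A}$, $A^{\ddagger}$, and $d$ are easy to check numerically. The Python/NumPy sketch below (helper names are our own) relies on two identities that follow from splitting $A$ into real and imaginary parts: $\operatorname{Re}(AA^{H})=\underline{A}\,\underline{A}^{T}$, so $A^{\ddagger}$ exists exactly when $\underline{A}$ has full row rank, and $\operatorname{Re}(\mathcal{R}(\Psi))=\mathcal{R}(\underline{\Psi})$, so $d$ is the rank of $\bigl[\operatorname{Re}\Psi\;\operatorname{Im}\Psi\bigr]$. It also checks that $\operatorname{Re}(AA^{\ddagger})=I$ and that $A^{\ddagger}=A^{+}$ for a real full-row-rank $A$.

```python
import numpy as np

def underline(A):
    """Real M x 2N matrix [Re A, Im A]."""
    return np.hstack([A.real, A.imag])

def dagger(A):
    """A^ddagger = A^H [Re(A A^H)]^{-1}; requires underline(A) to have full row rank."""
    return A.conj().T @ np.linalg.inv((A @ A.conj().T).real)

def real_range_dim(Psi):
    """d = dim Re(R(Psi)), computed as the rank of [Re Psi, Im Psi]."""
    return np.linalg.matrix_rank(underline(Psi))

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
Ad = dagger(A)   # Re(A @ Ad) should be the 3 x 3 identity
```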

We can decompose $\Psi$ as

$$\Psi=FZ,\tag{9}$$

where $F\in\mathbb{R}^{p\times d}$ and $Z\in\mathbb{C}^{d\times p^{\prime}}$ with $\operatorname*{rank}F=d$ and $\operatorname*{rank}\underline{Z}=d$; $\underline{Z}=\bigl[\operatorname{Re}Z\;\operatorname{Im}Z\bigr]\in\mathbb{R}^{d\times 2p^{\prime}}$, consistent with the notation in (6). Here, $\mathcal{R}(F)$ denotes the real range of the real-valued matrix $F$. Clearly, only $d\geq 1$ is of interest; otherwise, $\Psi=0$. Observe that (see (7))

[TABLE]
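One way to construct a decomposition $\Psi=FZ$, sketched below under our own choices, is a truncated SVD $\underline{\Psi}=U_{d}S_{d}V_{d}^{T}$: take $F=U_{d}$ (real with orthonormal columns, hence rank $d$) and split $\underline{Z}=S_{d}V_{d}^{T}$ back into $Z=\operatorname{Re}Z+\mathrm{j}\operatorname{Im}Z$. Because $F$ is real, $\underline{FZ}=F\underline{Z}=\underline{\Psi}$, so $FZ=\Psi$; the test also confirms numerically that $\operatorname{Re}(\Psi Z^{\ddagger})=F$. The relative rank tolerance `tol` is an assumption.

```python
import numpy as np

def dagger(A):
    """A^ddagger = A^H [Re(A A^H)]^{-1}."""
    return A.conj().T @ np.linalg.inv((A @ A.conj().T).real)

def factor_psi(Psi, tol=1e-10):
    """Split complex Psi (p x p') as Psi = F Z with real F (p x d), rank F = d,
    and Z (d x p') with rank [Re Z, Im Z] = d, via an SVD of [Re Psi, Im Psi]."""
    p, pp = Psi.shape
    U, s, Vt = np.linalg.svd(np.hstack([Psi.real, Psi.imag]), full_matrices=False)
    d = int(np.sum(s > tol * s[0]))     # numerical rank of [Re Psi, Im Psi]
    F = U[:, :d]                        # real factor with orthonormal columns
    Z_under = s[:d, None] * Vt[:d]      # d x 2p' matrix playing the role of [Re Z, Im Z]
    Z = Z_under[:, :pp] + 1j * Z_under[:, pp:]
    return F, Z

# Hypothetical rank-2 dictionary: real 5 x 2 factor times complex 2 x 4 factor.
rng = np.random.default_rng(3)
Psi = rng.standard_normal((5, 2)) @ (rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4)))
F, Z = factor_psi(Psi)
```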

The subdifferential of the indicator function $N_{C}({\boldsymbol{x}})=\partial\mathbb{I}_{C}({\boldsymbol{x}})$ is the normal cone to $C$ at ${\boldsymbol{x}}$ [7, Sec. 5.4] and, by the definition of a cone, satisfies

$$N_{C}(\boldsymbol{x})=a\,N_{C}(\boldsymbol{x}),\quad a>0.\tag{11}$$

Define

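$$G(s)\triangleq\begin{cases}\{s/\lvert s\rvert\}, & s\neq 0,\\ \{w\in\mathbb{C}\mid\lvert w\rvert\leq 1\}, & s=0,\end{cases}\tag{12}$$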

and its elementwise extension $G(\boldsymbol{s})$ for vector arguments $\boldsymbol{s}$ , which can be interpreted as twice the Wirtinger subdifferential of $\|\boldsymbol{s}\|_{1}$ with respect to $\boldsymbol{s}$ [8]. Note that $\boldsymbol{s}^{H}G(\boldsymbol{s})=\{\lVert\boldsymbol{s}\rVert_{1}\}$ , and, when $\boldsymbol{s}$ is a real vector, $\operatorname{Re}(G(\boldsymbol{s}))$ is the subdifferential of $\lVert\boldsymbol{s}\rVert_{1}$ with respect to $\boldsymbol{s}$ [9, Sec. 11.3.4].
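The map $G$ is straightforward to code elementwise. Because its value at a zero entry is a set (the closed unit disk), the Python/NumPy sketch below (hypothetical helper names) returns $s_{i}/\lvert s_{i}\rvert$ for the nonzero entries, flags the zero ones, and provides a membership test for $\boldsymbol{w}\in G(\boldsymbol{s})$. The test checks the identity $\boldsymbol{s}^{H}\boldsymbol{w}=\lVert\boldsymbol{s}\rVert_{1}$ stated above.

```python
import numpy as np

def G_nonzero(s):
    """Elementwise s/|s| on the nonzero entries of s; zero entries are flagged,
    because there G(s) is the whole closed unit disk {w : |w| <= 1}."""
    s = np.asarray(s, dtype=complex)
    is_zero = s == 0
    w = np.zeros_like(s)
    w[~is_zero] = s[~is_zero] / np.abs(s[~is_zero])
    return w, is_zero

def in_G(w, s, tol=1e-12):
    """Check entry by entry whether w belongs to G(s)."""
    w = np.asarray(w, dtype=complex)
    w_ref, is_zero = G_nonzero(s)
    ok_zero = np.abs(w[is_zero]) <= 1 + tol
    ok_nonzero = np.abs(w[~is_zero] - w_ref[~is_zero]) <= tol
    return bool(np.all(ok_zero) and np.all(ok_nonzero))

s = np.array([3 + 4j, 0, -2])
w, is_zero = G_nonzero(s)
```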

Lemma 1

For $\Psi\in\mathbb{C}^{p\times p^{\prime}}$ and ${\boldsymbol{x}}\in\mathbb{R}^{p}$, the subdifferential of $\lVert\Psi^{H}{\boldsymbol{x}}\rVert_{1}$ with respect to ${\boldsymbol{x}}$ is

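$$\partial_{\boldsymbol{x}}\lVert\Psi^{H}\boldsymbol{x}\rVert_{1}=\operatorname{Re}\bigl(\Psi G(\Psi^{H}\boldsymbol{x})\bigr).\tag{13}$$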

Proof:

(13) follows from

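$$\partial_{\boldsymbol{x}}\lvert\boldsymbol{\psi}_{j}^{H}\boldsymbol{x}\rvert=\operatorname{Re}\bigl(\boldsymbol{\psi}_{j}G(\boldsymbol{\psi}_{j}^{H}\boldsymbol{x})\bigr),\tag{14}$$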

where $\boldsymbol{\psi}_{j}$ is the $j$ th column of $\Psi$ . We obtain (14) by replacing the linear transform matrix in [10, Prop. 2.1] with $\bigl{[}\operatorname{Re}{\boldsymbol{\psi}_{j}}\,\,\operatorname{Im}{\boldsymbol{\psi}_{j}}\bigr{]}^{T}$ . ∎

We now use Lemma 1 to formulate the necessary and sufficient conditions for ${\boldsymbol{x}}\in\mathcal{X}_{u}$ :

[TABLE]

respectively.

When the signal vector ${\boldsymbol{x}}=\operatorname{vec}X$ corresponds to an image $X\in\mathbb{R}^{J\times K}$, its isotropic and anisotropic total-variation (TV) regularizations correspond to [11, Sec. 2.1]

[TABLE]

respectively, where $\Psi_{\text{v}}=I_{K}\otimes D^{T}(J)$ and $\Psi_{\text{h}}=D^{T}(K)\otimes I_{J}$ are the vertical and horizontal difference matrices (similar to those in [12, Sec. 15.3.3]), and

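$$D(L)\triangleq\begin{bmatrix}1 & -1 & & & \\ & 1 & -1 & & \\ & & \ddots & \ddots & \\ & & & 1 & -1\\ & & & & 0\end{bmatrix}\in\mathbb{R}^{L\times L},\tag{17}$$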

obtained by appending an all-zero row from below to the $(L-1)\times L$ upper-trapezoidal matrix with first row $\bigl{[}1,-1,0,\dotsc,0\bigr{]}$ ; note that $D(1)=0$ . Here, $d=JK-1$ and

$$\mathcal{N}(\Psi^{H})=\mathcal{R}(\boldsymbol{1})\tag{18}$$

for both the isotropic and anisotropic TV regularizations.
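To make the TV construction concrete, the Python/NumPy sketch below builds $D(L)$, $\Psi_{\text{v}}=I_{K}\otimes D^{T}(J)$, and $\Psi_{\text{h}}=D^{T}(K)\otimes I_{J}$ with Kronecker products and checks that $d=JK-1$ and that constant images lie in $\mathcal{N}(\Psi^{H})$. The displays (16a) and (16b) are missing from this copy, so the code assumes the common choices $\Psi=\Psi_{\text{v}}+\mathrm{j}\Psi_{\text{h}}$ for isotropic TV and $\Psi=\bigl[\Psi_{\text{v}}\;\Psi_{\text{h}}\bigr]$ for anisotropic TV.

```python
import numpy as np

def D(L):
    """D(L): row i is e_i^T - e_{i+1}^T for i < L, and the last row is zero."""
    M = np.zeros((L, L))
    for i in range(L - 1):
        M[i, i], M[i, i + 1] = 1.0, -1.0
    return M

def tv_dictionaries(J, K):
    """Vertical and horizontal difference dictionaries for x = vec X, X of size J x K.
    Psi_v^T x = vec(D(J) X) (differences down columns), Psi_h^T x = vec(X D(K)^T)."""
    Psi_v = np.kron(np.eye(K), D(J).T)
    Psi_h = np.kron(D(K).T, np.eye(J))
    return Psi_v, Psi_h

J, K = 4, 3
Psi_v, Psi_h = tv_dictionaries(J, K)
Psi_iso = Psi_v + 1j * Psi_h            # assumed isotropic TV dictionary
Psi_aniso = np.hstack([Psi_v, Psi_h])   # assumed anisotropic TV dictionary
```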

The scenario where

$$\mathcal{N}(\Psi^{H})\cap C\neq\emptyset\tag{19}$$

holds is of practical interest: then $Q=\mathcal{N}(\Psi^{H})\cap C$ and ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$ globally minimize the regularization term: $r({\boldsymbol{x}}^{\diamond})=0$ . If (19) holds and ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$ , then $G(\Psi^{H}{\boldsymbol{x}}^{\diamond})=H$ , where

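$$H\triangleq\bigl\{\boldsymbol{w}\in\mathbb{C}^{p'\times 1}\mid\lVert\boldsymbol{w}\rVert_{\infty}\leq 1\bigr\}.\tag{20}$$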

If, in addition to (19),

•

$d=p$ , then $\mathcal{X}^{\diamond}=Q=\{\boldsymbol{0}\}$ ;

•

$\mathcal{N}(\Psi^{H})\cap\mathbb{R}^{p}=\mathcal{R}(\boldsymbol{1})$, then $Q=\mathcal{R}(\boldsymbol{1})\cap C$ and ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$ are constant signals of the form ${\boldsymbol{x}}^{\diamond}=\boldsymbol{1}x_{0}^{\diamond}$, $x_{0}^{\diamond}\in\mathbb{R}$.

In Section II, we define and explain an upper bound $U$ on useful regularization constants $u$ and establish conditions under which signal sparsity regularization is irrelevant and finite $U$ does not exist. We then present an optimization problem for finding $U$ when (19) holds (Section III), develop a general numerical method for computing bounds $U$ (Section IV), present numerical examples (Section V), and make concluding remarks (Section VI).

II Upper Bound Definition and Properties

Define

$$U\triangleq\inf\bigl\{u\geq 0\mid\mathcal{X}_{u}\cap Q\neq\emptyset\bigr\}.\tag{21}$$

If $\mathcal{X}_{u}\cap Q=\emptyset$ for all $u$ , then finite $U$ does not exist, which we denote by $U=+\infty$ .

We now show that, once $u\geq U$, the set of minimum points $\mathcal{X}_{u}$ of the objective function does not change.

Remark 1

(a)

For any $u$, $\mathcal{X}_{u}\cap Q=\mathcal{X}^{\diamond}$ if and only if $\mathcal{X}_{u}\cap Q\neq\emptyset$.

(b)

Assuming $\mathcal{X}_{U}\cap Q\neq\emptyset$ for some $U\geq 0$, $\mathcal{X}_{u}=\mathcal{X}^{\diamond}$ for $u>U$.

Proof:

We first prove (a). Necessity follows by the existence of $\mathcal{X}^{\diamond}$ ; see (4c). We argue sufficiency by contradiction. Consider any ${\boldsymbol{x}}_{u}\in\mathcal{X}_{u}\cap Q$ ; i.e., ${\boldsymbol{x}}_{u}$ minimizes both $f_{u}({\boldsymbol{x}})$ and $r({\boldsymbol{x}})$ . If ${\boldsymbol{x}}_{u}\notin\mathcal{X}^{\diamond}$ , there exists a $\boldsymbol{y}\in\mathcal{X}^{\diamond}$ with $\mathcal{L}(\boldsymbol{y})<\mathcal{L}({\boldsymbol{x}}_{u})$ that, by the definition of $\mathcal{X}^{\diamond}$ , also minimizes $r({\boldsymbol{x}})$ . Therefore, $f_{u}(\boldsymbol{y})=\mathcal{L}(\boldsymbol{y})+ur(\boldsymbol{y})<f_{u}({\boldsymbol{x}}_{u})$ , which contradicts the assumption ${\boldsymbol{x}}_{u}\in\mathcal{X}_{u}$ . Therefore, $\mathcal{X}_{u}\cap Q\subseteq\mathcal{X}^{\diamond}$ . If there exists a $\boldsymbol{z}\in\mathcal{X}^{\diamond}\subseteq Q$ such that $\boldsymbol{z}\notin\mathcal{X}_{u}$ , then $f_{u}(\boldsymbol{z})>f_{u}({\boldsymbol{x}}_{u})$ which, since both $\boldsymbol{z}$ and ${\boldsymbol{x}}_{u}$ are in $Q$ , implies that $\mathcal{L}(\boldsymbol{z})>\mathcal{L}({\boldsymbol{x}}_{u})$ and contradicts the definition of $\mathcal{X}^{\diamond}$ . Therefore, $\mathcal{X}^{\diamond}\subseteq\mathcal{X}_{u}$ .

We now prove (b). By (a), $\mathcal{X}_{U}\cap Q=\mathcal{X}^{\diamond}$ , which confirms (b) for $u=U$ . Consider now $u>U$ , a $\boldsymbol{y}\in\mathcal{X}_{U}\cap Q=\mathcal{X}^{\diamond}$ , and any ${\boldsymbol{x}}\in\mathcal{X}_{u}$ . Then,

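$$\mathcal{L}(\boldsymbol{y})+U\,r(\boldsymbol{y})\leq\mathcal{L}(\boldsymbol{x})+U\,r(\boldsymbol{x}),\qquad\mathcal{L}(\boldsymbol{x})+u\,r(\boldsymbol{x})\leq\mathcal{L}(\boldsymbol{y})+u\,r(\boldsymbol{y}).\tag{22}$$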

By summing the two inequalities in (22) and rearranging, we obtain $r(\boldsymbol{y})\geq r({\boldsymbol{x}})$ . Since $\boldsymbol{y}\in Q$ , ${\boldsymbol{x}}$ is also in $Q$ ; i.e., $\mathcal{X}_{u}\subseteq Q$ , which implies $\mathcal{X}_{u}=\mathcal{X}^{\diamond}$ by (a). ∎

As $u$ increases, $\mathcal{X}_{u}$ moves gradually towards $Q$ and, according to the definition (21), $\mathcal{X}_{u}$ and $Q$ do not intersect when $u<U$ . Once $u=U$ , the intersection of the two sets is $\mathcal{X}^{\diamond}$ , and, by Remark 1 (b), $\mathcal{X}_{u}=\mathcal{X}^{\diamond}$ for all $u>U$ .
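This monotone behavior suggests a simple empirical estimate of $U$ by bisection on $u$. The sketch below is not claimed to be the procedure behind the empirical columns of Tables I and II, which this section does not describe. It assumes a hypothetical, user-supplied predicate `solution_in_Q(u)` that solves (1) for a given $u$ and reports whether the solution lies in $Q$, and it assumes that $\mathcal{X}_{U}\cap Q\neq\emptyset$, so the predicate is false for $u<U$ and true for $u\geq U$.

```python
def empirical_upper_bound(solution_in_Q, u_lo, u_hi, tol=1e-6):
    """Bisection estimate of U, assuming solution_in_Q(u) is False for u < U
    and True for u >= U (solution_in_Q is a hypothetical user-supplied oracle)."""
    if not solution_in_Q(u_hi):
        raise ValueError("u_hi is below U (or U = +inf); increase u_hi")
    if solution_in_Q(u_lo):
        return u_lo
    # Invariant: solution_in_Q(u_lo) is False and solution_in_Q(u_hi) is True.
    while u_hi - u_lo > tol * max(1.0, u_hi):
        mid = 0.5 * (u_lo + u_hi)
        if solution_in_Q(mid):
            u_hi = mid
        else:
            u_lo = mid
    return u_hi
```

With a toy oracle such as `lambda u: u >= 3.7`, the estimate converges to 3.7; in practice each oracle call is a full solve of (1), which is why a closed-form or optimization-based bound is preferable.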

II-A Irrelevant Signal Sparsity Regularization

Remark 2

The following claims are equivalent:

(a)

$\mathcal{X}^{\diamond}\cap\mathcal{X}_{0}\neq\emptyset$; i.e., there exists an ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$ such that

$$\boldsymbol{0}\in\nabla\mathcal{L}(\boldsymbol{x}^{\diamond})+N_{C}(\boldsymbol{x}^{\diamond});\tag{23}$$

(b)

$\mathcal{X}^{\diamond}\subseteq\mathcal{X}_{0}$; and

(c)

$U=0$; i.e., $\mathcal{X}_{0}\cap Q\neq\emptyset$.

Proof:

(c) follows from (a) because $\mathcal{X}^{\diamond}\subseteq Q$ . (b) follows from (c) by applying Remark 1 (a) to obtain $\mathcal{X}_{0}\cap Q=\mathcal{X}^{\diamond}$ , which implies (b). Finally, (b) implies (a). ∎

Having $\nabla\mathcal{L}({\boldsymbol{x}}^{\diamond})=\boldsymbol{0}$ for at least one ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$ implies (23) and is therefore a stronger condition than (23).

Example 1

Consider $\mathcal{L}({\boldsymbol{x}})=\lVert{\boldsymbol{x}}\rVert_{2}^{2}$ and $C=\bigl\{{\boldsymbol{x}}\in\mathbb{R}^{2}\mid\lVert{\boldsymbol{x}}-\boldsymbol{1}_{2\times 1}\rVert_{2}\leq 1\bigr\}$. (Here, $\mathcal{L}({\boldsymbol{x}})$ could correspond to the Gaussian measurement model with measurements equal to zero.) Since $C$ is a disk within $\mathbb{R}_{+}^{2}$, the objective functions for the identity ($\Psi=I_{2}$) and 1D TV sparsifying transforms are

[TABLE]

respectively, where $\mathcal{X}_{u}=\mathcal{X}^{\diamond}=Q=\{{\boldsymbol{x}}^{\diamond}\}$ and ${\boldsymbol{x}}^{\diamond}=\bigl(1-{\sqrt{2}}/{2}\bigr)\boldsymbol{1}$. Here, $\nabla\mathcal{L}({\boldsymbol{x}}^{\diamond})=(2-\sqrt{2})\boldsymbol{1}_{2\times 1}$ and $N_{C}({\boldsymbol{x}}^{\diamond})=\{a\boldsymbol{1}\mid a\leq 0\}$, which confirms that (23) holds.
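The quantities in Example 1 can be checked in a few lines: ${\boldsymbol{x}}^{\diamond}$ is the Euclidean projection of the origin onto the disk $C$, and $N_{C}({\boldsymbol{x}}^{\diamond})$ is the ray spanned by the outward unit normal at ${\boldsymbol{x}}^{\diamond}$. The Python/NumPy sketch below finds the nonnegative multiple `c` of that normal that cancels $\nabla\mathcal{L}({\boldsymbol{x}}^{\diamond})$, which is what (23) requires; the variable names are our own.

```python
import numpy as np

center, radius = np.ones(2), 1.0

# x_star minimizes ||x||_2^2 over the disk C, i.e., it is the projection of the origin onto C.
x_star = center - radius * center / np.linalg.norm(center)
grad = 2.0 * x_star                      # gradient of L(x) = ||x||_2^2 at x_star
normal = (x_star - center) / radius      # outward unit normal of C at x_star
# N_C(x_star) = {c * normal : c >= 0}; pick c so that grad + c * normal = 0.
c = -np.dot(grad, normal)
```

In the parameterization of Example 1, `c * normal` equals $a\boldsymbol{1}$ with $a=-(2-\sqrt{2})\leq 0$.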

II-B Condition for Infinite $U$ and Guarantees for Finite $U$

Remark 3

If there exists ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$ such that

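$$[\nabla\mathcal{L}(\boldsymbol{x}^{\diamond})+N_{C}(\boldsymbol{x}^{\diamond})]\cap\operatorname{Re}(\mathcal{R}(\Psi))=\emptyset,\tag{25}$$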

then $U=+\infty$ . When (19) holds, the reverse is also true with a stronger claim: $U=+\infty$ implies (25) for all ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$ .

Proof:

First, we prove sufficiency by contradiction. If a finite $U$ exists, then $\mathcal{X}^{\diamond}\subseteq\mathcal{X}_{u}$ for all $u\geq U$ . Therefore, (15a) holds with ${\boldsymbol{x}}$ being any ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$ , which contradicts (25).

In the case where (19) holds, we prove the necessity by contradiction. If (25) fails for some ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$, there exist $\boldsymbol{t}\in N_{C}({\boldsymbol{x}}^{\diamond})$ and $\boldsymbol{w}\in\mathbb{C}^{p^{\prime}}$ such that

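$$\boldsymbol{0}=\nabla\mathcal{L}(\boldsymbol{x}^{\diamond})+\operatorname{Re}(\Psi\boldsymbol{w})+\boldsymbol{t}.\tag{26}$$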

Since (19) holds, $\Psi^{H}{\boldsymbol{x}}^{\diamond}=\boldsymbol{0}$ and $G(\Psi^{H}{\boldsymbol{x}}^{\diamond})=H$ ; see (20). When $u\geq\lVert\boldsymbol{w}\rVert_{\infty}$ , $\boldsymbol{w}\in uH$ and $\operatorname{Re}(\Psi\boldsymbol{w})\in u\operatorname{Re}\bigl{(}\Psi G(\Psi^{H}{\boldsymbol{x}}^{\diamond})\bigr{)}$ . Therefore, (15a) holds at ${\boldsymbol{x}}={\boldsymbol{x}}^{\diamond}$ for all $u\geq\lVert\boldsymbol{w}\rVert_{\infty}$ , which contradicts $U=+\infty$ . ∎

Example 2

Consider $\mathcal{L}({\boldsymbol{x}})=x_{1}+\mathbb{I}_{\mathbb{R}_{+}}(x_{1})$, $\Psi=I_{2}$, and $C=\bigl\{{\boldsymbol{x}}\in\mathbb{R}^{2}\mid\lVert{\boldsymbol{x}}-\boldsymbol{1}_{2\times 1}\rVert_{2}\leq 1\bigr\}$. (Here, $\mathcal{L}({\boldsymbol{x}})$ could correspond to the $\operatorname{Poisson}(x_{1})$ measurement model with a measurement equal to zero.) Since $C$ is a disk within $\mathbb{R}_{+}^{2}$, the objective function is

$$f_{u}(\boldsymbol{x})=(1+u)x_{1}+u\,x_{2}+\mathbb{I}_{C}(\boldsymbol{x})\tag{27}$$

with $\mathcal{X}_{u}=\{{\boldsymbol{x}}_{u}\}$ , $\mathcal{X}^{\diamond}=Q=\{{\boldsymbol{x}}^{\diamond}\}$ , and

[TABLE]

which implies $U=+\infty$ , consistent with the observation that $\mathcal{X}_{u}\cap Q=\emptyset$ . Here, (19) is not satisfied: (25) is only a sufficient condition for $U=+\infty$ and does not hold in this example.

Example 3

Consider $\mathcal{L}({\boldsymbol{x}})=\lVert{\boldsymbol{x}}\rVert^{2}_{2}$, the 1D TV sparsifying transform with $\Psi=D^{T}(2)$, and $C=\bigl\{{\boldsymbol{x}}\in\mathbb{R}^{2}\mid\bigl\|{\boldsymbol{x}}-\bigl[2,\;0\bigr]^{T}\bigr\|_{2}^{2}\leq 2\bigr\}$. Since $C$ is a disk within the half-plane $x_{1}-x_{2}\geq 0$, the objective function is

[TABLE]

with $\mathcal{X}_{u}=\bigl\{\bigl[2-(1+{4}/{u})/q(u),\;1/q(u)\bigr]^{T}\bigr\}$, $q(u)\triangleq\sqrt{1+{4}/{u}+{8}/{u^{2}}}$, and $\mathcal{X}^{\diamond}=Q=\{\boldsymbol{1}_{2\times 1}\}$, which implies $U=+\infty$. Since (19) holds in this example, (25) is necessary and sufficient for $U=+\infty$. Since $-\boldsymbol{1}^{T}\nabla\mathcal{L}({\boldsymbol{x}}^{\diamond})=-4$ and $N_{C}({\boldsymbol{x}}^{\diamond})=\{(-a,a)^{T}\mid a\geq 0\}$, (25) holds.
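The expression for $\mathcal{X}_{u}$ in Example 3 can be verified by completing the square. On $C$ we have $x_{1}-x_{2}\geq 0$, so $f_{u}({\boldsymbol{x}})=\lVert{\boldsymbol{x}}-\boldsymbol{c}_{u}\rVert_{2}^{2}-u^{2}/2$ with $\boldsymbol{c}_{u}=[-u/2,\;u/2]^{T}$, and the minimizer is the Euclidean projection of $\boldsymbol{c}_{u}$ onto the disk $C$. The Python/NumPy sketch below (our own helper names) compares that projection with the closed form and shows that $\boldsymbol{x}_{u}$ approaches $\boldsymbol{1}$ only as $u\to\infty$, consistent with $U=+\infty$.

```python
import numpy as np

center, radius = np.array([2.0, 0.0]), np.sqrt(2.0)

def project_disk(v):
    """Euclidean projection onto C = {x : ||x - center||_2 <= radius}."""
    r = v - center
    n = np.linalg.norm(r)
    return v.copy() if n <= radius else center + radius * r / n

def x_u_projection(u):
    """Minimizer of ||x||^2 + u (x1 - x2) over C, as the projection of c_u = (-u/2, u/2)."""
    return project_disk(np.array([-u / 2.0, u / 2.0]))

def x_u_closed_form(u):
    """Closed-form minimizer stated in Example 3."""
    q = np.sqrt(1.0 + 4.0 / u + 8.0 / u**2)
    return np.array([2.0 - (1.0 + 4.0 / u) / q, 1.0 / q])
```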

II-B1 Two cases of finite $U$

If $d=p$ and (19) holds, then $U$ must be finite: in this case, condition (25) in Remark 3 cannot hold, which is easy to confirm by substituting $\operatorname{Re}(\mathcal{R}(\Psi))=\mathbb{R}^{p}$ into (25).

$U$ must also be finite if

[TABLE]

Indeed, (30) implies (19) and that for ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}\cap\operatorname{int}C$ ,

[TABLE]

and hence (25) cannot hold upon substituting (31a) and (31b). Here, (31b) follows from $\boldsymbol{0}\in\nabla\mathcal{L}({\boldsymbol{x}}^{\diamond})+N_{Q}({\boldsymbol{x}}^{\diamond})$ , the condition for optimality of the optimization problem $\min_{{\boldsymbol{x}}\in Q}\mathcal{L}({\boldsymbol{x}})$ that defines $\mathcal{X}^{\diamond}$ , by using the fact that $N_{Q}({\boldsymbol{x}}^{\diamond})=\operatorname{Re}(\mathcal{R}(\Psi))$ when ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}\cap\operatorname{int}C$ .

If (30) holds then, by Remark 2, $U=0$ if and only if $\nabla\mathcal{L}({\boldsymbol{x}}^{\diamond})=\boldsymbol{0}$ .

III Bounds When (19) Holds

We now present an optimization problem for finding $U$ when (19) holds.

Theorem 1

Assume that (19) holds and that the convex NLL $\mathcal{L}({\boldsymbol{x}})$ is differentiable within $\mathcal{X}^{\diamond}$ . Consider the following optimization problem:

[TABLE]

with

[TABLE]

Then, $U_{0}({\boldsymbol{x}}^{\diamond})=U$ for all ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$, where $U$ is defined in (21).

Here, $U=+\infty$ if and only if no ${\boldsymbol{a}}$ satisfies the constraints on ${\boldsymbol{a}}$ in (32), which is equivalent to ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$ satisfying (25) in Remark 3.

Proof:

Observe that $G(\Psi^{H}{\boldsymbol{x}}^{\diamond})=H$ for all ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$ and

[TABLE]

due to (19) and (10a), respectively.

We first prove that $\mathcal{X}^{\diamond}\subseteq\mathcal{X}_{u}$ if $u\geq U_{0}({\boldsymbol{x}}^{\diamond})$ . Consider any ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$ and denote by $(\tilde{{\boldsymbol{a}}},\tilde{\boldsymbol{t}})$ a pair $({\boldsymbol{a}},\boldsymbol{t})$ that solves the minimization problem (P0). Since $u\geq U_{0}({\boldsymbol{x}}^{\diamond})$ , there exists an $\tilde{\boldsymbol{h}}\in H$ such that $\boldsymbol{p}({\boldsymbol{x}}^{\diamond},\tilde{{\boldsymbol{a}}},\tilde{\boldsymbol{t}})+u\tilde{\boldsymbol{h}}=\boldsymbol{0}$ . Using (34), we obtain

[TABLE]

which implies ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}_{u}$ according to (15a).

Second, we prove that if $u<U_{0}({\boldsymbol{x}}^{\diamond})$ for any ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$ , then $\mathcal{X}^{\diamond}\cap\mathcal{X}_{u}=\emptyset$ . We employ proof by contradiction. Suppose $\mathcal{X}^{\diamond}\cap\mathcal{X}_{u}\neq\emptyset$ ; then, there exists an ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}\cap\mathcal{X}_{u}$ . According to (15a), there exist an $\check{\boldsymbol{h}}\in H$ and an $\check{{\boldsymbol{a}}}\in N_{C}({\boldsymbol{x}}^{\diamond})$ such that $\boldsymbol{0}=u\operatorname{Re}(\Psi\check{\boldsymbol{h}})+\nabla\mathcal{L}({\boldsymbol{x}}^{\diamond})+\check{{\boldsymbol{a}}}$ . Using (34), we have

[TABLE]

Note that

[TABLE]

Inserting (37) into (36) and using (10a) and the fact that $F$ has full column rank leads to $\boldsymbol{0}={F^{+}[\nabla\mathcal{L}({\boldsymbol{x}}^{\diamond})+\check{{\boldsymbol{a}}}]+u\operatorname{Re}(Z\check{\boldsymbol{h}})}$; thus

[TABLE]

Now, rearrange and use the fact that $\lVert\check{\boldsymbol{h}}\rVert_{\infty}\leq 1$ (see (20)) to obtain

[TABLE]

which contradicts (32), where $U_{0}({\boldsymbol{x}}^{\diamond})$ is the minimum.

Finally, we prove by contradiction that $U_{0}({\boldsymbol{x}}^{\diamond})$ is invariant within $\mathcal{X}^{\diamond}$ if $\mathcal{X}^{\diamond}$ has more than one element. Assume that there exist ${\boldsymbol{x}}^{\diamond}_{1},{\boldsymbol{x}}^{\diamond}_{2}\in\mathcal{X}^{\diamond}$ and $u$ such that $U_{0}({\boldsymbol{x}}^{\diamond}_{1})\leq u<U_{0}({\boldsymbol{x}}^{\diamond}_{2})$. We obtain contradictory results: ${\boldsymbol{x}}^{\diamond}_{1}\in\mathcal{X}_{u}$ because $u\geq U_{0}({\boldsymbol{x}}^{\diamond}_{1})$, whereas $\mathcal{X}^{\diamond}\cap\mathcal{X}_{u}=\emptyset$ because $u<U_{0}({\boldsymbol{x}}^{\diamond}_{2})$. Therefore, $U=U_{0}({\boldsymbol{x}}^{\diamond})$ is invariant to ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$.

The constraints on ${\boldsymbol{a}}$ in (32) are equivalent to stating that (25) does not hold for any ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$; see also (10b). If no ${\boldsymbol{a}}$ satisfies these constraints, (25) holds and $U=+\infty$ according to Remark 3. ∎

We make a few observations: (P0) is a linear programming problem with linear constraints and can be solved using CVX [13] and Matlab’s optimization toolbox upon identifying $N_{C}({\boldsymbol{x}}^{\diamond})$ and $\mathcal{R}(F)$ in the constraints of (32). Theorem 1 requires differentiability of the NLL only at ${\boldsymbol{x}}={\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}$. If $\Psi$ is real, then $Z$ is real as well and the optimal $\boldsymbol{t}$ in (P0) has zero imaginary component; the corresponding simplified version of Theorem 1 then requires optimization in (P0) only with respect to real-valued $\boldsymbol{t}\in\mathbb{R}^{p^{\prime}}$.

If $\Psi$ is real and $d=p^{\prime}$ , then we can select $Z=I$ , which leads to $Z^{\ddagger}=I$ and cancellation of the variable $\boldsymbol{t}$ in (32c) and simplification of (P0).

We now specialize Theorem 1 to two cases with finite $U$ .

Corollary 1 ( $d=p$ )

If $d=p$ and if (19) holds, then $U$ in (21) can be computed as

[TABLE]

Proof:

Theorem 1 applies, $\mathcal{X}^{\diamond}=\{\boldsymbol{0}\}$ , and $U$ must be finite. Setting $F=I$ in (32) leads to (40). ∎

If $C=\mathbb{R}_{+}^{p}$, then $N_{C}(\boldsymbol{0})=\mathbb{R}_{-}^{p}$ and the condition ${\boldsymbol{a}}\in N_{C}(\boldsymbol{0})$ reduces to ${\boldsymbol{a}}\preceq\boldsymbol{0}$.
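For a concrete instance of this reduction, take the additional (hypothetical) setting $\Psi=I$ and $C=\mathbb{R}_{+}^{p}$, so that ${\boldsymbol{x}}^{\diamond}=\boldsymbol{0}$ and the optimality condition at $\boldsymbol{0}$ reads $\boldsymbol{0}=\boldsymbol{g}+u\boldsymbol{h}+{\boldsymbol{a}}$ with $\boldsymbol{g}=\nabla\mathcal{L}(\boldsymbol{0})$, $\lVert\boldsymbol{h}\rVert_{\infty}\leq 1$, and ${\boldsymbol{a}}\preceq\boldsymbol{0}$. Minimizing $\lVert\boldsymbol{g}+{\boldsymbol{a}}\rVert_{\infty}$ over ${\boldsymbol{a}}\preceq\boldsymbol{0}$ decouples across entries and gives $U=\max\bigl(0,\max_{i}(-g_{i})\bigr)$. The Python/NumPy sketch below derives this directly from the optimality condition rather than from (40), which is not reproduced in this copy, and compares it with the unconstrained value $\lVert\boldsymbol{g}\rVert_{\infty}$.

```python
import numpy as np

def upper_bound_nonneg_identity(g):
    """U for Psi = I and C = R_+^p (assumed setting): min over a <= 0 of ||g + a||_inf.
    The best a_i cancels g_i when g_i > 0 and is 0 otherwise."""
    a = -np.maximum(g, 0.0)
    return np.max(np.abs(g + a)), a

def upper_bound_unconstrained_identity(g):
    """Same setting with C = R^p, where N_C(0) = {0}."""
    return np.max(np.abs(g))

g = np.array([3.0, -2.0, 1.5])   # hypothetical gradient of the NLL at x = 0
U_plus, a = upper_bound_nonneg_identity(g)
```

Positive gradient entries are absorbed by the normal cone, so in this setting the nonnegativity constraint can only lower the bound.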

Corollary 2 ( $\mathcal{X}^{\diamond}\cap\operatorname{int}C\neq\emptyset$ )

If (30) holds, then $U$ in (21) can be computed as

[TABLE]

with any ${\boldsymbol{x}}^{\diamond}\in\mathcal{X}^{\diamond}\cap\operatorname{int}C$ .

Proof:

Thanks to (30), (19) and (31a)–(31b) are satisfied, Theorem 1 applies, $U$ must be finite, and ${\boldsymbol{a}}=\boldsymbol{0}$ (by (31a)). By using these facts, we simplify (32) to obtain (41). ∎

If $d=p$ and $\boldsymbol{0}\in\operatorname{int}C$ , then both Corollaries 1 and 2 apply and the upper bound $U$ can be obtained by setting ${\boldsymbol{a}}=\boldsymbol{0}$ and $N_{C}(\boldsymbol{0})=\{\boldsymbol{0}\}$ in (40) or by setting ${\boldsymbol{x}}^{\diamond}=\boldsymbol{0}$ and $F=I$ in (41).

Example 4

Consider a real invertible $\Psi\in\mathbb{R}^{p\times p}$.

(a)

If $C=\mathbb{R}_{+}^{p}$, Corollary 1 applies and (40) becomes

[TABLE]

(b)

If $\boldsymbol{0}\in\operatorname{int}C$, Corollaries 1 and 2 apply and the bound $U$ simplifies to

[TABLE]

For $\Psi=I$ and a linear measurement model with white Gaussian noise, (42b) reduces to the expressions in [3, eq. (4)] and [5, Sec. III], used in [5] to design its continuation scheme; [3] and [5] also assume $C=\mathbb{R}^{p}$.
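A quick numerical check of this special case, assuming $\mathcal{L}({\boldsymbol{x}})=\tfrac12\lVert\boldsymbol{y}-\Phi{\boldsymbol{x}}\rVert_{2}^{2}$, $\Psi=I$, and $C=\mathbb{R}^{p}$, so that $\nabla\mathcal{L}(\boldsymbol{0})=-\Phi^{T}\boldsymbol{y}$ and the bound is $U=\lVert\Phi^{T}\boldsymbol{y}\rVert_{\infty}$: $\boldsymbol{x}=\boldsymbol{0}$ is a fixed point of a proximal-gradient step exactly when it is a minimizer, so one step from $\boldsymbol{0}$ should stay at $\boldsymbol{0}$ for $u=U$ and move away for any $u<U$. The random `Phi` and `y` below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
Phi = rng.standard_normal((30, 12))
y = rng.standard_normal(30)
g0 = Phi.T @ y                     # -grad L(0) for L(x) = 0.5 ||y - Phi x||^2
U = np.max(np.abs(g0))             # bound for Psi = I, C = R^p
step = 1.0 / np.linalg.norm(Phi, 2) ** 2

def prox_grad_step_from_zero(u):
    """One proximal-gradient step from x = 0 for 0.5||y - Phi x||^2 + u ||x||_1."""
    v = step * g0                  # 0 - step * grad L(0)
    return np.sign(v) * np.maximum(np.abs(v) - step * u, 0.0)
```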

Example 5 (One-dimensional TV regularization)

Consider 1D TV regularization with $\Psi=D^{T}(p)\in\mathbb{R}^{p\times p}$ obtained by setting $K=1$, $J=p$ in (16a); note that $d=p-1$. Consider a constant signal ${\boldsymbol{x}}^{\diamond}=\boldsymbol{1}x_{0}^{\diamond}\in\mathcal{X}^{\diamond}$. Then Theorem 1 applies and yields

[TABLE]
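For a quadratic NLL, the 1D TV bound can be checked with a short Python sketch. Assume $D$ is the $(p-1)\times p$ forward-difference matrix, so that $\Psi=D^{T}$ and $\|\Psi^{T}\boldsymbol{x}\|_{1}=\|D\boldsymbol{x}\|_{1}$, and that the normal-cone term vanishes (e.g., $C=\mathbb{R}^{p}$). The optimality condition at $\boldsymbol{1}x_{0}^{\diamond}$ then requires $D^{T}\boldsymbol{t}=-\nabla\mathcal{L}(\boldsymbol{1}x_{0}^{\diamond})$ with $\|\boldsymbol{t}\|_{\infty}\leq u$. Because $x_{0}^{\diamond}$ minimizes $\mathcal{L}$ along constant vectors, the gradient entries sum to zero, the system has a unique solution given by partial sums of the gradient, and $U$ is the largest absolute partial sum. This is our own derivation and is not a transcription of (43b); the helper name is hypothetical.

```python
import numpy as np


def tv1d_upper_bound(grad):
    """U for 1D TV at x = 1 * x0, where x0 minimizes L along constant vectors.

    grad is the gradient of L at 1 * x0; its entries must sum to zero.
    Returns (U, t), where t solves D^T t = -grad for forward differences D.
    """
    t = np.cumsum(grad)[:-1]          # partial sums g_1 + ... + g_k, k < p
    return np.max(np.abs(t)), t


# Illustrative quadratic NLL L(x) = 0.5 * ||y - Phi x||_2^2 with random data.
rng = np.random.default_rng(0)
N, p = 40, 60
Phi = rng.standard_normal((N, p))
y = rng.standard_normal(N)
h = Phi @ np.ones(p)                  # Phi applied to the all-ones vector
x0 = (h @ y) / (h @ h)                # argmin over scalar x of L(1 * x)
grad = Phi.T @ (h * x0 - y)           # gradient of L at 1 * x0
U, t = tv1d_upper_bound(grad)
```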

The bounds obtained by solving (P0) are often simple but restricted to the scenario where (19) holds. In the following section, we remove assumption (19) and develop a general numerical method for finding $U$ in (21).

IV ADMM Algorithm for Computing $U$

We focus on the nontrivial scenario where (23) does not hold and assume $u>0$. We also assume that an $\boldsymbol{x}^{\diamond}\in\mathcal{X}^{\diamond}$ is available, which suffices to obtain $U$ in (21). We use the duality of norms [14, App. A.1.6]:

[TABLE]

to rewrite the minimization of (1) as the following min-max problem (see also (20)):

[TABLE]
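The $\ell_{1}$/$\ell_{\infty}$ duality behind this min-max form states that $\|\boldsymbol{s}\|_{1}=\max_{\|\boldsymbol{w}\|_{\infty}\leq 1}\operatorname{Re}(\boldsymbol{w}^{H}\boldsymbol{s})$, with the maximum attained at $w_{i}=s_{i}/|s_{i}|$ (any $|w_{i}|\leq 1$ where $s_{i}=0$). The Python sketch below checks this numerically for a complex vector; the random data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.standard_normal(8) + 1j * rng.standard_normal(8)

l1 = np.sum(np.abs(s))                  # ||s||_1 with complex moduli
w_star = s / np.abs(s)                  # maximizer: unit-modulus "sign" of s
best = np.vdot(w_star, s).real          # Re(w^H s); np.vdot conjugates w_star

# Random feasible w (|w_i| <= 1) never exceed ||s||_1.
radius = rng.uniform(0.0, 1.0, size=(1000, 8))
phase = rng.uniform(0.0, 2 * np.pi, size=(1000, 8))
W = radius * np.exp(1j * phase)
values = (W.conj() @ s).real            # Re(w^H s) for each row w of W
```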

Since the objective function in (45) is convex with respect to $\boldsymbol{x}$ and concave with respect to $\boldsymbol{w}$, the optimal $(\boldsymbol{x},\boldsymbol{w})=(\boldsymbol{x}_{u},\boldsymbol{w}_{u})$ is a saddle point of (45) and satisfies

[TABLE]

Now, select $U$ as the smallest $u$ for which (46a)–(46b) hold with ${\boldsymbol{x}}_{u}={\boldsymbol{x}}^{\diamond}$ :

[TABLE]

where $(v^{\diamond},\boldsymbol{w}^{\diamond},\boldsymbol{t}^{\diamond})$ is the solution to the following constrained linear programming problem:

[TABLE]

obtained from (46a)–(46b) with ${\boldsymbol{x}}_{u}$ and $\boldsymbol{w}_{u}$ replaced by ${\boldsymbol{x}}^{\diamond}$ and $\boldsymbol{w}$ . Here,

[TABLE]

is the normalized gradient (for numerical stability) of the NLL at ${\boldsymbol{x}}^{\diamond}$ ; $\nabla\mathcal{L}({\boldsymbol{x}}^{\diamond})\neq\boldsymbol{0}$ because (23) does not hold. Due to (15b), $v=0$ is a feasible point that satisfies the constraints (49a), which implies that $v^{\diamond}\geq 0$ . When (25) holds, $v$ has to be zero, implying $U=+\infty$ .

To solve (P1) and find $v^{\diamond}$, we apply an iterative algorithm based on the alternating direction method of multipliers (ADMM) [15, 16]:

[TABLE]

where $\rho>0$ is a tuning parameter for the ADMM iteration. We solve (51a) using the Broyden-Fletcher-Goldfarb-Shanno optimization algorithm with box constraints [17] when $\Psi$ is real and the projected Nesterov’s proximal-gradient (PNPG) algorithm [18] when $\Psi$ is complex. We initialize the iteration (51) with $v^{(0)}=1$, $\boldsymbol{t}^{(0)}=\boldsymbol{0}$, $\boldsymbol{z}^{(0)}=\boldsymbol{0}$, and $\rho=1$; thereafter, $\rho$ is adjusted adaptively using the scheme in [15, Sec. 3.4.1].
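The adaptive choice of $\rho$ can be illustrated with the residual-balancing rule described in Boyd et al.'s ADMM monograph; we assume, but cannot confirm from this section, that this is the scheme meant by [15, Sec. 3.4.1]. The thresholds $\mu=10$ and $\tau=2$ are the typical values suggested there, and the function and argument names are hypothetical. The sketch only shows the update of $\rho$, not the iteration (51).

```python
def update_rho(rho, scaled_dual, primal_res, dual_res, mu=10.0, tau=2.0):
    """Residual-balancing update of the ADMM penalty rho.

    primal_res and dual_res are the norms of the primal and dual residuals.
    If the iteration stores a scaled dual variable (dual / rho), it must be
    rescaled whenever rho changes so that the unscaled dual stays fixed.
    """
    if primal_res > mu * dual_res:      # primal residual too large: increase rho
        return rho * tau, scaled_dual / tau
    if dual_res > mu * primal_res:      # dual residual too large: decrease rho
        return rho / tau, scaled_dual * tau
    return rho, scaled_dual             # residuals balanced: keep rho
```

Increasing $\rho$ penalizes constraint violation more strongly, which shrinks the primal residual at the cost of larger dual residuals, and vice versa.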

In special cases, (51) simplifies. If (19) holds, then $\Psi^{H}\boldsymbol{x}^{\diamond}=\boldsymbol{0}$ and the constraint in (51a) simplifies to $\lVert\boldsymbol{w}\rVert_{\infty}\leq 1$; see (20). If $\operatorname{Re}(\Psi\Psi^{H})=cI$, $c>0$, and $\Psi\in\mathbb{R}^{p\times p}$ or $\Psi\in\mathbb{C}^{p\times p/2}$, then (51a) has the following analytical solution:

[TABLE]

When (30) holds, (51c) reduces to $\boldsymbol{t}^{(i)}=\boldsymbol{0}$ for all $i$ , thanks to (31a).

When $\Psi$ is real, the constraints imposed by $\mathbb{I}_{G(\Psi^{H}{\boldsymbol{x}}^{\diamond})}(\boldsymbol{w})$ become linear and (P1) becomes a linear programming problem with linear constraints.
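When $\Psi$ is real, (19) holds, and the normal-cone term vanishes (e.g., $C=\mathbb{R}^{p}$), the smallest $u$ at which $\boldsymbol{x}^{\diamond}$ satisfies the optimality condition can also be written as $\min_{\boldsymbol{t}}\|\boldsymbol{t}\|_{\infty}$ subject to $\Psi\boldsymbol{t}=-\nabla\mathcal{L}(\boldsymbol{x}^{\diamond})$, where $\boldsymbol{t}$ plays the role of $u\boldsymbol{w}$. The Python sketch below solves this reformulation with SciPy's `linprog` (HiGHS) as an independent cross-check; it is not the (P1)/ADMM procedure above, and the function name and test setup are our own assumptions. On 1D TV it should agree with the partial-sum formula derived from the optimality condition.

```python
import numpy as np
from scipy.optimize import linprog


def upper_bound_lp(Psi, grad):
    """min ||t||_inf subject to Psi @ t = -grad, solved as an LP in (t, tau).

    Assumes real Psi, no normal-cone term, and Psi^T x_diamond = 0;
    grad is the gradient of L at x_diamond.
    """
    p, d = Psi.shape
    c = np.zeros(d + 1)
    c[-1] = 1.0                                   # objective: minimize tau
    eye, ones = np.eye(d), np.ones((d, 1))
    A_ub = np.vstack([np.hstack([eye, -ones]),    #  t_i - tau <= 0
                      np.hstack([-eye, -ones])])  # -t_i - tau <= 0
    b_ub = np.zeros(2 * d)
    A_eq = np.hstack([Psi, np.zeros((p, 1))])     # Psi t = -grad
    bounds = [(None, None)] * d + [(0, None)]     # t free, tau >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=-grad,
                  bounds=bounds, method="highs")
    if res.status == 2:                           # infeasible: no finite u works
        return np.inf
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


# Cross-check on 1D TV (Psi = D^T, D = forward differences), quadratic NLL.
rng = np.random.default_rng(0)
N, p = 25, 30
Phi = rng.standard_normal((N, p))
y = rng.standard_normal(N)
h = Phi @ np.ones(p)
x0 = (h @ y) / (h @ h)                            # x_diamond = 1 * x0
grad = Phi.T @ (h * x0 - y)
D = np.diff(np.eye(p), axis=0)
U_lp = upper_bound_lp(D.T, grad)
U_cumsum = np.max(np.abs(np.cumsum(grad)[:-1]))
```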

V Numerical Examples

Matlab implementations of the presented examples are available at https://github.com/isucsp/imgRecSrc/uBoundEx. In all numerical examples, the empirical upper bounds $U$ were obtained by a grid search over $u$ with $\mathcal{X}_{u}=\{{\boldsymbol{x}}_{u}\}$ obtained using the PNPG method [18].
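The empirical procedure can be sketched in Python for the simplest case $\Psi=I$ and $\mathcal{L}(\boldsymbol{x})=0.5\|\boldsymbol{y}-\Phi\boldsymbol{x}\|_{2}^{2}$, where $\mathcal{X}^{\diamond}=\{\boldsymbol{0}\}$. Two substitutions are ours: plain ISTA (proximal gradient without acceleration) replaces PNPG, and bisection over $u$ replaces the grid search. Started at $\boldsymbol{0}$, the proximal-gradient iteration stays exactly at $\boldsymbol{0}$ whenever $\boldsymbol{0}$ is optimal, so an exact zero test separates the two regimes. The function names and data are illustrative.

```python
import numpy as np


def ista(Phi, y, u, nonneg=False, iters=300):
    """Proximal gradient for 0.5*||y - Phi x||^2 + u*||x||_1 (optionally x >= 0)."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2      # 1 / Lipschitz constant of grad L
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        z = x - step * (Phi.T @ (Phi @ x - y))    # gradient step on the NLL
        if nonneg:                                # prox of u*||x||_1 + indicator(x >= 0)
            x = np.maximum(z - step * u, 0.0)
        else:                                     # soft thresholding
            x = np.sign(z) * np.maximum(np.abs(z) - step * u, 0.0)
    return x


def empirical_bound(Phi, y, nonneg=False, tol=1e-6):
    """Smallest u (up to tol) for which the computed solution is exactly 0."""
    def is_zero(u):
        return not np.any(ista(Phi, y, u, nonneg))

    lo, hi = 0.0, 1.0
    while not is_zero(hi):                        # grow hi until 0 is optimal
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * max(hi, 1.0):           # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if is_zero(mid):
            hi = mid
        else:
            lo = mid
    return hi


rng = np.random.default_rng(3)
Phi = rng.standard_normal((30, 50))
y = rng.standard_normal(30)
U_real = empirical_bound(Phi, y)
U_nonneg = empirical_bound(Phi, y, nonneg=True)
```

For this model the empirical values should match $\|\Phi^{T}\boldsymbol{y}\|_{\infty}$ and $\max(0,\max_{i}[\Phi^{T}\boldsymbol{y}]_{i})$ up to the bisection tolerance.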

V-A Signal reconstruction for Gaussian linear model

We adopt the linear measurement model with white Gaussian noise and scaled NLL $\mathcal{L}(\boldsymbol{x})=0.5\|\boldsymbol{y}-\Phi\boldsymbol{x}\|_{2}^{2}$, where the elements of the sensing matrix $\Phi\in\mathbb{R}^{N\times p}$ are independent, identically distributed (i.i.d.) and drawn from the uniform distribution on a unit sphere. We reconstruct the nonnegative “skyline” signal $\boldsymbol{x}_{\text{true}}\in\mathbb{R}^{1024\times 1}$ in [18] from noisy linear measurements $\boldsymbol{y}$ using the discrete wavelet transform (DWT) and 1D TV regularizations, where the DWT matrix $\Psi$ is orthogonal ($\Psi\Psi^{T}=\Psi^{T}\Psi=I$) and constructed using the Daubechies-4 wavelet with three decomposition levels. Define the signal-to-noise ratio (SNR) as

[TABLE]

where $\sigma^{2}$ is the variance of the Gaussian noise added to $\Phi{\boldsymbol{x}}_{\text{true}}$ to create the noisy measurement vector $\boldsymbol{y}$ .

For $C=\mathbb{R}_{+}^{p}$ and $C=\mathbb{R}^{p}$ with DWT regularization, $\mathcal{X}^{\diamond}=\{\boldsymbol{0}\}$, so Example 4 applies and yields the upper bounds (42a) and (42b), respectively.

For TV regularization, we apply the result in Example 5. For $C=\mathbb{R}^{p}$ and $C=\mathbb{R}_{+}^{p}$, we have $\mathcal{X}^{\diamond}=\{\boldsymbol{1}x_{0}\}$ and $\mathcal{X}^{\diamond}=\{\boldsymbol{1}\max(x_{0},0)\}$, respectively, where

[TABLE]

If $\boldsymbol{1}x_{0}\in\operatorname{int}C$, which holds when $C=\mathbb{R}^{p}$, or when $C=\mathbb{R}_{+}^{p}$ and $x_{0}>0$, then the bound $U$ is given by (43b). If $C=\mathbb{R}_{+}^{p}$ and $x_{0}\leq 0$, then $\mathcal{X}^{\diamond}=\{\boldsymbol{0}\}$ and (43a) applies. In this case, $U=0$ if $[\nabla\mathcal{L}(\boldsymbol{0})]_{i}\geq 0$ for $i=1,\dotsc,p-1$, which occurs only when $[\nabla\mathcal{L}(\boldsymbol{0})]_{i}=0$ for all $i$.

Table I shows the theoretical and empirical bounds for DWT and TV regularizations and $C=\mathbb{R}_{+}^{p}$ and $C=\mathbb{R}^{p}$; we decrease the SNR from $30$ dB to $-30$ dB, using independent noise realizations for different SNRs. The theoretical bounds computed using the methods in Sections III and IV coincide. For DWT regularization, $\mathcal{X}^{\diamond}$ is the same for both convex sets $C$; thus the upper bound $U$ for $C=\mathbb{R}_{+}^{p}$ never exceeds its counterpart for $C=\mathbb{R}^{p}$, because (42a) is additionally minimized over $\boldsymbol{a}$ (and $\boldsymbol{a}=\boldsymbol{0}$ is admissible). For TV regularization with $x_{0}>0$, the upper bounds $U$ coincide for both $C$ because, in this case, $\mathcal{X}^{\diamond}$ is the same for both $C$ and $\mathcal{X}^{\diamond}\subset\operatorname{int}C$. The last row of Table I shows the case $x_{0}\leq 0$; then, $\mathcal{X}^{\diamond}$ differs for the two convex sets $C$, and the upper bound $U$ for $C=\mathbb{R}_{+}^{p}$ is smaller than its counterpart for $C=\mathbb{R}^{p}$, thanks to the optimization over $\boldsymbol{a}$ in (43a): compare (43a) with (43b).

V-B PET image reconstruction from Poisson measurements

Consider positron emission tomography (PET) reconstruction of the $128\times 128$ concentration map $\boldsymbol{x}_{\text{true}}$ shown in [18], which represents simulated radiotracer activity in a human chest, from independent noisy Poisson-distributed measurements $\boldsymbol{y}=(y_{n})$ with means $[\Phi\boldsymbol{x}_{\text{true}}+\boldsymbol{b}]_{n}$. The choices of parameters in the PET system setup and concentration map $\boldsymbol{x}_{\text{true}}$ have been taken from the Image Reconstruction Toolbox (IRT) [19, emission/em_test_setup.m]. Here,

[TABLE]

is the known sensing matrix; $\boldsymbol{\kappa}$ is the density map needed to model the attenuation of the gamma rays [20]; $\boldsymbol{b}=(b_{i})$ is the known intercept term accounting for background radiation, scattering effect, and accidental coincidence (its elements have been set to a constant equal to 10% of the sample mean of $\Phi\boldsymbol{x}_{\text{true}}$: $\boldsymbol{b}=[\boldsymbol{1}^{T}\Phi\boldsymbol{x}_{\text{true}}/(10N)]\boldsymbol{1}$); $\boldsymbol{c}$ is a known vector that models the detector efficiency variation; and $w>0$ is a known scaling constant, which we use to control the expected total number of detected photons due to electron-positron annihilation, $\boldsymbol{1}^{T}\operatorname{E}(\boldsymbol{y}-\boldsymbol{b})=\boldsymbol{1}^{T}\Phi\boldsymbol{x}_{\text{true}}$, an SNR measure. We collect the photons from $90$ equally spaced directions over $180^{\circ}$, with $128$ radial samples at each direction. Here, we adopt the parallel strip-integral matrix $S$ [21, Ch. 25.2] and use its implementation in the IRT [19].

We now consider the nonnegative convex set $C=\mathbb{R}_{+}^{p}$, which ensures that (3) holds, and 2D isotropic and anisotropic TV and DWT regularizations, where the 2D DWT matrix $\Psi$ is constructed using the Daubechies-6 wavelet with six decomposition levels.

For TV regularizations, $\mathcal{X}^{\diamond}=\{\boldsymbol{1}\max(0,x_{0})\}$, where $x_{0}=\arg\min_{x\in\mathbb{R}}\mathcal{L}(\boldsymbol{1}x)$ is computed using the bisection method that finds the zero of $\partial\mathcal{L}(\boldsymbol{1}x)/\partial x$, which is an increasing function of $x\in\mathbb{R}_{+}$. No search for $x_{0}$ is needed when $\left.\partial\mathcal{L}(\boldsymbol{1}x)/\partial x\right|_{x=0}>0$, because in this case $x_{0}<0$.
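The bisection step can be sketched in Python for the Poisson NLL $\mathcal{L}(\boldsymbol{x})=\sum_{n}\bigl([\Phi\boldsymbol{x}+\boldsymbol{b}]_{n}-y_{n}\log[\Phi\boldsymbol{x}+\boldsymbol{b}]_{n}\bigr)$ (constants dropped). With $\boldsymbol{h}=\Phi\boldsymbol{1}$, the derivative along constant vectors is $\sum_{n}h_{n}\bigl(1-y_{n}/(h_{n}x+b_{n})\bigr)$. The sketch assumes a nonnegative $\Phi$ with $\Phi\boldsymbol{1}\neq\boldsymbol{0}$ and $\boldsymbol{b}\succ\boldsymbol{0}$; the toy $\Phi$ is random and is not the PET system matrix above, and the function name is hypothetical. It returns $\max(0,x_{0})$ directly, skipping the search when the derivative at $0$ is nonnegative.

```python
import numpy as np


def poisson_constant_fit(Phi, y, b, tol=1e-12):
    """Return max(0, x0), x0 = argmin_x L(1 x), for the Poisson NLL
    L(x) = sum_n ([Phi x + b]_n - y_n * log([Phi x + b]_n)).

    Assumes Phi >= 0 elementwise with Phi @ 1 != 0 and b > 0.
    """
    h = Phi.sum(axis=1)                       # Phi @ 1

    def dL(x):                                # dL(1 x)/dx, increasing for x >= 0
        return np.sum(h * (1.0 - y / (h * x + b)))

    if dL(0.0) >= 0.0:                        # x0 <= 0: no search needed
        return 0.0
    lo, hi = 0.0, 1.0
    while dL(hi) < 0.0:                       # bracket the root
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * max(hi, 1.0):       # bisection
        mid = 0.5 * (lo + hi)
        if dL(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy data; b is 10% of the sample mean of Phi x_true, as in the PET setup.
rng = np.random.default_rng(4)
N, p = 40, 50
Phi = rng.uniform(0.0, 1.0, size=(N, p))
x_true = rng.uniform(0.0, 1.0, size=p)
b = np.full(N, (Phi @ x_true).mean() / 10.0)
y = rng.poisson(Phi @ x_true + b)
x0 = poisson_constant_fit(Phi, y, b)
```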

We computed the theoretical bounds using the ADMM-type algorithm in Section IV.

Table II shows the theoretical and empirical bounds for DWT and TV regularizations and the SNR $\boldsymbol{1}^{T}\Phi{\boldsymbol{x}}_{\text{true}}$ varying from ${10}^{1}$ to ${10}^{9}$ , with independent measurement realizations for different SNRs.

Denote the isotropic and anisotropic 2D TV bounds by $U_{\text{iso}}$ and $U_{\text{ani}}$ , respectively. Then, it is easy to show that when (19) holds, $U_{\text{ani}}\leq U_{\text{iso}}\leq\sqrt{2}U_{\text{ani}}$ , which follows by using the inequalities $\sqrt{2}\sqrt{a^{2}+b^{2}}\geq\lvert a\rvert+\lvert b\rvert\geq\sqrt{a^{2}+b^{2}}$ and is confirmed in Table II.
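These inequalities can be checked numerically. In our reading of the argument, each bound is the minimum of the dual regularization norm over the same feasible set of $\boldsymbol{t}$, so the pointwise comparison $\max(|a|,|b|)\leq\sqrt{a^{2}+b^{2}}\leq\sqrt{2}\max(|a|,|b|)$ between the dual norms carries over to the minima. The Python sketch below checks both the primal TV inequalities and these dual-norm inequalities on random data; the helper names and the zero boundary differences are assumptions.

```python
import numpy as np


def diff_pairs(img):
    """Horizontal and vertical forward differences per pixel, shape (M, 2).

    The last column/row difference is set to zero (a boundary assumption).
    """
    dx = np.diff(img, axis=1, append=img[:, -1:])
    dy = np.diff(img, axis=0, append=img[-1:, :])
    return np.stack([dx.ravel(), dy.ravel()], axis=1)


def tv_iso(v):      # sum_i sqrt(v_i1^2 + v_i2^2)
    return np.sum(np.hypot(v[:, 0], v[:, 1]))


def tv_ani(v):      # sum_i |v_i1| + |v_i2|
    return np.sum(np.abs(v))


def dual_iso(t):    # dual norm of tv_iso: max_i sqrt(t_i1^2 + t_i2^2)
    return np.max(np.hypot(t[:, 0], t[:, 1]))


def dual_ani(t):    # dual norm of tv_ani: max_i max(|t_i1|, |t_i2|)
    return np.max(np.abs(t))


rng = np.random.default_rng(5)
v = diff_pairs(rng.standard_normal((16, 16)))
t = rng.standard_normal((256, 2))
```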

VI Concluding Remarks

Future work will include obtaining simple expressions for upper bounds $U$ for isotropic 2D TV regularization, based on Theorem 1.

References


[1] Andrey N. Tikhonov and Vasiliy Y. Arsenin, “Solutions of Ill-Posed Problems,” Washington, DC: Winston, 1977.
[2] Curtis R. Vogel, “Computational Methods for Inverse Problems,” Philadelphia, PA: SIAM, 2002.
[3] S. J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, “An Interior-Point Method for Large-Scale $\ell_{1}$-Regularized Least Squares,” IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, pp. 606–617, 2007.
[4] E. Hale, W. Yin, and Y. Zhang, “Fixed-Point Continuation for $\ell_{1}$-Minimization: Methodology and Convergence,” SIAM J. Optim., vol. 19, no. 3, pp. 1107–1130, 2008.
[5] Stephen J. Wright, Robert D. Nowak, and Mário A. T. Figueiredo, “Sparse Reconstruction by Separable Approximation,” IEEE Trans. Signal Process., vol. 57, no. 7, pp. 2479–2493, 2009.
[6] E. Allgower and K. Georg, “Introduction to Numerical Continuation Methods,” Philadelphia, PA: SIAM, 2003.
[7] Dimitri P. Bertsekas, “Convex Optimization Theory,” Belmont, MA: Athena Scientific, 2009.
[8] P. Bouboulis, K. Slavakis, and S. Theodoridis, “Adaptive Learning in Complex Reproducing Kernel Hilbert Spaces Employing Wirtinger’s Subgradients,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 3, pp. 425–438, 2012.
