Bilevel Optimization, Deep Learning and Fractional Laplacian   Regularization with Applications in Tomography

Harbir Antil; Zichao Di; Ratna Khatri

arXiv:1907.09605·eess.IV·June 24, 2020

Bilevel Optimization, Deep Learning and Fractional Laplacian Regularization with Applications in Tomography

Harbir Antil, Zichao Di, Ratna Khatri

PDF

TL;DR

This paper introduces a bilevel optimization neural network that learns optimal fractional Laplacian regularization parameters for inverse problems, demonstrating improved tomographic reconstruction especially with limited data.

Contribution

It proposes a novel bilevel neural network framework that learns regularization strength and fractional Laplacian exponent, outperforming total variation in inverse problems.

Findings

01

Fractional Laplacian regularization improves reconstruction quality.

02

The neural network effectively learns optimal regularization parameters.

03

Performance surpasses total variation regularization, especially with limited data.

Abstract

In this work we consider a generalized bilevel optimization framework for solving inverse problems. We introduce fractional Laplacian as a regularizer to improve the reconstruction quality, and compare it with the total variation regularization. We emphasize that the key advantage of using fractional Laplacian as a regularizer is that it leads to a linear operator, as opposed to the total variation regularization which results in a nonlinear degenerate operator. Inspired by residual neural networks, to learn the optimal strength of regularization and the exponent of fractional Laplacian, we develop a dedicated bilevel optimization neural network with a variable depth for a general regularized inverse problem. We also draw some parallels between an activation function in a neural network and regularization. We illustrate how to incorporate various regularizer choices into our proposed…

Tables1

Table 1. Table 1: Comparison of average MSE, SSIM and PSNR for tomographic reconstructions obtained via BONNet using the fractional Laplacian regularization for two projection angles. In Experiment I, we fix s = 0.4 𝑠 0.4 s=0.4 and learn λ ∗ superscript 𝜆 \lambda^{*} via BONNet, and in Experiment II we learn the ( λ ∗ , s ∗ ) superscript 𝜆 superscript 𝑠 (\lambda^{*},s^{*}) pair. The results shown are for the testing dataset. Notice that the search for μ ∗ = ( λ ∗ , s ∗ ) superscript 𝜇 superscript 𝜆 superscript 𝑠 \mu^{*}=(\lambda^{*},s^{*}) in Experiment II is now fully automated and the results are better or comparable to the previous case.

Data	Testing
$N_{θ}$	$10$		$20$
Type	Experiment I	Experiment II	Experiment I	Experiment II
$(λ, s)$	$(9.00678 e - 6, 0.4)$	$(5.04417 e - 6, 0.5413)$	$(1.65330 e - 5, 0.4)$	$(8.53717 e - 6, 0.3799)$
MSE	$9.8099$	$9.7743$	$8.9872$	$8.6961$
SSIM	$0.7675$	$0.7738$	$0.7888$	$0.7950$
PSNR	$34.3513$	$34.3831$	$35.1123$	$35.3973$

Equations82

u min J (u) \mathchar 58 = \frac{1}{2} ∥ K u - f ∥_{L^{2} (Ω)}^{2},

u min J (u) \mathchar 58 = \frac{1}{2} ∥ K u - f ∥_{L^{2} (Ω)}^{2},

u \in X_{a d} \subseteq X min J (u) \mathchar 58 = \frac{1}{2} ∥ K u - f ∥_{L^{2} (Ω)}^{2} + R (u, μ),

u \in X_{a d} \subseteq X min J (u) \mathchar 58 = \frac{1}{2} ∥ K u - f ∥_{L^{2} (Ω)}^{2} + R (u, μ),

R (u, μ) = λ \mbox T V (u),

R (u, μ) = λ \mbox T V (u),

⟨ - λ \mbox d i v (\frac{\nabla u}{∣\nabla u ∣}) + K^{*} (K u - f), \overset{u}{^} - u ⟩_{X^{'}, X} \geq 0, \forall \overset{u}{^} \in X_{a d}

⟨ - λ \mbox d i v (\frac{\nabla u}{∣\nabla u ∣}) + K^{*} (K u - f), \overset{u}{^} - u ⟩_{X^{'}, X} \geq 0, \forall \overset{u}{^} \in X_{a d}

R (u, μ) = \frac{1}{2} ∥ λ (- Δ)^{\frac{s}{2}} u ∥_{L^{2} (Ω)}^{2},

R (u, μ) = \frac{1}{2} ∥ λ (- Δ)^{\frac{s}{2}} u ∥_{L^{2} (Ω)}^{2},

⟨ λ (- Δ)^{s} u + K^{*} (K u - f), \overset{u}{^} - u ⟩ \geq 0, \forall \overset{u}{^} \in X_{a d}

⟨ λ (- Δ)^{s} u + K^{*} (K u - f), \overset{u}{^} - u ⟩ \geq 0, \forall \overset{u}{^} \in X_{a d}

K u (τ, θ) \mathchar 58 = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} u (x, y) δ (τ - x cos θ - y sin θ) d x d y,

K u (τ, θ) \mathchar 58 = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} u (x, y) δ (τ - x cos θ - y sin θ) d x d y,

μ \in M_{a d} min ϕ (μ)

μ \in M_{a d} min ϕ (μ)

\mbox s . t . u \in X_{a d} min J (u, μ)

ϕ (μ) \mathchar 58 = \frac{1}{2 m} i = 1 \sum m ∥ u^{(i)} (μ) - u_{t r u e}^{(i)} ∥_{L^{2} (Ω)}^{2},

ϕ (μ) \mathchar 58 = \frac{1}{2 m} i = 1 \sum m ∥ u^{(i)} (μ) - u_{t r u e}^{(i)} ∥_{L^{2} (Ω)}^{2},

R (σ (T)) = \frac{1}{2} ∥ σ (T (μ, u (μ))) ∥_{L^{2} (Ω)}^{2} .

R (σ (T)) = \frac{1}{2} ∥ σ (T (μ, u (μ))) ∥_{L^{2} (Ω)}^{2} .

u \in X_{a d} min J (u, μ) \mathchar 58 = \frac{1}{2 m} i = 1 \sum m [∥ K u^{(i)} - f^{(i)} ∥_{L^{2} (Ω)}^{2} + ∥ σ ∥_{L^{2} (Ω)}^{2}], μ \in M_{a d} .

u \in X_{a d} min J (u, μ) \mathchar 58 = \frac{1}{2 m} i = 1 \sum m [∥ K u^{(i)} - f^{(i)} ∥_{L^{2} (Ω)}^{2} + ∥ σ ∥_{L^{2} (Ω)}^{2}], μ \in M_{a d} .

D J (u^{(i)}, μ) [h] = \frac{1}{m} [(K^{*} (K u^{(i)} - f^{(i)}), h)_{L^{2} (Ω)} + ((\partial_{u^{(i)}} T)^{*} (\partial_{T} σ) σ, h)_{L^{2} (Ω)}] .

D J (u^{(i)}, μ) [h] = \frac{1}{m} [(K^{*} (K u^{(i)} - f^{(i)}), h)_{L^{2} (Ω)} + ((\partial_{u^{(i)}} T)^{*} (\partial_{T} σ) σ, h)_{L^{2} (Ω)}] .

u_{j}^{(i)} = P_{X_{a d}} (u_{j - 1}^{(i)} - α \nabla_{u_{j - 1}^{(i)}} J (u_{j - 1}^{(i)}, μ)) .

u_{j}^{(i)} = P_{X_{a d}} (u_{j - 1}^{(i)} - α \nabla_{u_{j - 1}^{(i)}} J (u_{j - 1}^{(i)}, μ)) .

u_{j}^{(i)}=P_{X_{ad}}\Big{(}u_{j-1}^{(i)}-\frac{\alpha}{m}\big{[}K^{*}(Ku_{j-1}^{(i)}-f^{(i)})+(\partial_{u_{j-1}^{(i)}}T)^{*}(\partial_{T}\sigma)\sigma\big{]}\Big{)}.

u_{j}^{(i)}=P_{X_{ad}}\Big{(}u_{j-1}^{(i)}-\frac{\alpha}{m}\big{[}K^{*}(Ku_{j-1}^{(i)}-f^{(i)})+(\partial_{u_{j-1}^{(i)}}T)^{*}(\partial_{T}\sigma)\sigma\big{]}\Big{)}.

μ \in M_{a d} min ϕ (μ) = \frac{1}{2 m} i = 1 \sum m ∥ u_{n}^{(i)} (μ) - u_{t r u e}^{(i)} ∥_{L^{2} (Ω)}^{2}

μ \in M_{a d} min ϕ (μ) = \frac{1}{2 m} i = 1 \sum m ∥ u_{n}^{(i)} (μ) - u_{t r u e}^{(i)} ∥_{L^{2} (Ω)}^{2}

s.t. u_{j}^{(i)} =

μ_{l + 1} = P_{M_{a d}} (μ_{l} - β \nabla_{μ_{l}} ϕ (μ_{l})), l = 0, ..., q - 1,

μ_{l + 1} = P_{M_{a d}} (μ_{l} - β \nabla_{μ_{l}} ϕ (μ_{l})), l = 0, ..., q - 1,

\nabla_{μ_{l}} ϕ (μ_{l}) = \frac{1}{m} i = 1 \sum m \int_{Ω} (u_{n}^{(i)} - u_{t r u e}^{(i)}) \frac{d u _{n}^{(i)}}{d μ}_{μ = μ_{l}} d Ω.

\nabla_{μ_{l}} ϕ (μ_{l}) = \frac{1}{m} i = 1 \sum m \int_{Ω} (u_{n}^{(i)} - u_{t r u e}^{(i)}) \frac{d u _{n}^{(i)}}{d μ}_{μ = μ_{l}} d Ω.

\frac{d u _{j}}{d μ}_{μ = μ_{l}} = \frac{\partial u _{j}}{\partial u _{j - 1}} \cdot \frac{d u _{j - 1}}{d μ}_{μ = μ_{l}} + \frac{\partial u _{j}}{\partial μ} \cdot \frac{d μ}{d μ}_{μ = μ_{l}},

\frac{d u _{j}}{d μ}_{μ = μ_{l}} = \frac{\partial u _{j}}{\partial u _{j - 1}} \cdot \frac{d u _{j - 1}}{d μ}_{μ = μ_{l}} + \frac{\partial u _{j}}{\partial μ} \cdot \frac{d μ}{d μ}_{μ = μ_{l}},

\frac{\partial u _{j}}{\partial u _{j - 1}} = I - \frac{α}{m}

\frac{\partial u _{j}}{\partial u _{j - 1}} = I - \frac{α}{m}

\displaystyle\Big{(}\left.\frac{\partial T}{\partial u_{j-1}}\Big{)}^{*}\Big{(}\frac{\partial\sigma}{\partial T}\Big{)}^{*}\Big{(}\frac{\partial\sigma}{\partial T}\cdot\frac{\partial T}{\partial u_{j-1}}\Big{)}\right],

\frac{\partial u _{j}}{\partial μ} = - \frac{α}{m}

\frac{\partial u _{j}}{\partial μ} = - \frac{α}{m}

\displaystyle\left.\Big{(}\frac{\partial T}{\partial u_{j-1}}\Big{)}^{*}\Big{(}\frac{\partial\sigma}{\partial T}\Big{)}^{*}\cdot\frac{\partial\sigma}{\partial T}\frac{\partial T}{\partial\mu}\right].

u_{j}^{(i)}=P_{X_{ad}}\Big{(}u_{j-1}^{(i)}-\frac{\alpha}{m}\big{[}K^{*}(Ku_{j-1}^{(i)}-f^{(i)})+(\partial_{u_{j-1}^{(i)}}T)^{*}(\partial_{T}\sigma)\sigma\big{]}\Big{)}.

u_{j}^{(i)}=P_{X_{ad}}\Big{(}u_{j-1}^{(i)}-\frac{\alpha}{m}\big{[}K^{*}(Ku_{j-1}^{(i)}-f^{(i)})+(\partial_{u_{j-1}^{(i)}}T)^{*}(\partial_{T}\sigma)\sigma\big{]}\Big{)}.

\frac{d u _{j}^{(i)}}{d μ}_{μ = μ_{l}} = \frac{\partial u _{j}^{(i)}}{\partial u _{j - 1}^{(i)}} \cdot \frac{d u _{j - 1}^{(i)}}{d μ}_{μ = μ_{l}} + \frac{\partial u _{j}^{(i)}}{\partial μ} \cdot \frac{d μ}{d μ}_{μ = μ_{l}}

\frac{d u _{j}^{(i)}}{d μ}_{μ = μ_{l}} = \frac{\partial u _{j}^{(i)}}{\partial u _{j - 1}^{(i)}} \cdot \frac{d u _{j - 1}^{(i)}}{d μ}_{μ = μ_{l}} + \frac{\partial u _{j}^{(i)}}{\partial μ} \cdot \frac{d μ}{d μ}_{μ = μ_{l}}

\nabla_{μ_{l}} ϕ (μ_{l}) = \frac{1}{m} i = 1 \sum m \int_{Ω} (u_{n}^{(i)} - u_{t r u e}^{(i)}) \frac{d u _{n}^{(i)}}{d μ}_{μ = μ_{l}} d Ω,

\nabla_{μ_{l}} ϕ (μ_{l}) = \frac{1}{m} i = 1 \sum m \int_{Ω} (u_{n}^{(i)} - u_{t r u e}^{(i)}) \frac{d u _{n}^{(i)}}{d μ}_{μ = μ_{l}} d Ω,

μ_{l + 1} = P_{M_{a d}} (μ_{l} - β \nabla_{μ_{l}} ϕ (μ_{l})) .

μ_{l + 1} = P_{M_{a d}} (μ_{l} - β \nabla_{μ_{l}} ϕ (μ_{l})) .

u_{j}^{(i)}=P_{X_{ad}}\Big{(}u_{j-1}^{(i)}-\frac{\alpha}{m}\big{[}K^{*}(Ku_{j-1}^{(i)}-f^{(i)}_{test})+(\partial_{u_{j-1}^{(i)}}T)^{*}(\partial_{T}\sigma)\sigma\big{]}\Big{)}.

u_{j}^{(i)}=P_{X_{ad}}\Big{(}u_{j-1}^{(i)}-\frac{\alpha}{m}\big{[}K^{*}(Ku_{j-1}^{(i)}-f^{(i)}_{test})+(\partial_{u_{j-1}^{(i)}}T)^{*}(\partial_{T}\sigma)\sigma\big{]}\Big{)}.

R (u, μ) = \frac{1}{2} ∥ λ (- Δ)^{\frac{s}{2}} u ∥_{L^{2} (Ω)}^{2},

R (u, μ) = \frac{1}{2} ∥ λ (- Δ)^{\frac{s}{2}} u ∥_{L^{2} (Ω)}^{2},

u_{j}=P_{X_{ad}}\Big{(}u_{j-1}-\frac{\alpha}{m}\big{[}K^{*}(Ku_{j-1}-f)+\lambda(-\Delta)^{s}u_{j-1}\big{]}\Big{)},

u_{j}=P_{X_{ad}}\Big{(}u_{j-1}-\frac{\alpha}{m}\big{[}K^{*}(Ku_{j-1}-f)+\lambda(-\Delta)^{s}u_{j-1}\big{]}\Big{)},

\frac{\partial u _{j}}{\partial u _{j - 1}} = I - \frac{α}{m} K^{*} K - \frac{α λ}{m} (- Δ)^{s},

\frac{\partial u _{j}}{\partial u _{j - 1}} = I - \frac{α}{m} K^{*} K - \frac{α λ}{m} (- Δ)^{s},

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

\newsiamremark

hypothesisHypothesis

\newsiamthmclaimClaim \headersBilevel Optimization, Deep Learning and Fractional LaplacianHarbir Antil, Zichao (Wendy) Di, and Ratna Khatri

Bilevel Optimization, Deep

Learning and Fractional Laplacian Regularization with Applications in Tomography††thanks: Submitted to the editors DATE. \fundingThe first and third authors are partially supported by NSF grants DMS-1818772, DMS-1913004 and the Air Force Office of Scientific Research under Award NO: FA9550-19-1-0036. The third author is also partially supported by a Provost award at George Mason University under the Industrial Immersion Program. The second author is partially supported by DOE Office of Science under Contract No. DE-AC02-06CH11357.

Harbir Antil Department of Mathematical Sciences, George Mason University, Fairfax, VA 22030, USA. (, ). [email protected]

[email protected]

Zichao (Wendy) Di Mathematics and Computer Science Division, Argonne National Laboratory, IL, USA. (). [email protected]

Ratna Khatri22footnotemark: 2

Abstract

In this work we consider a generalized bilevel optimization framework for solving inverse problems. We introduce fractional Laplacian as a regularizer to improve the reconstruction quality, and compare it with the total variation regularization. We emphasize that the key advantage of using fractional Laplacian as a regularizer is that it leads to a linear operator, as opposed to the total variation regularization which results in a nonlinear degenerate operator. Inspired by residual neural networks, to learn the optimal strength of regularization and the exponent of fractional Laplacian, we develop a dedicated bilevel optimization neural network with a variable depth for a general regularized inverse problem. We also draw some parallels between an activation function in a neural network and regularization. We illustrate how to incorporate various regularizer choices into our proposed network. As an example, we consider tomographic reconstruction as a model problem and show an improvement in reconstruction quality, especially for limited data, via fractional Laplacian regularization. We successfully learn the regularization strength and the fractional exponent via our proposed bilevel optimization neural network. We observe that the fractional Laplacian regularization outperforms total variation regularization. This is specially encouraging, and important, in the case of limited and noisy data.

keywords:

bilevel optimization neural network, fractional Laplacian regularization, deep residual learning, imaging science, tomographic reconstruction, inverse problems.

{AMS}

65D18, 68U10, 62H35, 94A08, 35R11, 34K37, 65K10.

1 Introduction.

Inverse problems appear in numerous scientific domains, such as medicine, geophysics, astronomy, computer vision, and imaging etc.. However, they are typically ill-posed, due to the limited data and imperfection of experiments, and require some form of regularization [18, 17, 31, 38, 26]. Two key challenges are associated with solving a regularized inverse problem. The first is the choice of regularization. Among the most popular choices, the total variation regularization [40, 42] is of edge-preserving nature. However, its non-differentiability makes its usage numerically challenging. Another choice is the Tikhonov regularization [46], which has a smoothing property. Each choice, however, comes with its own challenges such as nonlinearity, non-smoothness, over-smoothing etc.. The second associated challenge is to choose the strength of the regularization, usually dictated by the parameter $\mu$ , for which there is no consensus.

Recently, deep learning approaches such as Convolution Neural Networks (CNN) and Residual Neural Networks (RNN) have shown remarkable potential in image classification and reconstruction where, often, the goal is to learn the whole regularizer [19, 50, 51]. These approaches, however, may not be robust in general [36, 34]. Firstly, learning problems are usually nonconvex, and the local minima may be sensitive to the initialization of parameters and the choice of optimization method. Secondly, these approaches often do not incorporate the domain-specific knowledge of the system (e.g., the known solution features) directly into the network, for instance. In addition, they often lack a mathematical justification. The main contributions of this paper are two-folds:

(a)

Extend the fractional Laplacian introduced in [2] as a regularizer to the general setting of a linear inverse problem.

(b)

Instead of learning the entire regularizer, we consider a bilevel optimization scheme to learn the strength of the regularization and the fractional exponent based on the prior knowledge of the system. More specifically, we set up a bilevel optimization neural network (BONNet). In this network, the upper level objective measures an expectation of the reconstruction error over the training data while the lower level problem measures the regularized data misfit.

There are several existing attempts to take advantage of machine learning to improve the solution quality. The most common way is to explore neural network as a post-processing step to refine the solution obtained by base-line methods (e.g., iterative method or filtered back projection [28]), see also [27, 41].

Our approach is closely related to the methodology introduced in [19], in fact ours can be thought as a special case in the case of total variation, where the authors consider a variational model for reconstruction of MRI data. The authors focus on a generalized total variation model (Fields of Experts model) and also learn the underlying parameters. We emphasize that the main novelty in our paper is the use of fractional Laplacian [11, 44, 2] as a regularizer and learning the fractional exponent with an application to tomographic reconstruction. The fractional Laplacian introduces nonlocality and tunable regularity. Another type of parameter search strategy has been proposed in [13] where the authors consider Tikhonov-based regularizations, and propose a machine learning based strategy to learn the strength of regularization. Their scheme is based on the generalized singular value decomposition (GSVD), or its approximation, of the forward operator and the regularization operator pair. However, computing GSVD can be computationally challenging [20]. Our approach differs from the existing works as we propose to use the fractional Laplacian as a regularizer, which is cheaper to evaluate, and allows us to enforce the prior knowledge of the sample features, including smoothness and sparsity. The fractional Laplacian has been successfully applied in image denoising [2, 5], geophysics [48], diffusion maps [3], biology [10], novel exterior optimal control [4] etc.. We also emphasize that our proposed framework is flexible for it can easily incorporate inequality constraints (on the optimization variables), which can be solved by a large number of existing solvers, and directly generalizes to other types of regularizations such as the $p$ -Laplacian [9, 33]. Therefore, our proposed framework brings machine learning closer to the traditional optimization. Notice that the machine learning algorithms are still in their infancy when it comes to handling constraints, see for instance [35] and the references therein.

The rest of the paper is organized as follows. In Section 2, we introduce the mathematical formulation of the standard linear inverse problem with regularizers. In particular, we consider the fractional Laplacian as a regularizer for inverse problems. We show a comparison of fractional Laplacian and total variation as regularizers for a tomographic reconstruction problem. Section 3 is devoted to our proposed framework, i.e., the Bilevel Optimization Neural Network to learn the optimal regularization strength, as well as the order of the fractional Laplacian. In Section 4, we provide further numerical experiments illustrating the application of BONNet to the tomographic reconstruction problem.

2 Regularization in Inverse Problems.

The regression model for data misfit in inverse problems is given by

[TABLE]

where $f\mathrel{\mathop{\mathchar 58\relax}}\Omega\mapsto{\mathbb{R}}$ is a given function and $\Omega\subset{\mathbb{R}}^{n}$ with $n\geq 1$ is a bounded domain. Here $K$ is the forward map, which we assume is a bounded linear operator on $L^{2}(\Omega)$ where the latter denotes the square integrable functions. Moreover, $u$ is the sample feature that we want to recover, or reconstruct. The ill-posed nature of Eq. 2.1 makes it almost necessary to consider regularization in the wake of often noise-filled data; owing to the imperfections in the data gathering process. Therefore, we consider a regularized regression model to improve the solution quality. In a more general sense, let $\Omega\subset{\mathbb{R}}^{n}$ with $n\geq 1$ be a bounded Lipschitz domain with boundary $\partial\Omega$ , $f\mathrel{\mathop{\mathchar 58\relax}}\Omega\rightarrow\mathbb{R}$ be an $L^{2}(\Omega)$ function (given datum), $K\mathrel{\mathop{\mathchar 58\relax}}L^{2}(\Omega)\rightarrow L^{2}(\Omega)$ be a bounded linear operator, and $X$ be a Banach space. Then a standard regularized variational model is given by

[TABLE]

where $X_{ad}$ is a closed, convex, nonempty admissible set which is contained in the solution space $X$ , and $u$ is the solution that we want to reconstruct or recover. Some examples of the operator $K$ for inverse problems in imaging science are the identity operator (image denoising problem) [40], convolution operator (image deblurring problem) [25, 24], and the Fourier or wavelet transforms [43]. Therefore, in Eq. 2.2, the first term prevents the forward simulation from departing “too far” away from $f$ , thus it helps maintain the fidelity to $f$ . In the absence of the second term ( $\mathcal{R}(u,\mu)$ ), Eq. 2.2 may be ill-posed [21]. The regularizer $\mathcal{R}(u,\mu)$ incorporates prior knowledge of the sample (like smoothness, sparsity, etc.), where $\mu$ balances the data misfit and the penalty enforced by the regularizer. Various choices of $\mathcal{R}(u,\mu)$ have been proposed in the literature. In this work, we focus on the tomographic reconstruction problem, regularized with the fractional Laplacian, and compare it against the total variation regularization.

2.1 Total Variation Regularization.

The penalty term for total variation (TV) regularization is given by

[TABLE]

where $\mu=\lambda$ is a scalar. $\mbox{TV}(u)$ denotes the total variation semi-norm on $\Omega$ and $X=BV(\Omega)\cap L^{2}(\Omega)$ where $BV(\Omega)$ denotes the set of functions of bounded variations [1]. Formally speaking, TV $(u)\mathrel{\mathop{\mathchar 58\relax}}=\int_{\Omega}|\nabla u|$ and as a result the corresponding Euler-Lagrange equations for Eq. 2.2 are: Find $u\in X_{ad}\subset X$ such that

[TABLE]

i.e., a nonlinear and possibly degenerate (due to $\displaystyle 1/|\nabla u|$ ) variational equation which is challenging to solve. We remark that $X^{\prime}$ is the dual of $X$ and $K^{*}$ is the adjoint of $K$ . Designing solvers for Eq. 2.4 is still an active area of research [7]. The success of TV( $u$ ) can be attributed to the fact that it prefers to fit shorter curves over the longer ones, thus avoids fitting noise and enforces sparsity. Additionally, it enforces much weaker regularity than the $H^{1}$ -regularization, i.e., when $\mathcal{R}(u,\mu)=\frac{\lambda}{2}\int_{\Omega}|\nabla u|^{2}$ , with $\mu=\lambda$ , and as a result it is possible to capture desirable sharp transitions in the reconstruction [40].

2.2 Fractional Laplacian Regularization.

The fractional Laplacian as a regularization for Eq. 2.2 is given by,

[TABLE]

where $\mu=(\lambda,s)$ is a vector. Moreover, with $0<s<1$ , and $(-\Delta)^{s}$ denoting the fractional power of the classical Laplacian defined for instance in a spectral sense [44, 2]. We remark that such a regularization enforces a reduced smoothness than $H^{1}$ -regularization. The extent of the smoothness is dictated by the fractional power $`s$ ’. The key advantage of using this regularization is that the resulting Euler-Lagrange equation for Eq. 2.5 are: Find $u\in X_{ad}$

[TABLE]

i.e., a variational equation with a linear operator. Such a problem has a unique solution in the fractional order Sobolev space $X=H^{s}(\Omega)$ [30]. This regularization has been applied successfully in image denoising [2] (with $K=I$ , but with $u\in X$ , instead of $X_{ad}$ , as a result Eq. 2.6 becomes an equality). The success of this regularizer can be attributed to the fact that when $s<\frac{1}{2}$ , the fractional Sobolev-space $H^{s}(\Omega)$ is larger than $BV(\Omega)\cap L^{\infty}(\Omega)$ , see [2]. As a result, we are solving a minimization problem over a larger space to achieve “better” results.

2.3 Tomographic Reconstruction.

Tomographic reconstruction is a noninvasive imaging technique with the goal of recovering the internal characteristic of a 3D object using a penetrating wave. It has shown revolutionary impact on various fields including physics, chemistry, biology, and astronomy. In a tomographic scan, a beam of light (e.g., X-ray) is projected onto the object to generate a 2D representation of the internal information along the beam path. By rotating the object, a series of such 2D projections are collected from different angles of view, collectively known as a sinogram (measurement data $f$ ), which can then be used to recover the internal characteristics (e.g., the attenuation coefficient) of the object [16] (see Fig. 1).

However, the limited data, due to the discrete nature of the physical experiment and dosage limits, makes the reconstruction problem ill-posed, i.e., many local minima exist for the objective function which is used to describe the discrepancy between the forward model and the measurement data. For illustration purpose, we confine ourselves to reconstruct 2D objects. The mathematical foundation of tomography is the Radon transform [39], for which $K$ is defined as,

[TABLE]

where $u\mathrel{\mathop{\mathchar 58\relax}}{\mathbb{R}}^{2}\mapsto{\mathbb{R}}$ is compactly supported on a bounded domain $\Omega\subset\mathbb{R}^{2}$ and $\delta$ is the Dirac mass, $\tau\in[0,\infty)$ and $\theta\in[0,2\pi)$ define the line of the beam path in a restricted domain. In practice, we can not recover the object at all points in space. Instead, we discretize $\Omega$ as $N\times N$ uniform pixels. Given $N_{\theta}$ number of angles and $N_{\tau}$ number of discrete beamlets, our goal is to recover the piecewise constant approximation (on each pixel) $u\in{\mathbb{R}}^{N^{2}}$ . Correspondingly, the discrete form of operator $K$ is the matrix ${\bf K}=(k_{i,j})_{i,j=1}^{N_{\theta}N_{\tau},N^{2}}$ where the entries $k_{i,j}$ denote the contribution of $j$ th pixel of $u$ to the $i$ th component of the generated data.

2.4 Comparison of Fractional Laplacian with TV for Tomographic Reconstruction.

To show the benefit of fractional Laplacian, we compare its performance against TV regularizer on a model problem. For now, we use common criterion to choose $\lambda$ and a fixed fractional exponent $s$ for this preliminary comparison. The rigorous computation of optimal $(\lambda,s)$ will be part of a forthcoming discussion.

We choose our test problem as the tomographic reconstruction. First we synthetically generate the tomographic measurements of the sample $u$ by taking its discrete Radon transform, which gives us the data $f$ . The sample $u$ and its corresponding sinogram $f$ are illustrated in Fig. 1. To get the noisy data, we add $0.1\%$ Gaussian noise to $f$ . More details on tomographic reconstruction is provided in Section 4. Next we show the reconstructions based on the two regularizers, namely the fractional Laplacian Eq. 2.5 and the total variation Eq. 2.3 in Fig. 2.

The left panel corresponds to reconstructions based on sinogram $f$ without noise, and the right panel corresponds to reconstructions based on noisy $f$ . Rows 1 and 2 pertain to total variation and fractional Laplacian regularization, respectively.

In the absence of noise, the reconstructions based on both regularizers are comparable. However, noiseless data does not depict a realistic situation [15]. In reality, the actual experimental data is always noisy due to the imperfections in the data acquisition process. We note that for noisy data, particularly for the fewer projection case with $N_{\theta}=10$ angles, fractional Laplacian regularization gives better reconstructions than the total variation regularization. This can be specifically seen in Fig. 2 (right panel, row 2) where finer features are better recovered e.g. the small circle at the bottom. However, to fully explore the potential of regularization technique, the well-known challenge is to find the appropriate regularization strength $\lambda$ to optimally balance the trade-off between data misfit and prior knowledge enforcement. In the case of fractional Laplacian regularization, the exponent $`s$ ’ only complicates the parameter choice further.

For the reconstructions in Fig. 2, given a wide range of values for $\lambda\in[1\times 10^{-18},10]$ , we arbitrarily fix $s=0.4$ , and solve the minimization problem Eq. 2.2 using an inexact truncated-Newton method for bound-constrained problems [37]. The optimal value of $\lambda$ is then chosen using a combination of L-curve criterion [22] and the lowest $\ell_{2}$ -norm of the reconstruction error compared to the ground truth. When L-curve criterion fails, we solely rely on the lowest $\ell_{2}$ -norm. In our experience, this behavior is true for both TV and fractional Laplacian. As a result, the optimal values of $\lambda$ for these tests is found to be in the range $[1\times 10^{-10},1]$ . This procedure of finding an optimal $\lambda$ is labor-intensive, and requires access to the true solution, which is not available in practice. We remark that, to our experience, L-curve is efficient (not necessarily optimal) only in the case of strongly convex regularization which is definitely not the case with fractional Laplacian when $`s$ ’ is also considered as a regularization parameter (non-convex with respect to $`s$ ’). L-curve criterion requires many different trial values of $\lambda$ , along with a good guess of the interval to locate the corner of the L-curve. This requires a lot of human-intervention and fine-tuning. Furthermore, the regularized solution obtained by the $\lambda$ predicted by L-curve sometimes fails to converge to the true solution [47].

The next section addresses the issue of finding the optimal regularization parameters by proposing a deep bilevel optimization neural network, where the choice of regularizer is flexible.

3 Parameter Learning via Bilevel Optimization Neural Network .

Parameter search lies at the core of optimization. In particular, we seek parameters corresponding to the strength of regularization, which is a persistent challenge in the scientific community. To this end, we introduce a learning based approach as adverted in Section 1. We first state a generic bilevel optimization problem,

[TABLE]

where $\mathcal{M}_{ad}$ is a closed convex and nonempty admissible set for $\mu$ .

In Section 3.1, motivated by [19], we present a machine-learning based approach to learn the regularization strength for a generic choice of regularizer. One of the key novelty of this paper is to use fractional Laplacian as the regularizer. Notice that the lower level problem Eq. 2.2 in Eq. 3.1 can be solved using existing techniques.

3.1 Bilevel Optimization Neural Network (BONNet).

Recently, deep residual learning has received a tremendous amount attention in machine learning for its immense potential to overcome the challenges faced by the traditional deep learning architectures, such as training complexity and vanishing gradients. These are resolved by adding skip connections, which transfer information between the layers [23]. Deep residual learning has enabled remarkable progress in imaging science [23, 49, 27], biomedical applications [32, 12, 19], satellite imagery, remote sensing [45, 52, 8] etc.. In our work, we use the power of deep learning to learn the regularization parameter $\mu$ which, for instance, contains the strength $\lambda$ and the fractional exponent $`s$ ’. We propose a dedicated deep bilevel optimization neural network to learn the regularization parameters. Our goal is to solve Eq. 3.1 for which we seek our modeling inspiration from [19], and define $\phi(\mu)$ as the average mean squared error over $m$ distinct samples, i.e.

[TABLE]

where $u(\mu)$ solves the lower level problem in (3.1), and corresponds to the sample characteristic that we wish to recover or reconstruct. Moreover, $u_{true}$ , as the name suggests, is the known true solution.

We emphasize a few novelties of this work: first, our proposed network works directly on the data space, as opposed to the image space as a post-processing step as in [41, 27]. Second, it generalizes to any bounded linear operator $K$ (the forward map; which defines the physics of the underlying system) and any $\mathcal{R}(u,\mu)$ (the regularization function; which allows us to incorporate the domain-specific knowledge of the solution). Third, we propose the use of fractional Laplacian as a regularizer with tunable regularity/smoothness. We also show how to integrate this choice of regularization into the BONNet architecture. We remark that fractional Laplacian introduces nonlocality in BONNet, which is challenging from both analytical and computational point of view.

We first define the notion of a generalized regularizer and the projection map that we will be using to define the BONNet architecture.

•

Generalized Regularizer. Let $u(\mu)$ be the solution of the inner problem in Eq. 3.1 which depends on $\mu$ . Let $T\mathrel{\mathop{\mathchar 58\relax}}=T(\mu,u(\mu))$ be the action of some linear or nonlinear operator acting on $u(\mu)$ , and $\sigma\mathrel{\mathop{\mathchar 58\relax}}=\sigma(T)$ be the activation function which acts pointwise on its argument. Then, we define a generalized regularizer as,

[TABLE]

Notice that in the case of Eq. 2.3, $\mu=\lambda$ , $T(\mu,u(\mu))=2\lambda|\nabla u|$ , and $\sigma(T)=\sqrt{T}$ . In the case of Eq. 2.5, $\mu=(\lambda,s)$ , $T(\mu,u(\mu))=\sqrt{\lambda}(-\Delta)^{\frac{s}{2}}u$ , and $\sigma(T)=T$ . Then, for $m$ distinct samples, we can write our inner minimization problem Eq. 2.2 with a generalized regularizer as an average over $m$ samples,

[TABLE]

To solve this inverse problem, we will employ derivative based methods such as projected gradient descent. The directional derivative of $J$ in a direction $h$ in Eq. 3.3 w.r.t $u$ in its variational form is; for each sample, $i=1,...,m,$

[TABLE]

Remark \thetheorem (Regularization vs. Activation Function).

We notice a strong connection between the regularization function and the activation function used in machine learning architectures which is governed by Eq. 3.2. For instance, once we decide upon a choice of regularizer, the activation function is dictated by that choice. On the other hand, if we choose an activation function first, then the regularization is dictated by that choice. Thus, due to these parallels, regularization in a model seems to be similar in spirit to an activation function. * *

•

Solver: Projected Gradient Descent Method. The choices of $X_{ad}$ and $\mathcal{M}_{ad}$ are problem dependent, for example, for tomographic reconstruction model, we let $X_{ad}\mathrel{\mathop{\mathchar 58\relax}}=\{u\in X\ |\ u\geq 0\}$ . Moreover, we set $\mathcal{M}_{ad}\mathrel{\mathop{\mathchar 58\relax}}=\Lambda_{ad}$ for total variation and $\mathcal{M}_{ad}\mathrel{\mathop{\mathchar 58\relax}}=\Lambda_{ad}\times S_{ad}$ where $\Lambda_{ad}\mathrel{\mathop{\mathchar 58\relax}}=\{\lambda\in\mathbb{R}\ |\ \lambda\geq\epsilon_{1}>0\}$ and $S_{ad}\mathrel{\mathop{\mathchar 58\relax}}=\{s\in\mathbb{R}\ |\ 0<\epsilon_{2}\leq s\leq 1-\epsilon_{2}\}$ for the fractional Laplacian. See Section 4.1.2 for more details on this application. In order to satisfy these constraints, we use the projected gradient descent method with line search [29] to solve our inner and outer minimization problems in Eq. 3.1. Then, the projected gradient descent scheme for solving (3.3), for a fixed $\mu$ , $n$ iterations (depth of the network), $\alpha$ as the line search parameter (i.e.* the learning rate*), $u_{0}$ as the initial guess, for the network layers (optimization iteration) $j=1,...,n$ , is given by

[TABLE]

where $P_{X_{ad}}(\cdot)$ denotes the projection on the admissible set $X_{ad}$ , see Section 4.1.2 for more details on the tomographic reconstruction application. Note that, Eq. 3.5 is also known as the forward propagation. We are using $\nabla$ to denote the gradient and $D$ to denote the directional derivative (cf. Eq. 3.4). Now substitute the gradient from Eq. 3.4 in (3.5) to arrive at,

[TABLE]

To compute the learning rate $\alpha$ , we use line search for projected gradient descent as described in [29, pg. 91].

Putting it all together, we now describe our proposed BONNet architecture. Suppose we have $m$ distinct samples, and $n$ layers in our network. Let $u_{true}^{(i)}$ and $f^{(i)}$ be the known true solution and its corresponding experimental data for the $ith$ sample, with $i=1,...,m$ . Then, we formulate our bilevel supervised learning problem as; for $j=1,...,n$ ,

[TABLE]

To solve the outer level problem for $\mu\in\mathcal{M}_{ad}$ we again use the projected gradient descent method, as described above, with learning rate $\beta$ and $q$ iterations,

[TABLE]

where $P_{\mathcal{M}_{ad}}(\cdot)$ is the projection onto the admissible set. It then remains to evaluate $\nabla_{\mu_{l}}\phi(\mu_{l})$ . After applying the chain rule, we obtain that

[TABLE]

As noted earlier, the most challenging part of this network is the computation of sensitivity of $u$ w.r.t. $\mu$ , because at each network layer, $u$ depends on the previous iterate, as well as $\mu$ , as can be seen in the lower level problem in Eq. 3.7. We evaluate $\left.\frac{du_{n}^{(i)}}{d\mu}\right|_{\mu=\mu_{l}}$ in Eq. 3.9 by implicit differentiation. This results in an iterative system of equation that we need to solve. For each sample index $`i$ ’, it is explicitly derived as follows, for $j=1,...,n$

[TABLE]

where,

[TABLE]

and,

[TABLE]

Substituting Eq. 3.11 and Eq. 3.12 in Eq. 3.10 yields the sensitivity of $u$ w.r.t. $\mu$ . Now that we have the key architecture of the deep BONNet, we divide our network into a training phase and a testing phase, as is common in a standard machine learning framework. During the training phase, we solve the bilevel optimization problem Eq. 3.7 to learn the regularization parameters, and during the testing phase we only solve the inner problem in Eq. 3.7 using the regularization parameters learned from the training phase. The training phase can be carried out offline (i.e. in advance), and testing phase can be carried out online (i.e. as the experimental data becomes available).

3.1.1 General Framework of BONNet.

We summarize the training and testing phases of our deep BONNet architecture as follows:

•

Training Phase (Algorithm 1). In this phase, we pass in $m$ training samples $\left\{u_{true}^{(i)},f^{(i)}\right\}_{i=1}^{m}$ to learn the optimal $\mu$ which we denote by $\mu^{*}$ . The depth of the deep BONNet at the training phase is $`q$ sets of $n$ layers’. This phase can be carried out offline.

•

Testing Phase (Algorithm 2). In this phase, we use the $\mu^{*}$ learned from the training phase and testing data $\left\{f^{(i)}_{test}\right\}_{i=1}^{m_{test}}$ to Algorithm 2. The depth of the network at the testing phase is $n_{test}$ layers. This phase can be carried out online, once the experimental data $f_{test}$ becomes available.

Remark \thetheorem (Fixed vs. Variable Depth of BONNet.).

We remark that instead of specifying the number of layers when solving Eq. 3.8 or Eq. 3.6, one could also specify a stopping criterion appropriate for the solver being used, which is our recommendation as well. This is more in the spirit of solving an optimization problem which converges to a solution. This implies that the layers of the deep BONNet, in this case, will be variable. In our numerical experiments, we have used the stopping criterion for projected gradient descent method as mentioned in [29, pg. 91] for both $\mu$ and $u$ . Also note that for Eq. 3.6, the number of layers in the testing phase ( $n_{test}$ ) does not have to be equal to the number of layers in the training phase ( $n$ ). In fact, $n<<n_{test}$ prevents the network from overfitting of parameters to the training data. Furthermore, reconstruction at the testing phase can be progressively improved for structural fidelity, if needed, by using a larger $n_{test}$ (or a stricter stopping criterion). This allows for a trade-off between the quality of reconstruction and computational time. * *

3.1.2 BONNet Framework for Fractional Laplacian and Total Variation Regularization.

In the general framework of our proposed deep BONNet, for any bounded linear operator $K$ , any choice of regularizer can be incorporated, as long as it is cast into the generalized regularizer framework Eq. 3.2. In Section 2, we have proposed the use of fractional Laplacian as a regularizer, and have compared it with total variation regularization. We now show how to incorporate these regularizers into the deep BONNet, for a general $K$ :

(a)

Fractional Laplacian Regularization. Recall the fractional Laplacian regularization from Eq. 2.5,

[TABLE]

where $\mu=(\lambda,s)$ and $s\in(0,1)$ . Then, to define the corresponding generalized regularizer Eq. 3.2, let $T(\mu,u(\mu))\mathrel{\mathop{\mathchar 58\relax}}=\sqrt{\lambda}(-\Delta)^{\frac{s}{2}}u$ , and the activation function $\sigma(T)\mathrel{\mathop{\mathchar 58\relax}}=T$ . We omit the superscript $`i$ ’ to improve readability. Then, after some simplifications, Eq. 3.7, Eq. 3.11, and Eq. 3.12 become, for $j=1,...,n$ ,

[TABLE]

and

[TABLE]

which together give us the sensitivity of $u$ w.r.t. $\mu$ in Eq. 3.10. Notice that the second equation in (3.13) requires the sensitivity of fractional Laplacian $(-\Delta)^{s}$ with respect to $`s$ ’. This is a highly delicate object to handle. We shall reserve further details on this topic until the next section.

(b)

Total Variation Regularization. Recall the total variation regularization

[TABLE]

where $\mu=\lambda$ , and we are using the “regularized” total variation semi-norm,

[TABLE]

with $0<\xi\ll 1$ . We will omit the subscript $\xi$ from $\text{TV}_{\xi}$ for brevity. Then, to define the corresponding generalized regularizer Eq. 3.2, let $T(\mu,u(\mu))\mathrel{\mathop{\mathchar 58\relax}}=2\lambda\text{TV}(u)$ , and the activation function $\sigma(T)\mathrel{\mathop{\mathchar 58\relax}}=\sqrt{T}$ . Then, after some simplifications, Eq. 3.7, Eq. 3.11, and Eq. 3.12 become, for $j=1,...,n$ ,

[TABLE]

and

[TABLE]

which together give us the sensitivity of $u$ w.r.t. $\mu$ in Eq. 3.10. Again, we have omitted the superscript $`i$ ’ to improve readability.

4 Numerical Experiments of Tomographic Reconstruction.

In this section, we present several numerical experiments where we apply our proposed BONNet to a tomographic reconstruction problem. We have introduced tomographic reconstruction in Section 2.3. We demonstrate the results of BONNet with two regularizers, namely, the total variation and the proposed fractional Laplacian Eq. 2.5.

All the computations are carried out using MATLAB R2015b on a Laptop with Intel Core i7-8550U Processor, with NVIDIA GeForce MX150 with 2 GB RAM. In view of Section 3.1.1, we run the proposed algorithm until a desired tolerance (tol) is met. At the testing phase we set tol $=1\times 10^{-5}$ and at the training phase we set tol $=1\times 10^{-3}$ . Notice that the former is stricter than latter to avoid overfitting.

For all the total variation experiments we set the regularization parameter $\xi$ in Eq. 3.14 as $\xi=1\times 10^{-5}$ . Moreover, the last term in Eq. 3.15 is simply approximated by discrete Laplacian.

The remainder of the section is organized as follows. First in Section 4.1 we discuss the implementation details of fractional Laplacian and the admissible sets $X_{ad}$ and $\mathcal{M}_{ad}$ . This is followed by two experiments in Section 4.2.

4.1 Preliminaries.

Before we discuss the actual results, we state some preliminary material. As mentioned in the paragraph following Eq. 2.7, we discretize $\Omega$ as $N\times N$ uniform pixels. Then given $N_{\theta}$ number of angles and $N_{\tau}$ number of discrete beamlets, our goal is to recover $u\in\mathbb{R}^{N^{2}}$ . We also recall that the discrete form of operator $K$ is the matrix ${\bf K}=(k_{i,j})_{i,j=1}^{N_{\theta}N_{\tau},N^{2}}$ . All the integrals are computed using uniform quadrature and the differential operators are discretized using finite differences. We shall discuss the approximation of fractional Laplacian next.

4.1.1 Numerical Approximation of Fractional Laplacian.

In order to approximate the fractional Laplacian, we first discretize the Laplacian $(-\Delta)$ on a uniform stencil. We denote the resulting discrete matrix by $\mathbf{A}$ . If the eigen-decomposition of $\mathbf{A}$ is

[TABLE]

where $\mathbf{D}=(d_{i,j})_{i,j=1}^{N^{2},N^{2}}$ with $d_{i,j}=0$ if $i\neq j$ , and $d_{i,i}=\zeta_{i}$ denotes the eigenvalues with columns of $\mathbf{V}$ containing the corresponding eigenvectors. Then the fractional power of $\mathbf{A}$ is given by,

[TABLE]

where $\mathbf{G}(s)=(g_{i,j}(s))_{i,j=1}^{N^{2},N^{2}}$ is the diagonal matrix with $g_{i,j}(s)=0$ if $i\neq j$ and $g_{i,i}(s)=\zeta_{i}^{s}$ . From Eq. 3.13 we also recall that we need to approximate the variation of ${\bf A}^{s}$ with respect to $`s$ ’. A straightforward calculation gives

[TABLE]

where $\mathbf{H}(s)=(h_{i,j}(s))_{i,j=1}^{N^{2},N^{2}}$ is the diagonal matrix with $h_{i,j}(s)=0$ if $i\neq j$ and $h_{i,i}(s)=\zeta_{i}^{s}\ln(\zeta_{i})$ .

4.1.2 Admissible Sets and Projection.

For tomographic reconstruction we let $X_{ad}\mathrel{\mathop{\mathchar 58\relax}}=\{u\in X\ |\ u\geq 0\}$ . Moreover, we set $\mathcal{M}_{ad}\mathrel{\mathop{\mathchar 58\relax}}=\Lambda_{ad}$ for total variation and $\mathcal{M}_{ad}\mathrel{\mathop{\mathchar 58\relax}}=\Lambda_{ad}\times S_{ad}$ where $\Lambda_{ad}\mathrel{\mathop{\mathchar 58\relax}}=\{\lambda\in\mathbb{R}\ |\ \lambda\geq\epsilon_{1}>0\}$ and $S_{ad}\mathrel{\mathop{\mathchar 58\relax}}=\{s\in\mathbb{R}\ |\ 0<\epsilon_{2}\leq s\leq 1-\epsilon_{2}\}$ . We let $\epsilon_{1}=\epsilon_{2}=10^{-15}$ .

Furthermore, the projection in Eq. 3.6 onto the admissible set $X_{ad}$ is given by, for $z\in X$ ,

[TABLE]

Formally, the “derivative” of this map is given by

[TABLE]

For a rigorous definition of the generalized derivative of the max function, see [14]. Similar projection formulas are applicable for projection onto the set $\mathcal{M}_{ad}$ .

4.2 Experiments.

We begin by generating the synthetic data. We create $30$ distinct $64\times 64$ samples (i.e. $N=64$ ), which are variations of the Shepp-Logan Phantom (see Fig. 3 for two representative samples). Given $N_{\theta}$ and $N_{\tau}$ , we then simulate their corresponding sinograms $f$ based on standard discrete Radon transform [6]. Next we add $0.1\%$ Gaussian noise to each sinogram, respectively. This gives us our synthetic data, which we divide into $m=20$ training samples and $m_{test}=10$ testing samples.

We remark that in tomography, the number of projection angles, $N_{\theta}$ , is important, since it determines the amount of X-ray the sample is exposed to. We emphasize that the most challenging, yet common, cases in tomographic reconstruction are the ones with smaller $N_{\theta}$ , due to the limits on X-ray exposure. We conduct numerical experiments for tomographic scans obtained for various $N_{\theta}$ . For each choice, the selected number of angles are uniformly distributed in the range $[0,180]$ . Note that, for each choice of $N_{\theta}$ , a separate set of projection data is generated (for a batch of 30 samples), on which the learning and reconstructions are performed using our deep BONNet as discussed in Algorithm 1 and Algorithm 2.

We have undertaken two sets of experiments. In the first experiment, we fix $s=0.4$ and learn $\lambda$ . In the second experiment, we learn $\mu=(\lambda,s)$ .

4.2.1 Results of Experiment I: Learning $\lambda$ , fixed $s=0.4$ .

We now discuss the results of our experiments. In Fig. 4, we compare the reconstructions obtained from BONNet with the true solution shown in Fig. 3.

The reconstructions are based on no regularization, total variation regularization, and the fractional Laplacian regularization for data with $0.1\%$ noise. The columns correspond to the number of projections angles used. We remark again that each choice of $N_{\theta}$ for a batch of training and testing data, corresponds to a distinct separate problem that we solve, as the dimensionality of ${\bf K}$ depends on $N_{\theta}$ . The left panel corresponds to the reconstruction of the training data at the $nth$ iterate. Recall that at the training phase, $\{(u_{true}^{(i)},f_{train}^{(i)})\}_{i=1}^{m=20}$ are passed to the deep BONNet Algorithm 1. The $\lambda$ values mentioned under each reconstruction are the corresponding optimal $\lambda_{none}^{*},\lambda_{TV}^{*},$ and $\lambda_{fracLap}^{*}$ that we learn during the training stage. Notice that $\lambda_{none}^{*}=0$ corresponds to no regularization. The right panel corresponds to the reconstructions at the $n_{test}th$ layer of the testing phase. Recall that $\{(\lambda^{*},f_{test}^{(i)})\}_{i=1}^{m_{test}=10}$ are passed to the deep BONNet at this stage of Algorithm 2.

From the reconstructions in Fig. 4, we observe that for the tomographic reconstruction problem, first of all, regularization is improving the quality of reconstructions. In the absence of regularization, the high intensity regions are preserved, but we lose information from regions of low intensity. On the other hand, TV and fractional Laplacian regularizations preserve the sample characteristics in the lower intensity regions of the sample. Fractional Laplacian gives reconstructions which are either better, or comparable to TV regularization. In addition, it does better at smoothing out the noise, and also in regaining comparatively more information in regions of low intensity, such as the dim circle on the lower side of the Phantom, e.g. for $N_{\theta}=10$ . This is especially important when we have limited data to reconstruct from. We also recall that the Euler-Lagrange equation corresponding to the fractional Laplacian regularization is linear, and that of TV is non-linear.

We also observe that for any given regularizer choice, the optimal $\lambda^{*}$ obtained for $N_{\theta}=10$ is similar to the one obtained for a larger $N_{\theta}$ . Thus, to learn the regularization strength, even limited tomographic scan data suffices, and the same $\lambda^{*}$ could be used for reconstruction at the testing phase for any amount of available data, which can significantly save the offline training time.

For the experimental cases mentioned above, we measure the quality of reconstructions using metrics such as the mean-squared error (MSE) Fig. 5, Peak signal-to-noise ratio (PSNR) Fig. 6, and structural similarity index (SSIM) Fig. 7, averaged over all the samples. For MSE, smaller values correspond to better results, and for PSNR and SSIM, larger values are better. Notice that for each metric, fractional Laplacian regularization outperforms the total variation regularization.

We remark that the $\lambda$ values that we learn via deep BONNet are similar to those obtained by using a combination of the lowest error norm and L-curve; however, the parameter search via BONNet is automated. The reconstructions obtained via Projected Gradient Descent are also similar to the ones obtained earlier Fig. 2 using the inexact truncated-Newton method for bound-constrained problem [37]. We emphasize that one may use a different solver during the testing stage once $\lambda^{*}$ is obtained via BONNet training.

4.2.2 Results of Experiment II: Learning $\lambda$ and Fractional Exponent $`s$ ’.

We now train BONNet to learn both the fractional exponent $`s$ ’ of the fractional Laplacian and the strength $\lambda$ . We use the BONNet architecture using fractional Laplacian discussed in (a) and use the same training and testing data as described in the previous example. In Table 1 we show comparisons of MSE, SSIM and PSNR for $N_{\theta}=\{10,20\}$ projection angles, respectively, for the reconstructions of the testing data. We compare the results with the fractional Laplacian case discussed in Section 4.2.1. In the case of $N_{\theta}=10$ , we obtain $(\lambda^{*}_{fracLap},s^{*}$ ) = (5.04417e-6, 0.5413) and in the case of $N_{\theta}=20$ , we obtain $(\lambda^{*}_{fracLap},s^{*})$ = (8.53717e-6, 0.3799). The reconstructions of $u$ with $(\lambda^{*}_{fracLap},s^{*})$ are visually comparable to the case of fractional Lapalcian in Fig. 4 and therefore they have been omitted. We observe that all the error metrics returned by BONNet are either comparable, or slightly better, than the ones obtained by BONNet for a fixed $`s$ ’, discussed in Section 4.2.1. The advantage now is that we no longer need to tune the parameters manually.

5 Discussion.

In this work, we consider a general regularized regression model for inverse problems. This model can incorporate the underlying physics (defined by the operator $K$ ), in addition to the prior knowledge of the solution in the regularization term. However, to fully explore the potential of this generalized model, an optimal choice of the type of regularizer, as well as the regularization strength, is inevitable.

We have used fractional Laplacian as a regularizer on tomographic reconstruction problems. Previously, this has been used in image denoising. The key benefit of using this regularization is that the corresponding Euler-Lagrange equations are linear, as opposed to the nonlinear and possibly degenerate Euler-Lagrange equations for the popular total variation regularization.

To address the challenge of finding the optimal regularization strength, we introduce a dedicated deep BONNet architecture to learn the regularization parameters for any choice of regularizer. We show an analogy of the regularization function to the activation function in a standard neural network, which provides a theoretical guidance in terms of choosing an optimal activation function. In addition to the regularization strength $\lambda$ , BONNet can also learn the exponent $`s$ ’ for the fractional Laplacian regularization.

Next, we demonstrate the benefit of our proposed deep BONNet on the tomographic reconstruction problem. We first conduct experiments to learn only $\lambda$ with a fixed $`s$ ’. We have observed that fractional Laplacian regularization gives comparable or better reconstructions compared to the total variation regularization. Especially for the noisy and limited data ( $N_{\theta}=10$ ), fractional Laplacian regularization outperforms the total variation regularization. In contrast to the standard machine learning architectures with fixed number of layers, our network favors a variable number of layers (depth) which is dictated by the convergence to the solution of the optimization problem. Thus, the number of layers in the network can be different for different samples and different regularizers. We also demonstrate the capability of our proposed BONNet in terms of learning the optimum $(\lambda^{*}_{fracLap},s^{*})$ pair for the fractional Laplacian regularizer, and this indicates the flexibility of our proposed network to learn non-standard parameters.

Bibliography52

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] L. Ambrosio, N. Fusco, and D. Pallara , Functions of Bounded Variation and Free Discontinuity Problems , Oxford Mathematical Monographs, The Clarendon Press, Oxford University Press, New York, 2000.
2[2] H. Antil and S. Bartels , Spectral approximation of fractional pdes in image processing and phase field modeling , Computational Methods in Applied Mathematics, 17 (2017), pp. 661–678, https://doi.org/10.1515/cmam-2017-0039 , https://www.degruyter.com/view/j/cmam.2017.17.issue-4/cmam-2017-0039/cmam-2017-0039.xml . · doi ↗
3[3] H. Antil, T. Berry, and J. Harlim , Fractional diffusion maps , ar Xiv preprint ar Xiv:1810.03952, (2018).
4[4] H. Antil, R. Khatri, and M. Warma , External optimal control of nonlocal pdes , Inverse Problems, to appear, (2019).
5[5] H. Antil and C. Rautenberg , Sobolev spaces with non-muckenhoupt weights, fractional elliptic operators, and applications , SIAM Journal on Mathematical Analysis, 51 (2019), pp. 2479–2503, https://doi.org/10.1137/18M 1224970 , https://doi.org/10.1137/18M 1224970 , https://arxiv.org/abs/https://doi.org/10.1137/18M 1224970 . · doi ↗
6[6] A. P. Austin, Z. Di, S. Leyffer, and S. M. Wild , Simultaneous sensing error recovery and tomographic inversion using an optimization-based approach , SIAM Journal on Scientific Computing, 41 (2019), pp. B 497–B 521.
7[7] S. Bartels and M. Milicevic , Alternating direction method of multipliers with variable step sizes , ar Xiv preprint ar Xiv:1704.06069, (2017).
8[8] B. Bischke, P. Bhardwaj, A. Gautam, P. Helber, D. Borth, and A. Dengel , Detection of flooding events in social multimedia and satellite imagery using deep neural networks , in Working Notes Proceedings of the Media Eval 2017. Media Eval Benchmark, September 13-15, Dublin, Ireland, Media Eval, 2017.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Bilevel Optimization, Deep

Abstract

keywords:

1 Introduction.

2 Regularization in Inverse Problems.

2.1 Total Variation Regularization.

2.2 Fractional Laplacian Regularization.

2.3 Tomographic Reconstruction.

2.4 Comparison of Fractional Laplacian with TV for Tomographic Reconstruction.

3 Parameter Learning via Bilevel Optimization Neural Network .

3.1 Bilevel Optimization Neural Network (BONNet).

Remark \thetheorem (Regularization vs. Activation Function).

3.1.1 General Framework of BONNet.

Remark \thetheorem (Fixed vs. Variable Depth of BONNet.).

3.1.2 BONNet Framework for Fractional Laplacian and Total Variation Regularization.

4 Numerical Experiments of Tomographic Reconstruction.

4.1 Preliminaries.

4.1.1 Numerical Approximation of Fractional Laplacian.

4.1.2 Admissible Sets and Projection.

4.2 Experiments.

4.2.1 Results of Experiment I: Learning λ\lambdaλ, fixed s=0.4s=0.4s=0.4.

4.2.2 Results of Experiment II: Learning λ\lambdaλ and Fractional Exponent ‘s`s‘s’.

5 Discussion.

4.2.1 Results of Experiment I: Learning $\lambda$ , fixed $s=0.4$ .

4.2.2 Results of Experiment II: Learning $\lambda$ and Fractional Exponent $`s$ ’.