Bayesian inference and uncertainty quantification for image   reconstruction with Poisson data

Qingping Zhou; Tengchao Yu; Xiaoqun Zhang; Jinglai Li

arXiv:1903.02075·math.NA·October 22, 2019

Bayesian inference and uncertainty quantification for image reconstruction with Poisson data

Qingping Zhou, Tengchao Yu, Xiaoqun Zhang, Jinglai Li

PDF

Open Access

TL;DR

This paper develops a comprehensive Bayesian framework for image reconstruction from Poisson data, including a positivity-preserving reparametrization, a dimension-independent MCMC algorithm, and methods for regularization parameter estimation and artifact detection.

Contribution

It introduces a novel infinite-dimensional Bayesian approach with a hybrid prior, a new MCMC algorithm, and techniques for regularization and artifact detection in image reconstruction.

Findings

01

The posterior distribution is well-posed in infinite dimensions.

02

The dimension-independent MCMC algorithm effectively samples the posterior.

03

The method successfully detects artifacts in reconstructed images.

Abstract

We provide a complete framework for performing infinite-dimensional Bayesian inference and uncertainty quantification for image reconstruction with Poisson data. In particular, we address the following issues to make the Bayesian framework applicable in practice. We first introduce a positivity-preserving reparametrization, and we prove that under the reparametrization and a hybrid prior, the posterior distribution is well-posed in the infinite dimensional setting. Second we provide a dimension-independent MCMC algorithm, based on the preconditioned Crank-Nicolson Langevin method, in which we use a primal-dual scheme to compute the offset direction. Third we give a method combining the model discrepancy method and maximum likelihood estimation to determine the regularization parameter in the hybrid prior. Finally we propose to use the obtained posterior distribution to detect artifacts…

Tables2

Table 1. Table 1 : (K=0.5) The posterior predictive p 𝑝 p -value ( p b subscript 𝑝 𝑏 p_{b} ) and the PSNR of the resulting posterior mean for different values of λ 𝜆 \lambda .

$λ$	0	1	2	3	4	5
$p_{b}$	0.99	0.74	0.32	0.08	0.0074	0.0004
PSNR	15.69	18.85	20.31	20.04	18.79	18.58

Table 2. Table 2 : (K=1) The posterior predictive p 𝑝 p -value ( p b subscript 𝑝 𝑏 p_{b} ) and the PSNR of the resulting posterior mean for different values of λ 𝜆 \lambda .

$λ$	0	0.5	1	2	3	4
$p_{b}$	0.99	0.82	0.28	0.04	0.0034	0.0005
PSNR	18.21	21.90	21.99	20.66	19.76	19.27

Equations143

\frac{d μ ^{y}}{d μ _{pr}} (u) = π (y ∣ u),

\frac{d μ ^{y}}{d μ _{pr}} (u) = π (y ∣ u),

θ = \mbox A u .

θ = \mbox A u .

θ_{i} = (\mbox A u)_{i} = K \int_{L_{i}} u (x) ∣ d x ∣,

π_{P} (y ∣ θ) = i = 1 \prod d \frac{( θ _{i} ) ^{y_{i}} exp ( - θ _{i} )}{y _{i} !} .

π_{P} (y ∣ θ) = i = 1 \prod d \frac{( θ _{i} ) ^{y_{i}} exp ( - θ _{i} )}{y _{i} !} .

u (x) = f (z (x)) = \frac{a}{2} (erf (z (x)) + b),

u (x) = f (z (x)) = \frac{a}{2} (erf (z (x)) + b),

u (x) = f (z (x)) = \frac{a}{2} (erf (z (x) / c) + b),

u (x) = f (z (x)) = \frac{a}{2} (erf (z (x) / c) + b),

erf (z) = \frac{2}{π} \int_{0}^{z} e^{- t^{2}} d t .

erf (z) = \frac{2}{π} \int_{0}^{z} e^{- t^{2}} d t .

a (b - 1) \leq u (x) \leq a (b + 1) .

a (b - 1) \leq u (x) \leq a (b + 1) .

π (y ∣ z) = π_{P} (y ∣ θ = \mbox A f (z))

π (y ∣ z) = π_{P} (y ∣ θ = \mbox A f (z))

π (y ∣ z) \propto exp (- Φ (z, y)),

π (y ∣ z) \propto exp (- Φ (z, y)),

Φ (z; y) = i = 1 \sum d (\mbox A f (z))_{i} - y_{i} ln (\mbox A f (z))_{i} .

Φ (z; y) = ⟨ \mbox A f (z), 1 ⟩ - ⟨ y, ln (\mbox A f (z))⟩ = ⟨ θ, 1 ⟩ - ⟨ y, ln θ ⟩,

Φ (z; y) = ⟨ \mbox A f (z), 1 ⟩ - ⟨ y, ln (\mbox A f (z))⟩ = ⟨ θ, 1 ⟩ - ⟨ y, ln θ ⟩,

X = H^{1} (Ω) = {z (x) \in L_{2} (Ω) ∣ \partial_{x_{1}} z, \partial_{x_{2}} z \in L_{2} (Ω) for all x = (x_{1}, x_{2}) \in Ω} .

X = H^{1} (Ω) = {z (x) \in L_{2} (Ω) ∣ \partial_{x_{1}} z, \partial_{x_{2}} z \in L_{2} (Ω) for all x = (x_{1}, x_{2}) \in Ω} .

∥ z ∥_{H^{1}}^{2} = ∥ z ∥_{L_{2} (Ω)}^{2} + ∥ \partial_{x_{1}} z ∥_{L_{2} (Ω)}^{2} + ∥ \partial_{x_{2}} z ∥_{L_{2} (Ω)}^{2} .

∥ z ∥_{H^{1}}^{2} = ∥ z ∥_{L_{2} (Ω)}^{2} + ∥ \partial_{x_{1}} z ∥_{L_{2} (Ω)}^{2} + ∥ \partial_{x_{2}} z ∥_{L_{2} (Ω)}^{2} .

\frac{d μ _{pr}}{d μ _{0}} (z) \propto exp (- R (z)), R (z) = λ ∥ z ∥_{\textsc t v} .

\frac{d μ _{pr}}{d μ _{0}} (z) \propto exp (- R (z)), R (z) = λ ∥ z ∥_{\textsc t v} .

∥ z ∥_{TV} = \int_{Ω} ∥\nabla u ∥_{2} d x,

∥ z ∥_{TV} = \int_{Ω} ∥\nabla u ∥_{2} d x,

\frac{d μ ^{y}}{d μ _{0}} (z) \propto exp (- Φ (z) - R (z)),

\frac{d μ ^{y}}{d μ _{0}} (z) \propto exp (- Φ (z) - R (z)),

∥ z ∥_{TV} \geq \frac{a}{c} ∥ u ∥_{TV} .

∥ z ∥_{TV} \geq \frac{a}{c} ∥ u ∥_{TV} .

M \leq Φ (z) \leq N;

M \leq Φ (z) \leq N;

∣Φ (z) - Φ (v) ∣ \leq M ∥ z - v ∥_{X};

∣Φ (z) - Φ (v) ∣ \leq M ∥ z - v ∥_{X};

∣Φ (z; y) - Φ (z; y^{'}) ∣ \leq M ∥ y - y^{'} ∥_{2} .

∣Φ (z; y) - Φ (z; y^{'}) ∣ \leq M ∥ y - y^{'} ∥_{2} .

d_{Hell} (μ^{y}, μ^{y^{'}}) \leq C ∥ y - y^{'} ∥_{2} .

d_{Hell} (μ^{y}, μ^{y^{'}}) \leq C ∥ y - y^{'} ∥_{2} .

\frac{d μ _{N_{1}, N_{2}}^{y}}{d μ _{0}} = exp (- Φ_{N_{1}} (z) - R_{N_{2}} (z)),

\frac{d μ _{N_{1}, N_{2}}^{y}}{d μ _{0}} = exp (- Φ_{N_{1}} (z) - R_{N_{2}} (z)),

X_{ϵ} = {z \in X ∣ ∣Φ (z) - Φ_{N_{1}} (z) ∣ \leq a_{N_{1}} (ϵ), ∣ R (z) - R_{N_{2}} (z) ∣ \leq b_{N_{2}} (ϵ)} .

X_{ϵ} = {z \in X ∣ ∣Φ (z) - Φ_{N_{1}} (z) ∣ \leq a_{N_{1}} (ϵ), ∣ R (z) - R_{N_{2}} (z) ∣ \leq b_{N_{2}} (ϵ)} .

d_{Hell} (μ^{y}, μ_{N_{1}, N_{2}}^{y}) \to 0 as N_{1}, N_{2} \to + \infty.

d_{Hell} (μ^{y}, μ_{N_{1}, N_{2}}^{y}) \to 0 as N_{1}, N_{2} \to + \infty.

z (x) = i = 1 \sum N z_{i} η_{i} e_{i} (x),

z (x) = i = 1 \sum N z_{i} η_{i} e_{i} (x),

v = (1 - β^{2})^{\frac{1}{2}} z + β w,

v = (1 - β^{2})^{\frac{1}{2}} z + β w,

a (v, z) = min {1, exp [Ψ (z) - Ψ (v)]},

a (v, z) = min {1, exp [Ψ (z) - Ψ (v)]},

(2 + δ) v = (2 - δ) z - 2 δ C_{0} \mathpzc D Ψ (z) + 8 δ w,

(2 + δ) v = (2 - δ) z - 2 δ C_{0} \mathpzc D Ψ (z) + 8 δ w,

ρ (z, v) = Ψ (z) + \frac{1}{2} ⟨ v - z, \mathpzc D Ψ (z)⟩ + \frac{δ}{4} ⟨ z + v, \mathpzc D Ψ (z)⟩ + \frac{δ}{4} ∥ C_{0}^{1/2} \mathpzc D Ψ (z) ∥^{2},

ρ (z, v) = Ψ (z) + \frac{1}{2} ⟨ v - z, \mathpzc D Ψ (z)⟩ + \frac{δ}{4} ⟨ z + v, \mathpzc D Ψ (z)⟩ + \frac{δ}{4} ∥ C_{0}^{1/2} \mathpzc D Ψ (z) ∥^{2},

a (z, v) = min {1, exp (ρ (z, v) - ρ (v, z))} .

a (z, v) = min {1, exp (ρ (z, v) - ρ (v, z))} .

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMedical Imaging Techniques and Applications · Statistical Methods and Inference · Advanced MRI Techniques and Applications

Full text

\newsiamremark

remarkRemark \newsiamremarkhypothesisHypothesis

\newsiamthmclaimClaim \headersBayesian inference for Poisson dataQ. Zhou, T. Yu, X. Zhang and J. Li

Bayesian inference and uncertainty quantification for medical image reconstruction

with Poisson data††thanks: Submitted to the editors DATE. \fundingThis work was funded by the National Natural Science Foundation of China, under grant numbers 11771288 and 11771289. JL also wants to acknowledge the support of the Student Innovation Center at Shanghai Jiao Tong University.

Qingping Zhou School of Mathematical Sciences, Institute of Natural Sciences, Shanghai Jiao Tong University, 800 Dongchuan Rd, Shanghai 200240, China, ([email protected]).

Tengchao Yu School of Mathematical Sciences, Institute of Natural Sciences, Shanghai Jiao Tong University, 800 Dongchuan Rd, Shanghai 200240, China, ([email protected]).

Xiaoqun Zhang Institute of Natural Sciences, School of Mathematical Sciences, and the MOE Key Laboratory of Scientific and Engineering Computing, Shanghai Jiao Tong University, 800 Dongchuan Rd, Shanghai 200240, China, ([email protected])..

Jinglai Li Corresponding Author, Department of Mathematical Sciences, University of Liverpool, Liverpool, L69 7ZL, UK ([email protected]).

Abstract

We provide a complete framework for performing infinite-dimensional Bayesian inference and uncertainty quantification for image reconstruction with Poisson data. In particular, we address the following issues to make the Bayesian framework applicable in practice. We first introduce a positivity-preserving reparametrization, and we prove that under the reparametrization and a hybrid prior, the posterior distribution is well-posed in the infinite dimensional setting. Second we provide a dimension-independent MCMC algorithm, based on the preconditioned Crank-Nicolson Langevin method, in which we use a primal-dual scheme to compute the offset direction. Third we give a method combining the model discrepancy method and maximum likelihood estimation to determine the regularization parameter in the hybrid prior. Finally we propose to use the obtained posterior distribution to detect artifacts in a recovered image. We provide an example to demonstrate the effectiveness of the proposed method.

keywords:

Bayesian inference, image reconstruction, Markov chain Monte Carlo, Poisson distribution, Positron emission tomography, uncertainty quantification.

{AMS}

68Q25, 68R10, 68U05

1 Introduction

Image reconstruction involves constructing interpretable images of objects of interest from the data recorded by an imaging device [18]. Image reconstruction is usually cast as an inverse problem as one wants to determine the input to a system from the output of it. In most practical image reconstruction problems, the measurement and recording process is inevitably corrupted by noise, which renders the obtained data random. The statistical properties of the data have significant impact to the reconstruction results. In this work we shall focus on a special type of medical image reconstruction problems where the recorded data follows a Poisson distribution. The Poisson data usually arises in imaging problems where the unknown quantity of interest is an object which interacts with some known incident beam of photons or electrons [25]. A very important example of such problems is the Positron emission tomography(PET) [31, 4], a nuclear medicine imaging technique that is widely used in early detection and treatment follow up of many diseases, including cancer. In PET, the detection of signal is essential a photon counting process and as a result the data is well modeled by a Poisson distribution [4, 25]. The problem has attracted considerable research interests, and a number of methods have been developed to recover the image, e.g., [39, 42, 17], just to name a few.

On the other hand, the stochastic nature of the data also introduces uncertainty into the image reconstruction process, and as a result the image obtained is unavoidably subject to uncertainty. In practice, many important decisions such as diagnostics have to be made based on the images obtained. It is thus highly desirable to have methods that can not only compute the image but also quantify the uncertainty in the image obtained. To this end, the Bayesian inference method has become a popular tool for image reconstruction [27], largely thanks to its ability to quantify uncertainty in the obtained image. The Bayesian formulation has long been used to solve image reconstruction problems with Poisson data, e.g., [24, 29, 21]. We note, however, that most of the works in the early years focus on computing a point estimate, which is usually the maximum a posteriori (MAP) estimate in the Bayesian setting, because of the limited computational power available then. More recently, mounting interest has been directed to the computation of the complete posterior distribution, rather than a point estimate, of the image, for that it can provide the important uncertainty information of the reconstruction results. For example, a Markov chain Monte Carlo (MCMC) algorithm is developed to sample the posterior distribution of the image in [6], and a variational Gaussian approximation of the posterior is proposed in [3].

A serious challenge in the numerical implementation of the Bayesian image reconstruction is that in certain circumstances the inference results diverge with respect to resolution/discretization refinement, which is known as to be discretization variant or dimension dependent. To address the issue, Stuart [40] proposes an infinite dimensional framework, formulating the Bayesian inference problem in function spaces. Under the infinite dimensional framework, the inference results will converge with respect to discretization dimensionality, which is an important property for the numerical implementation. For example, it allows one to use multigrid strategy, e.g. [44, 32], to accelerate the sampling of the posterior. Building on several existing works, we aim to provide in this work a complete framework for performing infinite dimensional Bayesian inference and uncertainty quantification for medical image reconstruction with Poisson data, while providing treatments of several issues surrounding the problem. Specifically we summarize the key ingredients of our Bayesian framework as the following. First, in the usual setup, the function of interest can be both positive and negative valued. However, in the Poisson problem, when the function is negative valued, it may cause the Poisson likelihood function to be undefined (see Section 2.3 for more details), which renders the posterior distribution ill-posed in the infinite dimensional setting. To tackle the issue, we introduce a reparametrization of the unknown image which ensures that the function of interest is always positive valued. Moreover, medical images are often subject to sharp jumps and here we use the TV-Gaussian (TG) hybrid prior distribution proposed in [43] to model the jumps in the function/image. Using the positivity-preserving reparametrization and the TG prior, we are able to show that the resulting posterior distribution is well-posed in the infinite dimensional setting, which, to the best of our knowledge, has not yet been done for the Poisson data model. Second, we consider the numerical implementation of the Bayesian inference. A main difficulty here is that many standard MCMC algorithms such as the well-known Metropolis-Hastings (MH) [37, 9], degenerate with respect to resolution refinement. In [14] the authors introduce a MCMC algorithm termed as the preconditioned Crank-Nicolson (pCN) method, the performance of which is independent of discretization dimensionality. The authors also provide a Langevin variant of the pCN algorithm in [14] which accelerates the sampling procedure by incorporating the local gradient information of the likelihood function. In our problem, the pCN-Langevin (pCNL) algorithm can not be used directly because the prior used here has the total variation (TV) term which can be non-differentiable. To overcome this difficulty we modify the pCNL method by replacing the gradient direction with one computed by the primal-dual algorithm. We note that a similar problem is considered in [34, 15] where a proximal method is used to approximate the gradient direction. Other than that the directions are computed with different approaches, another main difference between the aforementioned works and the present one is that we use the pCN framework here so the algorithm is dimension independent, while the works [34, 15] concern finite dimensional problems where discretization refinement is not an issue. Third, an important issue in the TG hybrid prior is to determine the value of the regularization parameter of the TV term. In the Bayesian framework, such parameters are often determined with the hierarchical Bayes or the empirical Bayes method [20]. As discussed in Section 4, these methods, however, are computationally intractable in our problem as we do not know the normalization constant of the TG prior. Thus, in this work we provide a method to determine the value of the TV regularization parameter by combining the realized discrepancy model fit assessment approach developed in [19] and the stochastic proximal gradient method developed in [16]. Finally, we provide an application of the uncertainty information obtained in the Bayesian framework, where we use the posterior distribution to detect possible artifacts in any reconstructed image.

The rest of the paper is organized as the following. In Section 2, we present the infinite dimensional Bayesian formulation of the image reconstruction problem with Poisson data, and we prove that under the reparametrization the resulting posterior is well-posed in the function space. In Section 3, we describe the primal-dual pCN algorithm to sample the posterior distribution of the present problem. Section 4 provides a method to determine the value of the regularization parameter in the TG prior. Section 5 discusses how to use the posterior distribution to detect artifacts in a reconstructed image. Finally numerical experiments of the proposed Bayesian framework are performed in Section 6.

2 Infinite dimensional Bayesian image reconstruction with Poisson data

In this section, we formulate the image reconstruction with Poisson data in an infinite dimensional Bayesian framework.

2.1 The Bayesian inference formulation for functions

We start by presenting a generic Bayesian inference problem for functions. Let $X$ be a separable Hilbert space of functions with inner product $\langle\cdot,\cdot\rangle_{X}$ . Our goal is to infer $u\in X$ from data $y\in Y\subset R^{d}$ and $\mathbf{y}$ is related to $u$ via the likelihood function $\pi(\mathbf{y}|u)$ , i.e., the distribution of $\mathbf{y}$ conditional on the value of $u$ . In the Bayesian setting, we first assume a prior distribution $\mu_{\mathrm{pr}}$ of the unknown $u$ , which represents one’s prior knowledge on the unknown. In principle $\mu_{\mathrm{pr}}$ can be any probabilistic measure defined on the space $X$ . The posterior measure $\mu^{y}$ of $u$ conditional on data $y$ is provided by the Radon-Nikodym(R-N) derivative:

[TABLE]

which can be interpreted as the Bayes’ rule in the infinite dimensional setting. The posterior distribution $\mu^{y}$ thus can be computed from Eq. (1) with, for example, a MCMC simulation.

2.2 Poisson data model and the positivity-preserving reparametrization

To perform the Bayesian inference, we first need to specify the likelihood function, which can be derived from the underlying mathematical model relating the data and to the unknown image. We assume that the image is first projected to the noise-free observable via a mapping $A:X\rightarrow Y$ ,

[TABLE]

for $i=1...d$ , where $K$ is a positive constant describing the noise level. Poisson noise is then applied to the projected observable $\bm{\theta}$ , yielding the likelihood function $\pi(\mathbf{y}|u)=\pi_{\mathrm{P}}(\mathbf{y}|\bm{\theta}={\mbox{A}}u)$ , where $\pi_{\mathrm{P}}(\mathbf{y}|\bm{\theta})$ is the $d$ -dimensional Poisson distribution:

[TABLE]

In the PET problem, there is an additional restriction: the unknown function $u$ must be positive. The reason is two-fold: first from the physical point of view, the unknown $u$ represents the density of the medium, which is positive; from a technical point of view, if $u$ is not constrained to be positive, it may yield some negative components of the predicted data $\bm{\theta}$ , which renders the Poisson likelihood un-defined. To this end, we need to introduce a transformation to preserve positivity of the unknown $u$ . To impose the positivity constraint, we reparameterize the unknown $u$ as:

[TABLE]

where $a$ and $b$ are two constants satisfying $a>0$ and $b>1$ , and ${\mathrm{erf}}(\cdot)$ is the error function defined as:

[TABLE]

where $a$ $b$ and $c$ are constants satisfying $a>0$ , $b>1$ , and $c>0$ , and ${\mathrm{erf}}(\cdot)$ is the error function defined as:

[TABLE]

With the new parametrization, it is easy to see that for any $\mathbf{x}\in\Omega$ , we have

[TABLE]

Moreover, as the behavior of the error function is well understood (for example its derivative is simply the Gaussian distribution), which allows us determine the parameters conveniently. That said, it is worth noting here that the methods presented here does not rely on this specific reparametrization formulation. Now we can infer the new unknown $z$ and once $z$ is known $u$ can be computed accordingly. In this setup, the likelihood function for $z$ becomes

[TABLE]

where $\pi_{P}(\mathbf{y}|\bm{\theta})$ is the $d$ -dimensional Poisson distribution given by Eq. (3). Following [40], we can write the likelihood function $\pi(\mathbf{y}|z)$ in the form of

[TABLE]

For simplicity we can rewrite Eq. (5b) as,

[TABLE]

where $\bm{1}$ is a $d$ -dimensional vector whose components are all one, $\bm{\theta}={\mbox{A}}f(z)$ is the predicted observable, and $\langle\cdot,\cdot\rangle$ denotes the Euclidean inner product. In what follows we often omit the argument $y$ and simply use $\Phi(z)$ , when not causing ambiguity. This notation will be used often later.

2.3 Bayesian framework with the hybrid prior for PET imaging

We now describe how the infinite dimensional Bayesian inference framework is applied to the PET problem. First we assume that the unknown function $z$ is a function defined on $\Omega$ , a bounded open subset of $R^{2}$ . In particular, we set the state space $X$ to be the Sobolev space $H^{1}(\Omega)$ :

[TABLE]

The associated norm $\|\cdot\|_{X}=\|\cdot\|_{H^{1}}$ is

[TABLE]

Choosing a good prior distribution is one of the most important issues in Bayesian inference. Conventionally one often assumes that the prior on $z$ , is a Gaussian measure defined on $X$ with mean $\xi$ covariance operator $C_{0}$ , i.e. $\mu_{\mathrm{pr}}=N(\xi,C_{0})$ . Note that $C_{0}$ is symmetric positive and of trace class. The Gaussian prior has many theoretical and practical advantages, but a major limitation of the Gaussian prior is that it can not model functions with sharp jumps well.

To address the issue, here we use the TV-Gaussian prior proposed in [43]:

[TABLE]

where $\mu_{0}=N(\xi,C_{0})$ is the Gaussian reference prior defined on $X$ with mean $\xi$ and covariance $C_{0}$ and $\|\cdot\|_{\mathrm{TV}}$ is the TV seminorm,

[TABLE]

and $\lambda$ is a positive constant. It follows immediately that the R-N derivative of $\mu^{y}$ with respect to $\mu_{0}$ is

[TABLE]

which returns to the conventional formulation of inference with a Gaussian prior. Thus all the methods developed for inference problems with Gaussian priors can be directly applied to our formulation. We note that it is natural to directly apply the TV seminorm to the original image $u$ ; if we do so, however, Proposition 2.1 may no longer hold. For this technical reason we here choose to impose the TV seminorm on the new variable $z$ . Nonetheless, it can be showed that

[TABLE]

Next we shall show that the formulated Bayesian inference problem is well defined in the infinite dimensional setting. We first show that $\Phi(z)$ given by Eq. (6) satisfies certain important conditions, as is stated by Proposition 2.1.

Proposition 2.1.

The functional $\Phi$ given in Eq. (6) has the following properties:

For every $r>0$ , there are constants $M(r)\in{\mathbb{R}}$ and $N(r)>0$ such that, for all $z\in X$ , and $y\in Y$ with $\|y\|_{2}<r$ ,

[TABLE] 2. 2.

*For every $r>0$ there is a constant $M(r)>0$ such that, for all $z,\,v\in X$ with *

$\max\{\|z\|_{X},\|v\|_{X}\}<r$ ,

[TABLE] 3. 3.

There exists a constant $M>0$ such that for any $y,\,y^{\prime}\in Y$ , we have

[TABLE]

A detailed proof of the proposition is provided in Appendix A. Following Proposition 2.1, we can conclude that the hybrid prior (7), and the log-likelihood function Eq. (6) yield a well-behaved posterior measure given by Eq. (9) in the infinite-dimensional setting, as is summarized in the following theorem:

Theorem 2.2.

For $\Phi(z)$ given by Eq. (6) and prior measure $\mu_{\mathrm{pr}}$ given by Eq. (7), we have the following results:

$\mu^{y}$ * given by Eq. (9) is a well-defined probability measure on $X$ .* 2. 2.

$\mu^{y}$ * given by Eq. (9) is Lipschitz in the data $y$ , with respect to the Hellinger distance: if $\mu^{y}$ and $\mu^{y^{\prime}}$ are two measures corresponding to data $y$ and $y^{\prime}$ the there exists $C=C(r)$ such that, for all $y,y^{\prime}$ with $\max\{\|y\|_{2},\|y^{\prime}\|_{2}\}<r$ ,*

[TABLE] 3. 3.

Let

[TABLE]

where $\Phi_{N_{1}}(z)$ is a $N_{1}\in{\mathbb{N}}$ dimensional approximation of $\Phi(z)$ and $R_{N_{2}}(z)$ is a $N_{2}\in{\mathbb{N}}$ dimensional approximation of $R(z)$ . Assume that $\Phi_{N_{1}}$ satisfies the three properties of Proposition 2.1 with constants uniform in $N_{1}$ , and $R_{N_{2}}$ satisfy Assumptions A.2 (i) and (ii) in **[43]** with constants uniform in $N_{2}$ . Assume also that for any $\epsilon>0$ , there exist two positive sequences $\{a_{N_{1}}(\epsilon)\}$ and $\{b_{N_{2}}(\epsilon)\}$ converging to zero, such that $\mu_{0}(X_{\epsilon})\geq 1-\epsilon$ for any $N_{1},N_{2}\in{\mathbb{N}}$ , where

[TABLE]

Then we have

[TABLE]

Theorem 2.2 is a direct consequence of Proposition 2.1, and the proof of the theorem can be found in [43] and is omitted here.

Finally, we note that, in the numerical implementations, we use the truncated Karhunen-Lo $\grave{\textrm{e}}$ ve (KL) expansion [28] to represent the unknown $z$ . Namely, we write $z$ as

[TABLE]

where $\{\eta_{i},\,e_{i}(\mathbf{x})\}$ are the eigenvalue-eigenfunction pair of the covariance operator $C_{0}$ , and $(z_{1},\,...,\,z_{N})$ are independent with each following a standard normal distribution. In the KL representation, the number of KL modes (eigenfunctions) $N$ corresponds to the discretization dimensionality.

3 The primal-dual preconditioned Crank-Nicolson MCMC algorithm

In most practical image reconstruction problems, the posterior distribution can not be analytically calculated. Instead, one usually represent the posterior by samples drawn from it using MCMC algorithms. It is demonstrated in [14] that standard MCMC algorithms may become problematic in the infinite dimensional setting: its acceptance probability will generate to zero as the discretization dimensionality increases. Here we adopt the pCN MCMC algorithm particularly developed for the infinite dimensional problems [14]. An important feature of the pCN MCMC algorithm is that its sampling efficiency is independent of discretization dimensionality up to the numerical errors in the evaluation of the functionals $R(\cdot)$ and $\Phi(\cdot)$ , which makes it particular useful for sampling the posterior distribution defined in function spaces. We start with a brief introduction of the pCN algorithm following the presentation of [14]. We denote $\Phi(z)+R(z)$ of Eq. (9) as $\Psi(z)$ . Simply speaking the algorithms are derived by applying the Crank-Nicolson (CN) discretization to a stochastic partial differential equation whose invariant distribution is the posterior. We here omit the derivation details while referring interested readers to [14], and jump directly to the pCN proposal:

[TABLE]

where $z$ and $v$ are the present and the proposed positions respectively, $w\sim N(\xi,C_{0})$ and $\beta\in[0,1]$ is the parameter controlling the stepsize of the algorithm. The proposed sample $v$ is then accepted or rejected according to the acceptance probability:

[TABLE]

which is independent of discretization dimensionality up to numerical errors.

The pCN proposal in Eq. (12) can be improved by incorporating the data information in the proposal, and following the idea of Langevin MCMC for the finite dimensional problems, one can derive the preconditioned Crank-Nicolson Langevin (pCNL) proposal:

[TABLE]

where $\delta\in[0,2]$ , $w\sim N(\xi,{\bf C}_{0})$ and $\mathpzc{D}$ is the gradient operator with respect to $z$ . If we define $\rho(z,v)$ as following:

[TABLE]

then the acceptance probability is given by:

[TABLE]

The pCNL algorithm is usually more efficient than the standard pCN algorithm as it takes advantage of the gradient information of the $\Psi(z)$ . However, the pCNL algorithm can not be used directly in our problem as $\Psi(z)$ includes the TV term which is not differentiable. It is important to note that $-2\delta{\bf C}_{0}\mathpzc{D}\Psi(z)$ is the offset term only affecting the mean of the proposal, and we can replace $\mathpzc{D}\Psi(z)$ with an alternative direction $g(z)$ , yielding proposal:

[TABLE]

where $\delta\in[0,2]$ and $w\sim N(\xi,{\bf C}_{0})$ . Regarding the proposal given by Eq. (17), we have the following theorem:

Theorem 3.1.

Assume that $\Psi$ satisfies Assumptions 6.1 in [14], and $g(z)$ is in the Cameron-Martin space associate with the Gaussian measure $\mu_{0}$ . Let $q(z,dv)$ be the conditional distribution defined by Eq. (17), and define

[TABLE]

on on $X\times X$ . We have that $\eta^{\perp}$ is equivalent to $\eta$ and

[TABLE]

where

[TABLE]

The proof of the theorem is provided in Appendix. It should be clear that Theorem 3.1 implies that the MCMC algorithm with proposal (17) yields a well defined acceptance probability in the function space, and as a result the chain satisfies the detailed balance condition in the function space and thus is ergodic. Another very important theoretical issue here is to estimate the spectral gaps and prove the geometric ergodicity of the algorithms in the infinite dimensional setting. We note that there are some results on the spectral gaps of the standard pCN [22] and the generalized pCN [38]. It is an interesting problem to analyze if similar results can also be obtained for the present algorithm.

Now we need to find a good direction $g(z)$ . In [34] the authors use Moreau approximation to approximate the TV term in the Langevin MCMC algorithm. Here we shall provide an alternative approach, determining the offset direction in the MCMC iteration using the primal dual algorithm. The primal-dual algorithms are known to be very effective in solving optimization problems involving TV regularization [12, 13, 11], and we hereby give a brief description of the primal dual method applied to our problem. Suppose that we want to solve

[TABLE]

Introducing a new variable $\bm{\phi}(x)=[\phi_{1}(x),\,\phi_{2}(x)]$ with $\phi_{1}(\mathbf{x}),\phi_{2}(\mathbf{x})\in L_{2}(\Omega)$ (we denote this as $\bm{\phi}\in L_{2}^{2}(\Omega)$ ), and we then rewrite the optimization problem (20) as

[TABLE]

where $\|\phi\|_{2,1}=\left(\|\phi_{1}(\mathbf{x})\|^{2}_{L_{2}(\Omega)}+\|\phi_{2}(\mathbf{x})\|^{2}_{L_{2}(\Omega)}\right)^{1/2}$ . The augmented Lagrangian for Eq.(21) is

[TABLE]

where $\eta\in L_{2}^{q}(\Omega)$ is the dual variable or Lagrange multiplier, and $\rho>0$ is a constant called the penalty parameter. The resulting dual problem is then solved with the Alternating Direction Method of Multipliers (ADMM) [8]:

[TABLE]

The algorithm consists of a $z$ -minimization step (23a), a $\phi$ –minimization step (23b), and a dual ascent step (23c).

Our primal dual pCN(PD-pCN) algorithm is designed as follows. First we solve Eq.(22) with the ADMM algorithm 1 obtaining the solution $(z^{\star},\phi^{\star},\eta^{\star})$ and then we define

[TABLE]

where operator $\mathpzc{T}_{K}$ is the projection of its input function onto the space spanned by the KL models $\{e_{1},...c_{K}\}$ for a prescribed positive integer $K$ . $K$ should be no greater than the discretization dimensionality $N$ . It should be clear that the function $g(z)$ computed with Eq. (24) is in the the Cameron-Martin space of $\mu_{0}$ . The complete algorithm is given in Algorithm 1. We note here that the proposed MCMC algorithm involves an optimization problem at the beginning and the computational cost for solving this optimization is usually an order of magnitude lower that of the MCMC iterations. A main limitation of this algorithm is that, when some hyper-parameters change, the optimization problem needs to be solved again, which makes it incompatible with Metropolis within Gibbs [2] type of methods. It is also worth noting here that, the main purpose of the proposed algorithm is to improve the sampling efficiency of the standard pCN algorithm while maintaining its dimension independence property. To this end a very interesting problem here is to incorporate the pCN framework with the aforementioned proximity based algorithm, and compare the performance with the PD based one.

4 Determining the hyperparameters

Just like the deterministic inverse problems, it is an important issue to determine the TV regularization parameter $\lambda$ in the hybrid prior. In the Bayesian setting, the regularization parametter can be determined by the empirical Bayes (EB) approach [20]. Namely, the EB method seeks to maximize

[TABLE]

where $Z(\lambda)$ is the normalization constant. A difficulty here is that $Z(\lambda)$ is usually not known in advance and needs to be evaluated with another Monte Carlo integration. To address this issue, a stochastic proximal gradient method was proposed in [36, 16]. The method can efficiently estimate the regularization parameter $\lambda$ without the knowledge of $Z$ . On the other hand, the method does require a suitable admissible set for $\lambda$ . Here we provide a statistical approach to determine the admissible set of $\lambda$ , which is derived from the realized discrepancy method for model assessment proposed in [19].

The basic idea of the method is to choose a function $D(\mathbf{y},\bm{\theta})$ that measures the discrepancy between the measured data $\mathbf{y}$ and the projected observable $\bm{\theta}$ , and for the present problem we use the $\chi^{2}$ discrepancy,

[TABLE]

Now knowing that $\theta={\mbox{A}}u$ , we can use this discrepancy to assess how well a specific choice of $u$ fits the data. The classical $p$ -value based on the discrepancy $D(\mathbf{y},\bm{\theta})$ is

[TABLE]

where $\tilde{\mathbf{y}}$ is the simulated data from model (3). In particular, for the discrepancy function given in Eq. (25), the $p$ -value is simply,

[TABLE]

where $F_{\chi_{d}^{2}}(\cdot)$ is the cumulative distribution function of the $\chi^{2}$ distribution with the degree of freedom $d$ . The classic $p$ -value computed this way provides an assessment of how well a single estimate of $u$ fits the data $\mathbf{y}$ . The method can be extended to the Bayesian setting to assess the fitness of the posterior distribution to data. First recall that our prior distribution given by Eq. (7) is specified by the parameter $\lambda$ , and as a result the posterior also depends on $\lambda$ and here we write the posterior as $\mu_{\lambda}^{y}(du)$ to emphasize its dependence on $\lambda$ . In the Bayesian setting, one can compute the posterior predictive $p$ -value:

[TABLE]

which is essentially the classical $p$ -value averaged over the posterior distribution. The posterior predictive $p$ -value assesses the fitness of the posterior distribution to the data: intuitively speaking, larger value of $p_{b}$ indicates better fitness of the posterior to the data $\mathbf{y}$ . However, one can not simply choose the value of $\lambda$ that yields the largest value of $p_{b}$ , or, equivalently the best fitness to the data, as that may cause overfitting. In other words, if the posterior fits the data “too well”, it often implies that the effect of the prior distribution is so weak that the posterior is dominated by the data. In the image reconstruction problem, this situation is greatly undesirable, as the problem is highly ill-posed and we need significant contribution from the prior distribution to obtain good estimates of the unknown $u$ . In this respect, we should choose $\lambda$ in a way that the effects of the prior and data are well balanced, which should be indicated by an appropriate value of $p_{b}$ . The suitable values of $p_{b}$ are certainly problem dependent, and in the present problem we suggest to choose $\lambda$ so that the resulting value of $p_{b}$ is approximately in the range of $0.1\sim 0.7$ . Based on this, we choose the admissible set for $\lambda$ to be

[TABLE]

The optimal value of $\lambda$ is then determined by using the method in [16] within $\Lambda$ . It is worth mentioning that methods using the data discrepancy to determine regularization parameters for Poisson data model have also been developed in the deterministic setting, and we refer to [5, 7] for further details. It is important to note that the discrepancy principle may lead to over-smoothing in certain problems [23], which, however, may not cause issues in the proposed method as it just uses the discrepancy method to identify the admissible set of the regularization parameter while the actual value of it is determined with EB. Finally we also note that, in addition to $\lambda$ , the Gaussian distribution may also be subject to hyper-parameters, and in principle these hyper-parameters can be determined along with $\lambda$ using the proposed approach. However, here we choose not to do so for two reasons: first determining multiple parameters may significantly increase the computational cost; second, as the Gaussian distribution is merely used as a reference measure in our hybrid prior, the posterior distribution is not sensitive to it, and it usually suffices to choose these hyper-parameters based upon certain prior information (for example, historical data).

5 Artifact detection using the posterior distribution

In practical image reconstruction problems, due to the imperfection of methods or devices, a reconstructed image may contain what are not present in the original imaged object. In this section we describe an application of the posterior distribution to detect artifacts in a reconstructed image.

Specifically, we consider the posterior distribution of the image at any given point $\mathbf{x}$ , which is denoted as $u_{\mathbf{x}}$ . Consequently we can write the posterior distribution of $u_{\mathbf{x}}$ as $\pi_{\mathbf{x}}(u_{\mathbf{x}}|\mathbf{y})$ . Next we consider the highest posterior density interval (HPDI) which is essentially the narrowest interval corresponding to a given confidence level. More precisely, for an $\alpha\in[0,1]$ , the $100(1-\alpha)\%$ HPDI is defined as [35].

[TABLE]

where $\pi_{\alpha}$ is the largest constant satisfying ${\mathbb{P}}[u_{\mathbf{x}}|\pi_{\mathbf{x}}(u_{\mathbf{x}}|\mathbf{y})>\pi_{\alpha}]=1-\alpha$ . Now suppose that we have a reconstruct image $\hat{u}$ and we also write its value at $\mathbf{x}$ as $\hat{u}_{\mathbf{x}}$ . Next we shall estimate how large the credible level $(1-\alpha)$ must be so that the associated HDPI may contain $\hat{u}_{\mathbf{x}}$ . That is, we compute the smallest value of $(1-\alpha)$ such that,

[TABLE]

Intuitively speaking, the larger the computed credible level $(1-\alpha)$ is, the more likely the considered $\hat{u}_{\mathbf{x}}$ is an artifact. And we thus use the credible level $(1-\alpha)$ to measure how likely a point is an artifact, and we can do this test for any point $\mathbf{x}\in\Omega$ . Alternatively, the problem can also be formulated as a Bayesian hypothesis test with a fixed $\alpha$ (e.g. $\alpha=5\%$ ) [33]: that is, $\hat{u}(\mathbf{x})$ is regarded to be an artifact if it is not contained in the $(1-\alpha)$ HPDI for the prescribed value of $\alpha$ . However, it has been pointed out in [41] that performing hypothesis test with HPDI may cause certain theoretical issue and so here we choose not to use the hypothesis test formulation.

It should be noted here that, in [35, 15], the authors utilize the highest posterior density (HDP) region to test if a candidate image is likely to be a solution to the reconstruction problem. The purpose of the present work differs from the aforementioned ones in that we want to identify regions or pixels which are unlikely to be present in the original image, rather than to assess the entire image.

6 Numerical results

In this section we demonstrate the performance of the proposed Bayesian framework, by applying it to a PET image reconstruction problem with synthetic data. In particular the ground truth image (Fig. 1, left) is chosen from the Harvard whole brain atlas [1]. We let $\Omega=(0,1)^{2}$ and set the image size to be $128\times 128$ . In the Radon transform we use 60 projections equilaterally sampled from 0 to $\pi$ . In the numerical experiments, we consider two different noise levels: $K=0.5$ corresponding to a higher noise level and $K=1$ corresponding to a lower noise level. The test data, shown in Figs. 1, are randomly simulated by plugging the true image into the Radon transform and the Poisson distribution (3) where the two aforementioned noise level $K$ . In the Bayesian inference, we use the hybrid prior distribution where the Gaussian part is taken to be zero mean and covariance:

[TABLE]

where $d$ is taken to be $10^{-3}$ and $\gamma$ is $2$ . The regularization parameter $\lambda$ are determined by using the method presented in Section 4, and details will be discussed in next section.

6.1 Determining parameter $\lambda$

As is discussed at the beginning of the section, the prior parameter $\lambda$ is determined with the realized discrepancy method discussed in Section 4. We here provide some details on the issue. Specifically we test five different values of $\lambda$ for $K=0.5$ : $\lambda=0,1,2,3,4,5$ , and using the method discussed in Section 4 we compute the corresponding posterior predictive $p$ -value for each value of $\lambda$ , shown in Table 1. Similarly we also test 5 values of $\lambda$ for $K=1$ and the results are shown in Table 2. We can see from the table that, as $\lambda$ increases, the resulting $p$ -value decays. These results agree well with our expectation that as $\lambda$ becomes larger, the prior distribution becomes stronger, and as a result the $p$ -value which assesses the fitness of the posterior to the data becomes smaller. We also compute the PSNR of the posterior distribution computed with all these $\lambda$ values, and the results are also given in the table. We can see here that, for both very large and very small $p$ -values, the associated posterior means are of rather poor quality in terms of PSNR. That is, when the $p$ -value is too large, the posterior distribution overfits the data, and when it is too small, the posterior underfits the data; both cases lead to a poor performance of the inference, and so we must choose a proper $p$ -value that represents a good balance of the prior and the data. Bases on the test results, for $K=0.5$ we choose $\Lambda=[1,\,3]$ and for $K=1$ we choose $\Lambda=[0.5,\,2]$ . By optimizing $\lambda$ within the identified intervals we obtain $\lambda=2.4$ for $K=0.5$ and $\lambda=1.2$ for $K=1$ .

6.2 Convergence with respect to discretization dimensionality

In the numerical implementation we represent the unknown $z$ using the truncated KL expansion with $N$ KL-modes. First we shall demonstrate that the posterior distributions converges with respect to the discretization dimensionality $N$ . We here use the case $K=1$ as an example. We perform the proposed PD-pCN MCMC simulation and compute the posterior means with six different values of $N$ : $N_{i}=i\times 10^{3}$ for $i=1...6$ . We note here that, in all the MCMC simulations performed in this section, we fix the number of samples to be $5\times 10^{5}$ with additional $0.5\times 10^{5}$ samples used in the burn-in step, and also, in all the simulations the stepsize $\beta$ has been chosen in a way that the resulting acceptance probability is around $25\%$ . We then compute the $L_{2}$ norm of the difference between the posterior mean with $N=N_{i}$ and that with $N=N_{i+1}$ for each $i=1...6$ :

[TABLE]

where $\hat{u}_{N_{i}}$ is the posterior mean of $u$ computed with $N_{i}$ KL modes. We plot the $L_{2}$ difference against the discretization dimensionality $N$ in Fig. 2. One can see from the figure that the difference decrease as $N$ increase and the difference becomes approximately zero for $N=5000$ and $N=6000$ , indicating the convergence of the posterior mean with respect to $N$ . For each posterior mean $\hat{u}_{N_{i}}$ , we also compute its peak signal-to-noise ratio (PSNR) [26], a commonly used metric of the quality of a reconstructed image. We show the PSNR results in Fig. 2 (right), and the figure shows that the PSNR increases as $N$ increases from 1000 to 4000, and remains approximately constant from 4000 to 6000, suggesting that increasing the discretization dimensionality can improve the inference accuracy until the posterior converges, and so it is important to use sufficiently large discretization dimensionality in such problems. Next to further demonstrate that the proposed PD-pCN MCMC algorithm is independent of discretization dimensionality, we perform the MCMC simulation with different values of $\delta$ which is the parameter controlling the step size of the algorithm. In Fig 3 we plot the average acceptance probability as a function of $\delta$ for three different values of $N$ , and one can see that the acceptance probabilities under different discretization dimensionality agree well with either other, indicating that the acceptance probability of the algorithm is independent of the discretization dimensionality $N$ . In the rest of the work, we fix $N=6000$ .

6.3 Sampling efficiency of the PD-pCN algorithm

Next, we shall compare the performance of the proposed PD-pCN algorithm and the standard pCN. We draw $5\times 10^{5}$ samples from the posterior distribution using both the standard pCN and the proposed PD-pCN algorithms. We reinstate that in both algorithms we have chosen the step size so that the resulting acceptance probability is around $25\%$ . In particular, to achieve the sought acceptance probability, the values of the stepsize parameter $\beta$ in pCN are taken to be $0.04$ (for $K=1$ ) and $0.09$ (for $K=0.5$ ); the values of the stepsize parameter $\delta$ in PD-pCN are $0.18$ (for $K=1$ ) and $0.23$ (for $K=0.5$ ). The total computational time is around 12 hours in a workstation with a 6-core 2.50 GHZ processor. We compute the auto-correlation function (ACF) of the samples generated by the two methods at all the grid points, and we show the ACF at the points with the fastest and the slowest convergence rates, in Fig. 4 (K=0.5) and Fig. 5 (K=1). One can see from the figures that at both points the ACF of the propose PD-pCN method decays much faster than that of the standard pCN. To further compare the performance, we compute the effective sample size(ESS) which is defined as,

[TABLE]

where $\tau$ is the integrated auto-correlation time and $N$ is total sample size. In Fig. 6, we compare the ESS at three chosen rows from left to right in the image, namely row 1, 64 and 128, for $K=1$ . Just as the ACF, the results show that the PD-pCN algorithm achieves much higher ESS than the standard pCN. We have also examined the ESS for $K=0.5$ , where the results are qualitatively similar to those three shown in Fig. 6, and so we omit those results.

6.4 The inference results

To illustrate the inference results, we compute the posterior mean of the TG prior, which is regarded as a point estimate of the image. As is mentioned earlier, a main advantage of the Bayesian method is its ability to quantify the uncertainty in the reconstruction and to this end, we use the width of the (pointwise) 95% HPDI as a metric of the posterior uncertainty (intuitively speaking the wider the HPDI is, the more uncertainty there is). We plot these posterior results in Figs. 7: the posterior mean and the 95% HPDI for $K=0.5$ and $K=1$ are shown in Fig. 7(a) and Fig. 7(c) respectively. As a comparison, we also compute the posterior mean as well as the 95% HPDI width, for the Gaussian prior corresponding to setting $\lambda=0$ in the TG prior, and the results are also shown in Fig. 7(b) and Fig. 7(d) . The figures show that the posterior mean obtained with the TG prior is clearly of better quality than that of the Gaussian prior, suggesting that including the edge-preserving TV term significantly improves the performance of the prior. It is worth noting here that, the Gaussian prior used here is not optimized for the best performance, and the performance can be potentially improved by using some carefully designed Gaussian priors, for example, [10].

6.5 Identifying artifacts using HPDI

As is discussed in Section 5, an important application of the proposed Bayesian framework is that the resulting posterior distribution can be used to detect artifacts in a reconstructed image. We now demonstrate this application with three surrogate test images which are generated by making certain modification of the ground truth. Figs 8 summarize the results for $K=1$ . Specifically the first image shown in Fig. 8(a) is generated by adding some random noise to the ground truth without any structural changes, the second one shown in Fig. 8(c) is generated by adding some artificial components to the ground truth, and the third one shown in Fig. 8(e), is the result of removing some components from the ground truth. Thus both the last two test images have structural changes from the ground truth, and in both figures, the regions in which components are altered from the ground truth are highlighted with red boxes. We compute the credible level $(1-\alpha)$ for all three images, and show the results in Figs. 8(b) (for test image 1), 8(d) (for test image 2) and 8(f) (for test image 3). It can be seen here that, though the first test image is visibly perturbed by random noise, it does not have structural difference from the ground truth and so credible level result in Fig. 8(b) does not suggest any region has high likelihood to contain artifacts. On the other hand, in the other two test images, at the locations where the original image is altered (i.e., artifacts introduced), the resulting credible level $(1-\alpha)$ is significantly higher than other regions, suggesting that these locations may contain artifacts. Figs. 9 shows the same test results but for $K=0.5$ , and one can see that the figures exhibits qualitative the same behaviors as those of $K=1$ . The results demonstrate that the proposed method can rather effectively detect the artifacts in a test image.

7 Conclusions

In this work, we have presented a complete treatment for performing Bayesian inference and uncertainty quantification for medical image reconstruction problems with Poisson data. In particular, we formulate the problem in an infinite dimensional setting and we prove that the resulting posterior distribution is well-posed in this setting. Second, to sample the unknown function/image, we provide a modified pCNL MCMC algorithm, the efficiency of which is independent of discretization dimensionality. Specifically the modified algorithm calculates the offset direction in the original pCNL algorithm by using a primal-dual method, to avoid computing the gradient of the TV term in our formulation. Third, we also give a method to determine the TV regularization parameter $\lambda$ which is critical for the prior distribution. The method is based on the realized discrepancy method for assessing model fitness. Finally we provide an application of the uncertainty information obtained by the Bayesian framework, using the posterior distribution to identify possible artifacts in an image reconstructed. We believe the proposed Bayesian framework can be used to reconstruct images and evaluate the uncertainty associated to reconstruction the results in many practical medical imaging problems with Poisson data.

There are several problems related to this work that we plan to investigate in the future. In the future, we plan to apply the methods developed in this work to those real-world problems, especially the PET image reconstruction.

Appendix A Proof of Proposition 2.1

We provide a proof of Proposition 2.1 here.

Proof A.1.

(1) From Eq. (2), and Eq. (4), we obtain directly that

[TABLE]

for two constant vectors $\underline{\bm{\theta}}$ and $\bm{\overline{\theta}}$ . It follows directly that $\|\ln\bm{\theta}\|_{2}\leq l_{\max}$ for a positive constant $l_{\max}$ .

For every $r>0$ , we have $\|y\|_{2}<r$ . By the Cauchy-Schwarz inequality, we obtain the lower bound,

[TABLE]

For upper bound, once again we apply the Cauchy-Schwarz inequality to the functional $\Phi$ , obtaining

[TABLE]

(2) In this proof, we use $M$ for positive constants. Let $z$ and $v$ be any two elements in $X$ , and we have,

[TABLE]

Since the Radon transform $A$ is a bounded linear operator from the $L_{2}$ space to $R^{d}$ [30], we have,

[TABLE]

*which completes the proof. *

(3) For any $y,\,y^{\prime}\in Y$ , it is easy to show,

[TABLE]

Appendix B Proof of Theorem 3.1

We define $\eta_{0}(z,v)$ to be the measure $\eta(z,v)$ on $X\times X$ with $\Psi\equiv 0$ , and it is obvious that the measure $\eta_{0}(z,v)$ is Gaussian. Moreover we have,

[TABLE]

and that the measures $\mu$ and $\mu_{0}$ are equivalent. It follows that $\eta$ and $\eta_{0}$ are equivalent and

[TABLE]

Now we define

[TABLE]

and by some elementary calculations we can derive,

[TABLE]

As $z,v\in X$ and $g(z)$ is in the Cameron-Martin space of $\mu_{0}$ , $\langle z-v,g(z)\rangle$ , $\langle v+z,g(z)\rangle$ and $||\mathpzc{C}_{0}^{\frac{1}{2}}g(z)||^{2}$ are finite, and $\frac{d\eta_{0}^{\perp}}{d\eta_{0}}$ is well defined. Now recall that,

[TABLE]

Substituting Eqs. (33) and (34) into the Eq. (35) yields,

[TABLE]

where

[TABLE]

Bibliography44

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] Johnson K A and Becker J. www.med.harvard.edu/aanlib/ . Accessed April 4, 2010.
2[2] Christophe Andrieu, Nando De Freitas, Arnaud Doucet, and Michael I Jordan. An introduction to mcmc for machine learning. Machine learning , 50(1-2):5–43, 2003.
3[3] Simon R Arridge, Kazufumi Ito, Bangti Jin, and Chen Zhang. Variational gaussian approximation for poisson data. Inverse Problems , 34(2):025005, 2018.
4[4] Dale L Bailey, Michael N Maisey, David W Townsend, and Peter E Valk. Positron emission tomography . Springer, 2005.
5[5] Johnathan M Bardsley and John Goldes. Regularization parameter selection methods for ill-posed poisson maximum likelihood estimation. Inverse Problems , 25(9):095005, 2009.
6[6] Johnathan M Bardsley and Aaron Luttman. A metropolis-hastings method for linear inverse problems with poisson likelihood and gaussian prior. International Journal for Uncertainty Quantification , 6(1), 2016.
7[7] Mario Bertero, Patrizia Boccacci, Giorgio Talenti, Riccardo Zanella, and Luca Zanni. A discrepancy principle for poisson data. Inverse problems , 26(10):105004, 2010.
8[8] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning , 3(1):1–122, 2011.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Taxonomy

Bayesian inference and uncertainty quantification for medical image reconstruction

Abstract

keywords:

1 Introduction

2 Infinite dimensional Bayesian image reconstruction with Poisson data

2.1 The Bayesian inference formulation for functions

2.2 Poisson data model and the positivity-preserving reparametrization

2.3 Bayesian framework with the hybrid prior for PET imaging

Proposition 2.1**.**

Theorem 2.2**.**

3 The primal-dual preconditioned Crank-Nicolson MCMC algorithm

Theorem 3.1**.**

4 Determining the hyperparameters

5 Artifact detection using the posterior distribution

6 Numerical results

6.1 Determining parameter λ\lambdaλ

6.2 Convergence with respect to discretization dimensionality

6.3 Sampling efficiency of the PD-pCN algorithm

6.4 The inference results

6.5 Identifying artifacts using HPDI

7 Conclusions

Appendix A Proof of Proposition 2.1

Proof A.1**.**

Appendix B Proof of Theorem 3.1

Proposition 2.1.

Theorem 2.2.

Theorem 3.1.

6.1 Determining parameter $\lambda$

Proof A.1.