Deconvolution and Restoration of Optical Endomicroscopy Images

Ahmed Karam Eldaly, Yoann Altmann, Antonios Perperidis, Nikola Krstajić, Tushar R. Choudhary, Kevin Dhaliwal, and Stephen McLaughlin

arXiv:1701.08107·cs.CV·August 29, 2018


TL;DR

This paper introduces a hierarchical Bayesian framework for deconvolving and restoring optical endomicroscopy images, improving image quality by addressing fiber core cross coupling and sparse sampling issues.

Contribution

It proposes a novel hierarchical Bayesian model and compares three estimation algorithms (MCMC, VB, and ADMM) for OEM image restoration.

Findings

01 Bayesian methods effectively restore OEM images

02 VB and ADMM reduce computation time relative to MCMC (Table I: 5.12 s and 35.51 s versus 3100 s on average)

03 Restored images show improved visualization and analysis

Abstract

Optical endomicroscopy (OEM) is an emerging technology platform with preclinical and clinical imaging applications. Pulmonary OEM via fibre bundles has the potential to provide in vivo, in situ molecular signatures of disease such as infection and inflammation. However, enhancing the quality of data acquired by this technique for better visualization and subsequent analysis remains a challenging problem. Cross coupling between fiber cores and sparse sampling by imaging fiber bundles are the main reasons for image degradation and poor detection performance (e.g., of inflammation or bacteria). In this work, we address the problem of deconvolution and restoration of OEM data. We propose a hierarchical Bayesian model to solve this problem and compare three estimation algorithms to exploit the resulting joint posterior distribution. The first method is based on Markov chain Monte Carlo…


Tables (2)

Table I: The average computation time (in seconds) of the three proposed methods. In order to maintain a fair comparison between the three algorithms, the computational time of the ADMM algorithm corresponds to the duration of five runs (used to select the best regularization parameter among the five values).

Method	MCMC	ADMM	VB
Computation time (sec.)	3100	35.51	5.12

Table II: Computation time (in seconds) for the real data. In order to keep a fair comparison between the three algorithms, the computational times of the ADMM algorithm correspond to the duration of five runs (used to select the best regularization parameter among five values).

Dataset / Method	MCMC	ADMM	VB
USAF chart	$1.12 \times 10^{5}$	250	5.9
Lung tissue	$1.46 \times 10^{6}$	870	16.05

Equations (93)

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}, \qquad (1)$$

$$\underset{\mathbf{x}}{\text{minimize}}\ \frac{1}{2}\lVert \mathbf{A}\mathbf{x} - \mathbf{y} \rVert_{2}^{2} + \lambda\phi(\mathbf{x}) + i_{\mathbb{R}^{+}}(\mathbf{x}), \qquad (2)$$

$$\mathbf{g} = \mathbf{C}\mathbf{H}\mathbf{x} + \mathbf{w}, \qquad (3)$$

$$[\mathbf{H}]_{i,j} = \exp\left(-\left(\frac{d_{i,j}}{\alpha_{\boldsymbol{H}}}\right)^{\beta_{\boldsymbol{H}}}\right), \qquad (4)$$

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}. \qquad (5)$$

$$f(\mathbf{y} \mid \mathbf{x}, \sigma^{2}) = \left(\frac{1}{2\pi\sigma^{2}}\right)^{N_{1}/2} \exp\left(-\frac{\lVert \mathbf{y} - \mathbf{H}\mathbf{x} \rVert_{2}^{2}}{2\sigma^{2}}\right). \qquad (6)$$

$$f(\mathbf{x} \mid \gamma^{2}) \propto (\gamma^{2})^{-d/2} \exp\left(-\frac{\mathbf{x}^{T}\boldsymbol{\Delta}^{-1}\mathbf{x}}{2\gamma^{2}}\right) 1_{\mathbb{R}^{+}}(\mathbf{x}), \qquad (7)$$

$$[\boldsymbol{\Delta}]_{n,n^{\prime}} = \exp\left(-\left(\frac{d_{n,n^{\prime}}}{\ell}\right)^{\kappa}\right), \qquad (8)$$

$$f(\sigma^{2} \mid \alpha, \beta) \sim \mathcal{IG}(\alpha, \beta), \qquad (9)$$

$$\beta \sim \mathcal{G}(\alpha_{o}, \beta_{o}), \qquad (10)$$

$$\gamma^{2} \sim \mathcal{IG}(\eta, \nu), \qquad (11)$$

$$f(\boldsymbol{\Omega}, \boldsymbol{\phi} \mid \mathbf{y}) \propto f(\mathbf{y} \mid \boldsymbol{\Omega})\, f(\boldsymbol{\Omega} \mid \boldsymbol{\phi})\, f(\boldsymbol{\phi}), \qquad (12)$$

$$f(\boldsymbol{\Omega} \mid \boldsymbol{\phi}) = f(\mathbf{x} \mid \gamma^{2})\, f(\sigma^{2} \mid \beta), \quad \text{and} \quad f(\boldsymbol{\phi}) = f(\gamma^{2})\, f(\beta). \qquad (13)$$

$$f(\mathbf{x} \mid \mathbf{y}, \sigma^{2}) \sim \mathcal{N}_{\mathbb{R}^{+}}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}), \qquad (14)$$

$$\boldsymbol{\mu} = \sigma^{-2}\boldsymbol{\Sigma}^{T}\mathbf{H}^{T}\mathbf{y}, \qquad \boldsymbol{\Sigma} = \left(\sigma^{-2}\mathbf{H}^{T}\mathbf{H} + \gamma^{-2}\boldsymbol{\Delta}^{-1}\right)^{-1}. \qquad (15)$$

$$f(\sigma^{2} \mid \mathbf{y}, \mathbf{x}) \sim \mathcal{IG}\left(\alpha + \frac{N_{1}}{2},\ \beta + \frac{\lVert \mathbf{y} - \mathbf{H}\mathbf{x} \rVert_{2}^{2}}{2}\right), \qquad (16)$$

$$f(\beta \mid \sigma^{2}) \sim \mathcal{G}\left(\alpha + \alpha_{o},\ \frac{\sigma^{2}\beta_{o}}{\sigma^{2} + \beta_{o}}\right). \qquad (17)$$

$$f(\gamma^{2} \mid \mathbf{x}) \sim \mathcal{IG}\left(\eta + \frac{N_{1}}{2},\ \nu + \frac{\mathbf{x}^{T}\boldsymbol{\Delta}^{-1}\mathbf{x}}{2}\right). \qquad (18)$$

$$\hat{\mathbf{x}} = \frac{1}{N_{\text{MC}} - N_{\text{bi}}} \sum_{t = N_{\text{bi}}+1}^{N_{\text{MC}}} \mathbf{x}^{(t)}, \qquad (19)$$

$$D_{KL}\left(q(\boldsymbol{\Theta}) \,\Vert\, p(\boldsymbol{\Theta} \mid \mathbf{y})\right) = \int q(\boldsymbol{\Theta}) \log\left(\frac{q(\boldsymbol{\Theta})}{p(\boldsymbol{\Theta} \mid \mathbf{y})}\right) d\boldsymbol{\Theta}, \qquad (20)$$

$$p(\boldsymbol{\Theta}, \mathbf{y}) \geq p(\mathbf{y} \mid \boldsymbol{\Theta})\, p(\boldsymbol{\Theta} \mid \boldsymbol{\phi})\, p(\boldsymbol{\phi}) = \mathbf{F}(\boldsymbol{\Theta}, \mathbf{y}). \qquad (21)$$

$$\begin{aligned} \mathcal{M}(q(\boldsymbol{\Theta})) &= \int q(\boldsymbol{\Theta}) \log\left(\frac{q(\boldsymbol{\Theta})}{p(\boldsymbol{\Theta} \mid \mathbf{y})}\right) d\boldsymbol{\Theta} \\ &\leq \int q(\mathcal{H}) \left(\int q(\boldsymbol{\Theta}_{\backslash\mathcal{H}}) \log\left(\frac{q(\mathcal{H})\, q(\boldsymbol{\Theta}_{\backslash\mathcal{H}})}{\mathbf{F}(\boldsymbol{\Theta}, \mathbf{y})}\right) d\boldsymbol{\Theta}_{\backslash\mathcal{H}}\right) d\mathcal{H} \\ &= \mathcal{M}(q(\mathcal{H})). \end{aligned} \qquad (22)$$

$$\hat{q}(\mathcal{H}) = \underset{q(\mathcal{H})}{\arg\min}\ \mathcal{M}(q(\mathcal{H})). \qquad (23)$$

$$\hat{q}(\mathcal{H}) = \text{const} \times \exp\left(E_{q(\boldsymbol{\Theta}_{\backslash\mathcal{H}})}\left[\log \mathbf{F}(\boldsymbol{\Theta}, \mathbf{y})\right]\right), \qquad (24)$$

$$E_{q(\boldsymbol{\Theta}_{\backslash\mathcal{H}})}\left[\log \mathbf{F}(\boldsymbol{\Theta}, \mathbf{y})\right] = \int \log \mathbf{F}(\boldsymbol{\Theta}, \mathbf{y})\, q(\boldsymbol{\Theta}_{\backslash\mathcal{H}})\, d\boldsymbol{\Theta}_{\backslash\mathcal{H}}. \qquad (25)$$

$$q^{k}(\mathbf{x}) = \mathcal{N}\left(\mathbf{x};\ E_{q^{k}(\mathbf{x})}(\mathbf{x}),\ \boldsymbol{\Sigma}_{q^{k}(\mathbf{x})}(\mathbf{x})\right), \qquad (26)$$

$$E_{q^{k}(\mathbf{x})}(\mathbf{x}) = \frac{\left(\boldsymbol{\Sigma}_{q^{k}(\mathbf{x})}(\mathbf{x})\right)^{T}\mathbf{H}^{T}\mathbf{y}}{E_{q^{k}(\sigma^{2})}(\sigma^{2})}, \qquad \boldsymbol{\Sigma}_{q^{k}(\mathbf{x})}(\mathbf{x}) = \left(\frac{\mathbf{H}^{T}\mathbf{H}}{E_{q^{k}(\sigma^{2})}(\sigma^{2})} + \frac{\boldsymbol{\Delta}^{-1}}{E_{q^{k}(\gamma^{2})}(\gamma^{2})}\right)^{-1}. \qquad (27)$$

$$q^{k}(\sigma^{2}) = \mathcal{IG}\left(\sigma^{2};\ \frac{N_{1}}{2} + \alpha,\ E_{q^{k}(\beta)}(\beta) + E_{q^{k}(\mathbf{x})}\left[\lVert \mathbf{y} - \mathbf{H}\mathbf{x} \rVert_{2}^{2}\right]\right), \qquad (28)$$


Full text

Deconvolution and Restoration of Optical Endomicroscopy Images

Ahmed Karam Eldaly, Yoann Altmann, Antonios Perperidis, Nikola Krstajić, Tushar R. Choudhary, Kevin Dhaliwal, and Stephen McLaughlin

A. K. Eldaly, Y. Altmann, A. Perperidis and S. McLaughlin are with the Institute of Sensors, Signals and Systems, School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, UK (Emails: {AK577; Y.Altmann; A.Perperidis; S.Mclaughlin}@hw.ac.uk).

T. R. Choudhary is with the Institute of Biological Chemistry, Biophysics and Bioengineering, Heriot-Watt University, Edinburgh, United Kingdom (Email: [email protected]).

N. Krstajić and K. Dhaliwal are with the EPSRC IRC Hub in Optical Molecular Sensing & Imaging, MRC Centre for Inflammation Research, Queen’s Medical Research Institute, University of Edinburgh, Edinburgh, UK (Emails: {N.Krstajic; Kev.Dhaliwal}@ed.ac.uk).

This work was supported by the EPSRC via grant EP/K03197X/1 and the Royal Academy of Engineering through the research fellowship scheme.

Abstract

Optical endomicroscopy (OEM) is an emerging technology platform with preclinical and clinical imaging applications. Pulmonary OEM via fibre bundles has the potential to provide in vivo, in situ molecular signatures of disease such as infection and inflammation. However, enhancing the quality of data acquired by this technique for better visualization and subsequent analysis remains a challenging problem. Cross coupling between fiber cores and sparse sampling by imaging fiber bundles are the main reasons for image degradation and poor detection performance (e.g., of inflammation or bacteria). In this work, we address the problem of deconvolution and restoration of OEM data. We propose a hierarchical Bayesian model to solve this problem and compare three estimation algorithms to exploit the resulting joint posterior distribution. The first method is based on Markov chain Monte Carlo (MCMC) methods; however, it exhibits a relatively long computational time. The second and third algorithms deal with this issue and are based on a variational Bayes (VB) approach and an alternating direction method of multipliers (ADMM) algorithm, respectively. Results on both synthetic and real datasets illustrate the effectiveness of the proposed methods for restoration of OEM images.

Index Terms:

Optical endomicroscopy, Deconvolution, Image restoration, Irregular sampling, Bayesian models.

I Introduction

Pneumonia is a major cause of morbidity and mortality in mechanically ventilated patients in intensive care [1]. However, the accurate diagnosis and monitoring of suspected pneumonia remain challenging [2]. Current methodologies consist of culturing bronchoalveolar lavage fluid (BALF) retrieved from bronchoscopy, but this often takes 48 hours to yield a result, which still has low specificity and sensitivity [3]. Structural imaging with X-ray or computed tomography (CT) scans is also often non-diagnostic.

Optical endomicroscopy (OEM) is an emerging, optical fibre-based medical imaging modality with utility in a range of clinical indications and organ systems, including gastro-intestinal, urological and respiratory tracts. The technology employs a proximal light source, laser scanning or Light Emitting Diode (LED) illumination, linked to a flexible fibre bundle, performing microscopic fluorescent imaging at its distal end. The diameter of the packaged fibre can be $<500\,\mu\mathrm{m}$, enabling the real-time imaging of tissues that were previously inaccessible through conventional endoscopy. Probe-based confocal laser endomicroscopy is currently the most widely used OEM platform approved for clinical use. However, there have recently been a number of studies describing novel, flexible, versatile and low-cost OEM architectures [4, 5, 6], employing wide-field LED illumination sources, capable of imaging at multiple acquisition wavelengths [7]. Wide-field fiber optic imaging devices, such as the one being developed by our group, provide sparse and usually irregularly-spaced intensity readings of the scene, due to the irregular packing of the fibre cores within the fibre bundle. Fibre bundles usually contain approximately 25,000 fibre cores that transmit and collect the light simultaneously. Note that only the fibre cores carry information; the cladding (the space between the fibre cores) does not.

One of the main challenges of OEM images is enhancing the restoration of the signals at the receiver for better image visualization and/or subsequent analysis. Fiber core cross coupling is one of the main reasons for image degradation in this type of imaging [8, 9]. In confocal endomicroscopy, the detector pinhole can mask out light coupled to neighbouring cores before reaching the detector. Consequently, the effect of inter-core coupling in imaging capabilities is inherently of greater importance in wide-field endomicroscopy. Perperidis et al. [10] have quantified the average spread of inter-core coupled light, with approximately a third of the overall light coupling to neighbouring cores. Consequently, cross coupling causes severe blurring in the resulting images, whose restoration is formulated as an inverse problem. We will discuss in detail cross coupling effects in Section II. In this work, we consider a noisy observation vector ${\mathbf{y}}$ , of an original intensity vector ${\mathbf{x}}$ , that is modelled by the following linear forward model

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}, \qquad (1)$$

where ${\mathbf{A}}$ is the matrix representing a linear operator which can model different degradations. Here, ${\mathbf{A}}$ models fiber core cross coupling and/or spatial blur. We specify the dimensions of the variables later in the text. In (1), the vector ${\mathbf{w}}$ stands for additive noise, modelling observation noise and model mismatch, and is assumed to be a white Gaussian noise sequence. In wide-field OEM, the constant background fluorescence of the fiber bundle [11, 7] is significant (between 60% and 90% of the total signal), providing a significant offset to all fluorescence measurements from tissue. Hence, the total noise level does not depend on the tissue signal level. Also, we consider applications where the photon flux is high ($>500$ photoelectrons generated per pixel per typical exposure time of 50 ms). Therefore, the Gaussian noise assumption holds [12, 13, 14].

The problem of estimating ${\mathbf{x}}$ from ${\mathbf{y}}$ is an ill-posed linear inverse problem (LIP); i.e., the matrix ${\mathbf{A}}$ is singular or very ill-conditioned. Consequently, this problem requires additional regularization (or prior information, in Bayesian inference terms) in order to reduce uncertainties and improve estimation performance. State-of-the-art algorithms for solving such problems can be split into either convex optimization or Bayesian methods.

In [15, 16, 17, 18], the problem of estimating ${\mathbf{x}}$ given ${\mathbf{y}}$ is formulated as an unconstrained optimization problem as follows

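$$\underset{\mathbf{x}}{\text{minimize}}\ \frac{1}{2}\lVert \mathbf{A}\mathbf{x} - \mathbf{y} \rVert_{2}^{2} + \lambda\phi(\mathbf{x}) + i_{\mathbb{R}^{+}}(\mathbf{x}), \qquad (2)$$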

where $\phi(\cdot)$ is a regularization function, $\lVert\cdot\rVert_{2}$ is the standard $\ell_{2}$-norm, $\lambda\in\mathbb{R}_{+}$ is a regularization parameter, and $i_{\mathbb{R}^{+}}({\mathbf{x}})$ is the indicator function defined on the positive set of ${\mathbf{x}}$. Problems of the form (2) can be solved with state-of-the-art algorithms such as those of the iterative shrinkage/thresholding family [15, 16, 17, 18]. In [19, 16], the unconstrained problem in Eq. (2) is solved by the split augmented Lagrangian shrinkage algorithm (SALSA), which is based on variable splitting [20, 21].

Alternatively, many studies have considered hierarchical Bayesian models to solve the deconvolution and restoration problem [22, 23, 24, 25, 26, 27, 28, 29, 30, 31]. These models offer a flexible and consistent methodology to deal with uncertainty in inference when only a limited amount of data or information is available. Moreover, other unknown parameters, such as noise variance(s) and regularization parameters, can be jointly estimated within the algorithm. As such, they represent an attractive way to tackle ill-posed problems such as the one considered in this work. These methods rely on selecting an appropriate prior distribution for the unknown image and other unknown parameters. The full posterior distribution can then be derived from Bayes’ rule and exploited by optimization or simulation-based (Markov chain Monte Carlo) methods.

The main contributions of this work are fourfold:

1. We address the problem of deconvolution and restoration in OEM. To the best of our knowledge, it is the first time this problem is addressed in a statistical framework by using a hierarchical Bayesian model.

2. We develop algorithms dedicated to irregularly sampled images which do not rely on strong assumptions about the spatial structure of the sampling patterns. The developed methods can thus be applied to a wide range of imaging systems and fiber bundle designs.

3. We derive three estimation algorithms associated with the proposed hierarchical Bayesian model and compare them using extensive simulations conducted using controlled and real data. The first algorithm generates samples distributed according to the posterior distribution using Markov chain Monte Carlo (MCMC) methods [32]. This approach also allows the estimation of the hyperparameters associated with the priors. However, as mentioned previously, the resulting MCMC-based algorithm presents a high computational complexity. The second and third algorithms deal with this limitation and approximate the joint posterior distribution. The second algorithm uses the variational Bayes (VB) methodology [33, 34] to approximate the joint posterior distribution by minimizing the Kullback–Leibler (KL) divergence between the true posterior distribution and its approximation [35]. It can also estimate the hyperparameters associated with the prior distributions, and hence it is totally unsupervised, as is the MCMC-based method. The third algorithm is based on the alternating direction method of multipliers (ADMM). Despite the low computational complexity of this algorithm, the hyperparameters associated with the priors need to be chosen carefully by the user, and hence it is considered a semi-supervised method.

4. We use Gaussian Processes (GP) to interpolate the resulting samples to provide a meaningful image and quantify uncertainties at each interpolated sample.

The remaining sections of the paper are organized as follows. Section II discusses the cross coupling problem and formulates the problem of deconvolution and restoration of OEM data. The proposed hierarchical Bayesian model is then presented in Section III. Section IV introduces the three proposed estimation algorithms based on MCMC and optimization. Results of simulations conducted using synthetic and real datasets are discussed in Section VI and Section VII, respectively. Conclusions and future work are finally reported in Section VIII.

II Problem Formulation

Fig. 1 illustrates what happens in the fibre bundle when receiving fluorescent light from an object being imaged. The vectors ${\mathbf{x}}_{o}$, ${\mathbf{x}}$, and ${\mathbf{g}}$ represent light intensities at the object being imaged (tissue in this case), at the distal end of the fibre bundle, and at the image plane, respectively. The transform ${\mathbf{H}}$ represents the cross coupling effect defined later in the text, ${\mathbf{C}}$ represents the spatial blur acting between the proximal end of the fibre bundle and the image plane, whereas ${\mathbf{C}}^{\prime}$ is that between the distal end of the fibre bundle and the tissue being imaged. The two spatial blurs ${\mathbf{C}}$ and ${\mathbf{C}}^{\prime}$ are spatially variant. ${\mathbf{C}}$ can be characterized because the distance $d$ between the image plane and the proximal end of the fibre is known, whereas ${\mathbf{C}}^{\prime}$ cannot be fully characterized because $d^{\prime}$ is unknown and the frames are analyzed independently here. Hence, to overcome this problem, we aim to recover the intensity vector ${\mathbf{x}}$ rather than ${\mathbf{x}}_{o}$.

Fig. 2 provides an illustrative example of cross coupling between fiber cores. If an individual fiber core is illuminated in ${\mathbf{x}}$, the neighbouring cores in ${\mathbf{g}}$ will receive a specific percentage of the light incident on the illuminated core. Experimental results for the current fiber bundle (which might differ for other bundles) showed that around 61% of the light transmitted through a single core remains in that core, around 34% migrates to the immediate neighbouring cores, around 4% to the second order neighbours and less than 1% to the third, fourth, and fifth order neighbours [10].

Fig. 3 illustrates how we construct the forward observation model to mimic the same output as the endomicroscopy imaging system. The first image on the left-hand side of the figure represents the illumination of one fiber core. This results in cross coupling to the neighbouring cores (convolution with a first linear operator ${\mathbf{H}}$ ), then the spatial blurring effect around each fiber core (convolution with a second linear operator ${\mathbf{C}}$ ) and finally the fourth image of the figure shows the final system output after adding white Gaussian noise.

The linear model in (1) can now be written as

$$\mathbf{g} = \mathbf{C}\mathbf{H}\mathbf{x} + \mathbf{w}, \qquad (3)$$

where ${\mathbf{A}}$ in (1) is replaced by ${\mathbf{C}}{\mathbf{H}}$ in (3), the vector ${\mathbf{g}}$ contains the observed data, and ${\mathbf{x}}$ is the image to be restored.

From preliminary results, we propose to model cross-coupling by an isotropic zero mean 2D generalized Gaussian kernel applied to the fiber intensities [10] as follows

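$$[\mathbf{H}]_{i,j} = \exp\left(-\left(\frac{d_{i,j}}{\alpha_{\boldsymbol{H}}}\right)^{\beta_{\boldsymbol{H}}}\right), \qquad (4)$$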

where $d_{i,j}$ denotes the Euclidean distance between the cores (or spatial locations) $i$ and $j$, which corresponds to approximately 3.3 pixels between neighbouring cores. From (4), it can be seen that neighbouring fiber cores will be more closely coupled than distant ones. The values of $\alpha_{\boldsymbol{H}}$ and $\beta_{\boldsymbol{H}}$, which control the amount of cross-coupling (the higher, the more coupling) and which are system dependent, are adjusted from preliminary measurements (calibration). Note that other cross-coupling models could also be considered instead of (4) depending on the imaging system used.
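As a minimal sketch of the generalized Gaussian coupling kernel, the snippet below builds the matrix with entries $\exp(-(d_{i,j}/\alpha_{\boldsymbol{H}})^{\beta_{\boldsymbol{H}}})$ from core centre coordinates. The function name `coupling_matrix`, the toy coordinates, and the values of `alpha_h` and `beta_h` are illustrative assumptions, not calibrated values from this work.

```python
import math

def coupling_matrix(cores, alpha_h, beta_h):
    """Return H with H[i][j] = exp(-(d_ij / alpha_h) ** beta_h),
    where d_ij is the Euclidean distance between core centres i and j."""
    return [[math.exp(-((math.dist(ci, cj) / alpha_h) ** beta_h)) for cj in cores]
            for ci in cores]

# Hypothetical core centres (in pixels), spaced about 3.3 pixels apart.
cores = [(0.0, 0.0), (3.3, 0.0), (6.6, 0.0)]
# alpha_h and beta_h are placeholders; in practice they come from calibration.
H = coupling_matrix(cores, alpha_h=2.0, beta_h=2.0)
```

In this toy layout the diagonal entries equal one and the coupling between immediate neighbours exceeds the coupling between second order neighbours, which mirrors the qualitative behaviour described above.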

The spatial blur affecting each fiber core can be modelled by a Gaussian spatial filter, as illustrated in Fig. 4. The figure shows a background image, i.e., an image of a sample presenting constant intensity acquired with an endomicroscopy imaging system, together with a zoomed-in region of this image; bright and dark areas represent fiber cores and their cladding, respectively. The intensity profile across one line in this image is a series of Gaussian kernels, whose shape and width vary with the core sizes.

Due to the variation in core sizes, the blurring kernel ${\mathbf{C}}$ varies accordingly, and hence the cores tend to overlap. So the complete model in (3) becomes more complex, and potentially computationally expensive for long image sequences (videos). Indeed, there is no structure in ${\mathbf{C}}$ which allows us to compute ${\mathbf{C}}{\mathbf{H}}{\mathbf{x}}$ rapidly. Hence we propose a simplification of this model and represent each core by a single intensity value. The mean intensities of fibre core pixels could be used, but the overlap between the cores makes its computation difficult. Since the variation of the width of this blur is not too significant, the maximum intensity of each core is considered instead ( ${\mathbf{y}}_{n}$ in Fig. 1).
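The reduction of each core to a single value can be sketched as follows, assuming a per-pixel core label map is already available (how cores are segmented is not described here). The function `core_maxima` and the convention that label `-1` marks cladding pixels are hypothetical choices for illustration.

```python
def core_maxima(pixels, labels, n_cores):
    """Return the maximum pixel intensity of each core.

    pixels: flat list of pixel intensities.
    labels: flat list of core indices per pixel; -1 marks cladding (assumed convention).
    """
    maxima = [float("-inf")] * n_cores
    for value, label in zip(pixels, labels):
        if label >= 0:  # skip cladding pixels, which carry no information
            maxima[label] = max(maxima[label], value)
    return maxima

# Toy example: two cores separated by one cladding pixel.
pixels = [1.0, 5.0, 2.0, 0.0, 7.0, 3.0]
labels = [0, 0, 0, -1, 1, 1]
y = core_maxima(pixels, labels, n_cores=2)
```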

Following the above mentioned points, the model in (3) can be simplified to

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}. \qquad (5)$$

Let $N$ be the total number of pixels in the image and $N_{1}$ the number of fibre cores in the image, with $N_{1}\ll N$. The input ${\mathbf{y}}\approx{\mathbf{C}}^{+}{\mathbf{g}}\in\mathbb{R}^{N_{1}}$, where ${\mathbf{C}}^{+}$ is the pseudo-inverse of ${\mathbf{C}}$, and the output ${\mathbf{x}}\in\mathbb{R}^{N_{1}}$ are two vectors representing central core intensities, and ${\mathbf{H}}\in\mathbb{R}^{N_{1}\times N_{1}}$. The noise ${\mathbf{w}}\in\mathbb{R}^{N_{1}}$ is assumed to be additive, independent and identically distributed (i.i.d.), zero-mean white Gaussian noise with variance $\sigma^{2}$, denoted as ${\mathbf{w}}\sim\mathcal{N}(\mathbf{0},\sigma^{2}{\mathbf{I}})$, where $\sim$ means “is distributed according to” and ${\mathbf{I}}$ is the identity matrix.
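The simplified observation model, in which the core intensities are mixed by ${\mathbf{H}}$ and corrupted by i.i.d. Gaussian noise, can be simulated with a few lines. This is only a sketch: the $3\times 3$ matrix, the intensity vector and the noise level below are made-up values, and `simulate_observations` is a hypothetical helper name.

```python
import random

def simulate_observations(H, x, sigma, rng):
    """Return y = H x + w with w ~ N(0, sigma^2 I), one noise draw per core."""
    return [sum(h * xj for h, xj in zip(row, x)) + rng.gauss(0.0, sigma)
            for row in H]

rng = random.Random(0)          # fixed seed for reproducibility
H = [[1.0, 0.3, 0.0],           # toy coupling matrix (illustrative values)
     [0.3, 1.0, 0.3],
     [0.0, 0.3, 1.0]]
x = [0.0, 10.0, 0.0]            # only the middle core is illuminated
y_clean = simulate_observations(H, x, sigma=0.0, rng=rng)
y_noisy = simulate_observations(H, x, sigma=0.5, rng=rng)
```

With a single illuminated core, the noise-free output shows light leaking into the two neighbouring cores, which is the blurring that the restoration aims to undo.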

The problem investigated in this paper is to estimate the actual intensity values ${\mathbf{x}}$ , and the noise variance $\sigma^{2}$ from the observation vector ${\mathbf{y}}$ . As mentioned previously, to solve this problem, we propose a hierarchical Bayesian model and a set of different estimation methods to estimate the unknown parameters.

III Hierarchical Bayesian Model

This section introduces a hierarchical Bayesian model proposed to estimate the unknown parameter vector ${\mathbf{x}}$ and $\sigma^{2}$ . This model is based on the likelihood function of the observations and on prior distributions assigned to the unknown parameters.

III-A Likelihood

Eq. (5) yields that ${\mathbf{y}}|({\mathbf{x}},\sigma^{2})\sim\mathcal{N}({\mathbf{H}}{\mathbf{x}},\sigma^{2}{\mathbf{I}})$ . Consequently, the likelihood can be expressed as

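$$f(\mathbf{y} \mid \mathbf{x}, \sigma^{2}) = \left(\frac{1}{2\pi\sigma^{2}}\right)^{N_{1}/2} \exp\left(-\frac{\lVert \mathbf{y} - \mathbf{H}\mathbf{x} \rVert_{2}^{2}}{2\sigma^{2}}\right). \qquad (6)$$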
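In practice the likelihood is usually evaluated in the log domain to avoid numerical underflow. The sketch below computes the logarithm of the Gaussian likelihood $\mathcal{N}({\mathbf{H}}{\mathbf{x}},\sigma^{2}{\mathbf{I}})$, i.e., $-\frac{N_{1}}{2}\log(2\pi\sigma^{2}) - \lVert{\mathbf{y}}-{\mathbf{H}}{\mathbf{x}}\rVert_{2}^{2}/(2\sigma^{2})$; the function name is an assumption made for illustration.

```python
import math

def gaussian_log_likelihood(y, H, x, sigma2):
    """Return log f(y | x, sigma2) for y ~ N(Hx, sigma2 * I)."""
    n1 = len(y)
    residual = [yi - sum(h * xj for h, xj in zip(row, x))
                for yi, row in zip(y, H)]
    sq_norm = sum(r * r for r in residual)  # ||y - Hx||_2^2
    return -0.5 * n1 * math.log(2.0 * math.pi * sigma2) - sq_norm / (2.0 * sigma2)

identity = [[1.0, 0.0], [0.0, 1.0]]
ll_exact = gaussian_log_likelihood([1.0, 2.0], identity, [1.0, 2.0], 1.0)
ll_off = gaussian_log_likelihood([2.0, 2.0], identity, [1.0, 2.0], 1.0)
```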

III-B Parameter Priors

III-B1 Prior for the underlying intensity field ${\mathbf{x}}$

A truncated multivariate Gaussian distribution (MVG) is assigned to the intensity field ${\mathbf{x}}$ .

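$$f(\mathbf{x} \mid \gamma^{2}) \propto (\gamma^{2})^{-d/2} \exp\left(-\frac{\mathbf{x}^{T}\boldsymbol{\Delta}^{-1}\mathbf{x}}{2\gamma^{2}}\right) 1_{\mathbb{R}^{+}}(\mathbf{x}), \qquad (7)$$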

where $1_{\mathbb{R}^{+}}({\mathbf{x}})$ is the indicator function defined on the positive set of ${\mathbf{x}}$ , $\gamma^{2}$ controls the global correlation between intensities, and the covariance matrix ${\boldsymbol{\Delta}}$ which defines the spatial correlation between the cores is defined by

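$$[\boldsymbol{\Delta}]_{n,n^{\prime}} = \exp\left(-\left(\frac{d_{n,n^{\prime}}}{\ell}\right)^{\kappa}\right), \qquad (8)$$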

where $d_{n,n^{\prime}}$ denotes the distance between the spatial locations $n$ and $n^{\prime}$, and $d=N_{1}$. Equations (7) and (8) promote smooth intensity variations between neighbours while ensuring that the prior dependence between neighbouring cores decreases as $d_{n,n^{\prime}}$ increases. In this work $d_{n,n^{\prime}}$ is the standard Euclidean distance. The parameters $\ell,\kappa$ were learned from the irregular sampling pattern of the OEM system. Precisely, we used known images and selected $(\ell,\kappa)$ by maximum likelihood estimation, i.e., by maximizing $\log p(\ell,\kappa|{\mathbf{x}})$. While $\gamma^{2}$ is left unknown for each image, $(\ell,\kappa)$ are fixed in the rest of the simulations as the average values obtained with the training images.

Considering such a prior is equivalent to assuming a Gaussian process on ${\mathbf{x}}$; this allows us to interpolate the resulting deconvolved intensities using Gaussian processes [36], as we will see in Section V.

III-B2 Prior for the noise variance $\sigma^{2}$

A conjugate inverse-Gamma $\mathcal{IG}$ prior is assigned to the noise variance $\sigma^{2}$

$$f(\sigma^{2} \mid \alpha, \beta) \sim \mathcal{IG}(\alpha, \beta), \qquad (9)$$

where $\alpha=10$ is fixed arbitrarily, while the hyperparameter $\beta$ is estimated within the algorithm.

III-B3 Prior for the hyperparameter $\beta$

The hyperparameter $\beta$ of the prior defined above is assigned a conjugate Gamma prior:

$$\beta \sim \mathcal{G}(\alpha_{o}, \beta_{o}), \qquad (10)$$

where $\alpha_{o}$ and $\beta_{o}$ are fixed and user-defined parameters which might depend on the quality of the data to be recovered. In this work, we fixed $(\alpha_{o},\beta_{o})=(10,0.1)$ arbitrarily.

III-B4 Prior for the hyperparameter $\gamma^{2}$

To reflect the lack of prior knowledge about the regularization parameter $\gamma^{2}$ in (7), the following weakly informative conjugate inverse-Gamma prior is assigned to it.

$$\gamma^{2} \sim \mathcal{IG}(\eta, \nu), \qquad (11)$$

where $(\eta,\nu)$ are fixed to $(\eta,\nu)=(10^{-3},10^{-3})$. Note that we did not observe any significant change in the results when changing these hyperparameters.

The next section derives the joint posterior distribution of the unknown parameters associated with the proposed Bayesian model.

III-C Joint posterior distribution

Assuming the parameters ${\mathbf{x}}$ and $\sigma^{2}$ are a priori independent, the joint posterior distribution of the parameter vector ${\boldsymbol{\Omega}}=\{{\mathbf{x}},\sigma^{2}\}$ and hyperparameters ${\boldsymbol{\phi}}=\{\beta,\gamma^{2}\}$ can be expressed as

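$$f(\boldsymbol{\Omega}, \boldsymbol{\phi} \mid \mathbf{y}) \propto f(\mathbf{y} \mid \boldsymbol{\Omega})\, f(\boldsymbol{\Omega} \mid \boldsymbol{\phi})\, f(\boldsymbol{\phi}), \qquad (12)$$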

where

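$$f(\boldsymbol{\Omega} \mid \boldsymbol{\phi}) = f(\mathbf{x} \mid \gamma^{2})\, f(\sigma^{2} \mid \beta), \quad \text{and} \quad f(\boldsymbol{\phi}) = f(\gamma^{2})\, f(\beta). \qquad (13)$$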

The directed acyclic graph (DAG) summarizing the structure of the proposed Bayesian model is depicted in Fig. 5. This posterior distribution will be used to evaluate Bayesian estimators of ${\boldsymbol{\Theta}}=\{{\boldsymbol{\Omega}},{\boldsymbol{\phi}}\}$. For this purpose, we propose three algorithms: an MCMC-based approach and two optimization-based approaches, in which VB and ADMM are considered. The first approach uses an MCMC method to evaluate the minimum-mean-square-error (MMSE) estimator of ${\boldsymbol{\Theta}}$ by generating samples according to the joint posterior distribution. Moreover, it allows the estimation of the hyperparameter vector ${\boldsymbol{\phi}}$ along with the noise variance $\sigma^{2}$. However, it exhibits a relatively long computational time. The second and third algorithms deal with this issue: the VB approach provides a fast MMSE estimate and the ADMM approach a fast MAP estimate. The VB approach approximates the joint posterior distribution in (12) by minimizing the Kullback-Leibler (KL) divergence between the true posterior distribution and its approximation [35]. The ADMM approach maximizes the posterior distribution (12) with respect to (w.r.t.) ${\boldsymbol{\Theta}}$. Note, however, that the hyperparameters ${\boldsymbol{\phi}}$ as well as $\sigma^{2}$ are fixed for this approach. The three estimation algorithms are described in the next section.

IV Bayesian Inference

IV-A MCMC algorithm

To overcome the challenging derivation of Bayesian estimators associated with $f({\boldsymbol{\Theta}}|{\mathbf{y}})$, we propose to use an efficient MCMC method to generate samples asymptotically distributed according to the posterior presented in (12). More precisely, we consider a Gibbs sampler described next. The principle of the Gibbs sampler is to sample according to the conditional distributions of the posterior of interest [32, Chap. 10]. In this work, we propose to sample sequentially the elements of ${\boldsymbol{\Theta}}$ using updates that are detailed below.

IV-A1 Sampling the intensity field ${\mathbf{x}}$

From (12), since the prior (7) is conjugate to the Gaussian distribution, the full conditional distribution of ${\mathbf{x}}$ is given by

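$$f(\mathbf{x} \mid \mathbf{y}, \sigma^{2}) \sim \mathcal{N}_{\mathbb{R}^{+}}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}), \qquad (14)$$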

where

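$$\boldsymbol{\mu} = \sigma^{-2}\boldsymbol{\Sigma}^{T}\mathbf{H}^{T}\mathbf{y}, \qquad \boldsymbol{\Sigma} = \left(\sigma^{-2}\mathbf{H}^{T}\mathbf{H} + \gamma^{-2}\boldsymbol{\Delta}^{-1}\right)^{-1}. \qquad (15)$$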

Sampling from (14) can be achieved efficiently by using the Hamiltonian method proposed in [37].
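To make the role of ${\boldsymbol{\mu}}$ concrete, the sketch below computes the mean of the Gaussian before truncation, using ${\boldsymbol{\Sigma}}^{-1}{\boldsymbol{\mu}}=\sigma^{-2}{\mathbf{H}}^{T}{\mathbf{y}}$ with ${\boldsymbol{\Sigma}}^{-1}=\sigma^{-2}{\mathbf{H}}^{T}{\mathbf{H}}+\gamma^{-2}{\boldsymbol{\Delta}}^{-1}$. Left-multiplying by ${\boldsymbol{\Delta}}$ gives $(\sigma^{-2}{\boldsymbol{\Delta}}{\mathbf{H}}^{T}{\mathbf{H}}+\gamma^{-2}{\mathbf{I}}){\boldsymbol{\mu}}=\sigma^{-2}{\boldsymbol{\Delta}}{\mathbf{H}}^{T}{\mathbf{y}}$, which avoids inverting ${\boldsymbol{\Delta}}$. This is not the Hamiltonian sampler of [37] and it ignores the positivity constraint; the helper names and the dense pure-Python solver are assumptions suited only to tiny examples.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def conditional_mean(H, Delta, y, sigma2, gamma2):
    """Mean of the untruncated Gaussian conditional of x (positivity ignored)."""
    m, n = len(H), len(H[0])
    Hty = [sum(H[i][j] * y[i] for i in range(m)) for j in range(n)]
    HtH = [[sum(H[i][a] * H[i][b] for i in range(m)) for b in range(n)]
           for a in range(n)]
    # A = Delta H^T H / sigma2 + I / gamma2, rhs = Delta H^T y / sigma2
    A = [[sum(Delta[a][k] * HtH[k][b] for k in range(n)) / sigma2
          + (1.0 / gamma2 if a == b else 0.0) for b in range(n)]
         for a in range(n)]
    rhs = [sum(Delta[a][k] * Hty[k] for k in range(n)) / sigma2 for a in range(n)]
    return solve(A, rhs)

identity = [[1.0, 0.0], [0.0, 1.0]]
mu = conditional_mean(identity, identity, [2.0, 4.0], sigma2=1.0, gamma2=1.0)
```

With identity ${\mathbf{H}}$ and ${\boldsymbol{\Delta}}$ and $\sigma^{2}=\gamma^{2}=1$, the data and the zero-mean prior are weighted equally, so the mean is half of the observation.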

IV-A2 Sampling the noise variance $\sigma^{2}$

By cancelling out the terms that do not depend on $\sigma^{2}$ from the posterior distribution in (12), its conditional distribution can be written as

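$$f(\sigma^{2} \mid \mathbf{y}, \mathbf{x}) \sim \mathcal{IG}\left(\alpha + \frac{N_{1}}{2},\ \beta + \frac{\lVert \mathbf{y} - \mathbf{H}\mathbf{x} \rVert_{2}^{2}}{2}\right), \qquad (16)$$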

which is easy to sample from.
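A draw from this inverse-Gamma conditional, $\mathcal{IG}(\alpha+N_{1}/2,\ \beta+\lVert{\mathbf{y}}-{\mathbf{H}}{\mathbf{x}}\rVert_{2}^{2}/2)$, can be obtained by inverting a Gamma variate. The sketch below uses Python's standard `random.gammavariate`, whose second argument is a scale parameter; the function names and the assumption that ${\mathbf{H}}{\mathbf{x}}$ is precomputed as `Hx` are illustrative choices.

```python
import random

def sample_inverse_gamma(shape, scale, rng):
    """Draw from IG(shape, scale): if g ~ Gamma(shape, rate=scale) then 1/g ~ IG(shape, scale)."""
    # gammavariate takes (shape, scale), so a rate of `scale` means a Gamma scale of 1/scale.
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)

def sample_noise_variance(y, Hx, alpha, beta, rng):
    """One Gibbs draw of sigma^2 given y, the current H x, and the current beta."""
    sq_norm = sum((yi - mi) ** 2 for yi, mi in zip(y, Hx))
    return sample_inverse_gamma(alpha + len(y) / 2.0, beta + sq_norm / 2.0, rng)

rng = random.Random(1)
sigma2 = sample_noise_variance([1.0, 2.0], [0.9, 2.2], alpha=10.0, beta=1.0, rng=rng)
```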

IV-A3 Sampling the hyperparameters $\beta$ and $\gamma^{2}$

It can be easily shown that $\beta$ can be sampled from the following Gamma distribution

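$$f(\beta \mid \sigma^{2}) \sim \mathcal{G}\left(\alpha + \alpha_{o},\ \frac{\sigma^{2}\beta_{o}}{\sigma^{2} + \beta_{o}}\right). \qquad (17)$$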

In a similar fashion to the noise variance, $\gamma^{2}$ can be sampled from the following inverse-Gamma distribution

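$$f(\gamma^{2} \mid \mathbf{x}) \sim \mathcal{IG}\left(\eta + \frac{N_{1}}{2},\ \nu + \frac{\mathbf{x}^{T}\boldsymbol{\Delta}^{-1}\mathbf{x}}{2}\right). \qquad (18)$$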
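The two hyperparameter updates can be sketched in the same way. Here the Gamma conditional of $\beta$ is read with shape $\alpha+\alpha_{o}$ and scale $\sigma^{2}\beta_{o}/(\sigma^{2}+\beta_{o})$, and the inverse-Gamma conditional of $\gamma^{2}$ with shape $\eta+N_{1}/2$ and scale $\nu+{\mathbf{x}}^{T}{\boldsymbol{\Delta}}^{-1}{\mathbf{x}}/2$. The function names are hypothetical, and the quadratic form is assumed to be computed elsewhere and passed in as `quad_form`.

```python
import random

def sample_beta(sigma2, alpha, alpha_o, beta_o, rng):
    """Draw beta from its Gamma conditional (shape alpha + alpha_o, scale as below)."""
    scale = sigma2 * beta_o / (sigma2 + beta_o)
    return rng.gammavariate(alpha + alpha_o, scale)

def sample_gamma2(quad_form, n1, eta, nu, rng):
    """Draw gamma^2 from IG(eta + n1/2, nu + quad_form/2), quad_form = x^T Delta^{-1} x."""
    return 1.0 / rng.gammavariate(eta + n1 / 2.0, 1.0 / (nu + quad_form / 2.0))

rng = random.Random(2)
beta = sample_beta(sigma2=1.0, alpha=10.0, alpha_o=10.0, beta_o=0.1, rng=rng)
gamma2 = sample_gamma2(quad_form=50.0, n1=100, eta=1e-3, nu=1e-3, rng=rng)
```

The values $\alpha=10$, $(\alpha_{o},\beta_{o})=(10,0.1)$ and $(\eta,\nu)=(10^{-3},10^{-3})$ follow the settings quoted in the text; `sigma2`, `quad_form` and `n1` are placeholders.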

The algorithm for generating samples asymptotically distributed according to the posterior distribution using the Gibbs sampler is shown in Algorithm 1.

The posterior distribution mean or minimum mean square error (MMSE) estimator of ${\mathbf{x}}$ can be approximated by

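$$\hat{\mathbf{x}} = \frac{1}{N_{\text{MC}} - N_{\text{bi}}} \sum_{t = N_{\text{bi}}+1}^{N_{\text{MC}}} \mathbf{x}^{(t)}, \qquad (19)$$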

where the samples from the first $N_{\text{bi}}$ iterations (corresponding to the transient regime or burn-in period, which is determined visually from preliminary runs) of the sampler are discarded.
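The estimator is simply the element-wise average of the samples kept after the burn-in period. A minimal sketch, assuming the chain is stored as a list of equally sized sample vectors (the name `mmse_estimate` is illustrative):

```python
def mmse_estimate(samples, n_burn_in):
    """Average the post-burn-in samples x^(t), t = n_burn_in + 1, ..., N_MC, element-wise."""
    kept = samples[n_burn_in:]
    if not kept:
        raise ValueError("burn-in must be shorter than the chain")
    return [sum(column) / len(kept) for column in zip(*kept)]

# Toy chain of three 2-dimensional samples; the first is discarded as burn-in.
chain = [[100.0, 100.0], [1.0, 2.0], [3.0, 4.0]]
x_hat = mmse_estimate(chain, n_burn_in=1)
```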

IV-B Variational Bayes algorithm

For this approach, we consider an approximation of $p({\boldsymbol{\Theta}}|{\mathbf{y}})$ by a simpler tractable distribution $q({\boldsymbol{\Theta}})$ following the variational methodology [34]. Moreover, we relax the positivity constraints on the intensity field vector ${\mathbf{x}}$. Note, however, that the positivity constraints can be incorporated, but the covariance matrix of the intensity field ${\mathbf{x}}$ would become more complicated [38, Chap. 5]. As will be shown in Sections VI and VII, this constraint relaxation yields a fast estimation procedure providing estimation results which compete with the methods incorporating this constraint. The distribution $q({\boldsymbol{\Theta}})$ will be found by minimizing the Kullback-Leibler (KL) divergence between the actual posterior distribution and its approximation, given by [35, 39]

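$$D_{KL}\left(q(\boldsymbol{\Theta}) \,\Vert\, p(\boldsymbol{\Theta} \mid \mathbf{y})\right) = \int q(\boldsymbol{\Theta}) \log\left(\frac{q(\boldsymbol{\Theta})}{p(\boldsymbol{\Theta} \mid \mathbf{y})}\right) d\boldsymbol{\Theta}, \qquad (20)$$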

which is always non-negative and equal to zero only when $q({\boldsymbol{\Theta}})=p({\boldsymbol{\Theta}}|{\mathbf{y}})$. In order to obtain a tractable approximation, the family of distributions $q({\boldsymbol{\Theta}})$ is restricted using the mean field approximation [40] so that $q({\boldsymbol{\Theta}})=q({\boldsymbol{\phi}})q({\mathbf{x}})q(\sigma^{2})$, where $q({\boldsymbol{\phi}})=q(\gamma^{2})q(\beta)$.
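The two properties of the KL divergence used here (non-negativity, and zero only when the two distributions coincide) are easy to check numerically on a discrete analogue, where the integral becomes a sum. The example below is purely illustrative and is not part of the VB algorithm itself.

```python
import math

def kl_divergence(q, p):
    """Discrete KL(q || p) = sum_i q_i * log(q_i / p_i); terms with q_i = 0 contribute 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)

kl_same = kl_divergence([0.5, 0.5], [0.5, 0.5])   # identical distributions
kl_diff = kl_divergence([0.9, 0.1], [0.5, 0.5])   # mismatched distributions
```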

A lower bound on the joint probability distribution is given by

[TABLE]

For $\mathcal{H}\in\{{\mathbf{x}},\sigma^{2},\gamma^{2},\beta\}$ , let us denote by ${\boldsymbol{\Theta}}_{\backslash\mathcal{H}}$ the subset of ${\boldsymbol{\Theta}}$ with $\mathcal{H}$ removed; for instance, if $\mathcal{H}={\mathbf{x}}$ , ${\boldsymbol{\Theta}}_{\backslash{\mathbf{x}}}=\{\sigma^{2},\gamma^{2},\beta\}$ . Then, utilizing the lower bound ${\mathbf{F}}({\boldsymbol{\Theta}},{\mathbf{y}})$ for the joint probability distribution in (20), we obtain an upper bound for the KL divergence as follows

[TABLE]

Therefore, we minimize this upper bound instead of minimizing the KL divergence in (20). Note that the form of the inequality in (22) suggests an alternating (cyclic) optimization strategy where the algorithm cycles through the unknown distributions and replaces each variable with a revised estimate given by the minimum of (22) with the other distributions held constant. Thus, given $q({\boldsymbol{\Theta}}_{\backslash\mathcal{H}})$ , the posterior distribution approximation $q({\mathcal{H}})$ can be computed by solving

[TABLE]

In order to solve this equation, we note that differentiating the integral on the right hand side in (22) w.r.t. $q(\mathcal{H})$ results in (see [41], Eq. (2.28))

[TABLE]

where

[TABLE]

We obtain the following iterative procedure to find $q({\boldsymbol{\Theta}})$ by applying this minimization to each unknown in an alternating way

We now detail the solutions at each step of Algorithm 2 explicitly.

IV-B1 Updating intensity field vector ${\mathbf{x}}$

From (24), it can be shown that $q^{k}({\mathbf{x}})$ is an $N_{1}$ -dimensional Gaussian distribution, rewritten as

[TABLE]

where the mean $E_{q^{k}({\mathbf{x}})}({\mathbf{x}})$ and covariance ${\boldsymbol{\Sigma}}_{q^{k}({\mathbf{x}})}({\mathbf{x}})$ of this normal distribution can be calculated from step 3 in Algorithm 2 as

[TABLE]

IV-B2 Updating noise variance $\sigma^{2}$

It is easy to show from (24) that the noise variance follows an inverse-Gamma distribution given by

[TABLE]

whose mean is given by

[TABLE]

where

[TABLE]

where $\text{tr}(.)$ denotes the trace of the matrix.

IV-B3 Updating regularization parameter $\gamma^{2}$

In a similar fashion to the noise variance, the regularization parameter $\gamma^{2}$ follows an inverse-Gamma distribution given by

[TABLE]

whose mean is given by

[TABLE]

where

[TABLE]

IV-B4 Updating the hyperparameter $\beta$

The hyperparameter $\beta$ follows a Gamma distribution given by

[TABLE]

whose mean is given by

[TABLE]

In Algorithm 2, no assumptions were imposed on the posterior approximation $q({\mathbf{x}})$ . We can, however, assume, as in [28, 29, 30, 31, 42], that this distribution is degenerate, i.e., a distribution that takes one value with probability one and all other values with probability zero. Under this assumption we obtain another algorithm (Algorithm 3), which is similar to Algorithm 2.

The stopping criterion we use is $\sum_{\mathcal{H}\in\{{\mathbf{x}},\sigma^{2},\beta,\gamma^{2}\}}\lVert\mathcal{H}^{(k)}-\mathcal{H}^{(k+1)}\rVert_{F}\leq\epsilon$ , where $\epsilon=\sqrt{N_{1}}\times 10^{-5}$ [43].
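
A minimal sketch of this stopping test is given below. The function name `has_converged` and the dictionary layout of the iterates are assumptions made for illustration; for a vector, the Frobenius norm reduces to the Euclidean norm computed by `np.linalg.norm`.

```python
import numpy as np

def has_converged(prev, curr, n_cores, tol_scale=1e-5):
    """Compare the summed change of all unknowns with eps = sqrt(N_1) * 1e-5."""
    eps = np.sqrt(n_cores) * tol_scale
    change = sum(
        np.linalg.norm(np.atleast_1d(prev[name]) - np.atleast_1d(curr[name]))
        for name in ("x", "sigma2", "beta", "gamma2")
    )
    return bool(change <= eps)

# Hypothetical iterates for N_1 = 100 cores.
n_cores = 100
prev = {"x": np.zeros(n_cores), "sigma2": 1.0, "beta": 2.0, "gamma2": 3.0}
tiny = dict(prev, x=prev["x"] + 1e-7)   # negligible change in x only
large = dict(prev, sigma2=1.1)          # noticeable change in sigma^2

done_tiny = has_converged(prev, tiny, n_cores)
done_large = has_converged(prev, large, n_cores)
```

Scaling the tolerance by $\sqrt{N_{1}}$ keeps the criterion comparable across images with different numbers of cores.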

It is clear that using a degenerate distribution for $q({\mathbf{x}})$ in Algorithm 3 removes the uncertainty terms of the intensity field estimate. It has been shown that this helps to improve the restoration performance [28, 29, 30, 31, 42]. Moreover, it reduces the computational complexity, as there is no need to compute the covariance matrix ${\boldsymbol{\Sigma}}_{q^{k}({\mathbf{x}})}({\mathbf{x}})$ explicitly at each iteration. Finally, a few remarks are needed to obtain a fast algorithm. The inverse of the covariance matrix ${\boldsymbol{\Delta}}$ needs to be computed only once, before the loop in Algorithm 3. We also use the MATLAB operation $\left(\frac{{\mathbf{H}}^{T}{\mathbf{H}}}{E^{k}(\sigma^{2})}+\frac{{\boldsymbol{\Delta}}^{-1}}{E^{k}(\gamma^{2})}\right)\backslash({\mathbf{H}}^{T}{\mathbf{y}})$ for the update of the intensity field vector ${\mathbf{x}}$ , which is faster than computing the covariance matrix in (27b) and then updating the mean in (27a). For very large images, a diagonal approximation [29] or the conjugate gradient method [44] can be considered for the update of the intensity field vector ${\mathbf{x}}$ .
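
The remark about solving a linear system instead of forming the covariance matrix can be illustrated with NumPy, where `np.linalg.solve` plays the role of the MATLAB backslash operator. This is a sketch under stated assumptions: the operator `H`, the prior covariance `Delta` and the data `y` are random stand-ins, the function name `update_x` is hypothetical, and the $1/\sigma^{2}$ scaling of the right-hand side follows the standard Gaussian posterior mean rather than an equation reproduced in this section.

```python
import numpy as np

def update_x(H, y, Delta_inv, sigma2, gamma2):
    """Mean update of x via a linear solve, without forming the covariance."""
    A = H.T @ H / sigma2 + Delta_inv / gamma2   # inverse covariance (precision)
    b = H.T @ y / sigma2
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
n = 40
H = np.eye(n) + 0.05 * rng.standard_normal((n, n))   # hypothetical coupling operator
B = rng.standard_normal((n, n))
Delta = B @ B.T + n * np.eye(n)                       # hypothetical SPD prior covariance
Delta_inv = np.linalg.inv(Delta)                      # computed once, outside the loop
y = rng.standard_normal(n)

x_solve = update_x(H, y, Delta_inv, sigma2=0.5, gamma2=2.0)

# Slower route for comparison: explicit covariance, then the mean.
Sigma = np.linalg.inv(H.T @ H / 0.5 + Delta_inv / 2.0)
x_inv = Sigma @ (H.T @ y / 0.5)
```

Both routes give the same vector up to rounding; the linear solve simply avoids the explicit matrix inverse inside the loop.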

IV-C ADMM algorithm

This section describes another alternative to the MCMC algorithm, which is based on optimization. It maximizes the joint posterior distribution (12), $f({\boldsymbol{\Omega}}|{\mathbf{y}},{\boldsymbol{\phi}})$ , with respect to (w.r.t.) the parameters of interest, with the hyperparameter vector ${\boldsymbol{\phi}}$ held fixed, to approximate the MAP estimator of ${\boldsymbol{\Theta}}$ or, equivalently, minimizes the negative log-posterior distribution given by $\mathcal{F}=-\log\left[f({\boldsymbol{\Theta}}|{\mathbf{y}})\right]$ . The resulting optimization problem is tackled using ADMM, which sequentially updates the different parameters and is widely used in the literature for solving imaging inverse problems [43, 45, 19]. We rewrite the model as an optimization problem as follows

[TABLE]

where the regularization function $\phi({\mathbf{x}})$ is proportional to the negative logarithm of the intensity field prior considered in (7) up to an additive constant, i.e. $\phi({\mathbf{x}})=\frac{{\mathbf{x}}^{T}{\boldsymbol{\Delta}}^{-1}{\mathbf{x}}}{2}$ , and $\lambda=\sigma^{2}/\gamma^{2}$ is the regularization parameter. Given this objective function, we write the constrained equivalent formulation as follows

[TABLE]

where ${\mathbf{u}}$ and ${\mathbf{x}}$ are the variables to minimize over. In order to solve for ${\mathbf{u}}$ and ${\mathbf{x}}$ , we construct the augmented Lagrangian corresponding to (37) as follows

[TABLE]

where $\mu>0$ is a positive parameter. The ADMM algorithm for solving (38) is shown in Algorithm 4. During each step of the iterative algorithm, $\mathcal{L}$ is minimized w.r.t. ${\mathbf{u}}$ (step 3) and ${\mathbf{x}}$ (step 4), and then the Lagrange multipliers are updated (step 6). The stopping criterion we use is $\lVert{\mathbf{u}}^{(k)}-{\mathbf{x}}^{(k)}\rVert_{F}\leq\epsilon$ , where $\epsilon=\sqrt{N_{1}}\times 10^{-5}$ [43].
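
The alternating structure described above can be sketched as follows. Several choices here are assumptions, since the constrained formulation and the augmented Lagrangian are not reproduced in this section: the splitting variable ${\mathbf{u}}$ is assumed to carry a non-negativity constraint, the multipliers are kept in scaled form (`d`), the toy prior uses ${\boldsymbol{\Delta}}^{-1}={\mathbf{I}}$ , and the name `admm_restore` is hypothetical. The quadratic part is $\frac{1}{2}\lVert{\mathbf{y}}-{\mathbf{H}}{\mathbf{x}}\rVert^{2}+\lambda\phi({\mathbf{x}})$ with $\phi({\mathbf{x}})={\mathbf{x}}^{T}{\boldsymbol{\Delta}}^{-1}{\mathbf{x}}/2$ .

```python
import numpy as np

def admm_restore(H, y, Delta_inv, lam, mu=1.0, max_iter=500):
    """ADMM sketch: quadratic data fit and prior on x, non-negativity on u."""
    n = H.shape[1]
    eps = np.sqrt(n) * 1e-5                            # stopping threshold
    A = H.T @ H + lam * Delta_inv + mu * np.eye(n)     # fixed across iterations
    b = H.T @ y
    x = np.zeros(n)
    d = np.zeros(n)                                    # scaled Lagrange multipliers
    for _ in range(max_iter):
        u = np.maximum(x + d, 0.0)                     # u-step: projection onto u >= 0
        x = np.linalg.solve(A, b + mu * (u - d))       # x-step: quadratic minimization
        d = d + x - u                                  # multiplier update
        if np.linalg.norm(u - x) <= eps:               # ||u - x|| <= sqrt(N_1) * 1e-5
            break
    return u, x

# Toy problem (hypothetical): mild tridiagonal coupling, smooth positive signal.
n = 50
H = 0.8 * np.eye(n) + 0.1 * np.eye(n, k=1) + 0.1 * np.eye(n, k=-1)
x_true = 2.5 + 0.5 * np.sin(np.linspace(0.0, 2.0 * np.pi, n))
y = H @ x_true
Delta_inv = np.eye(n)
lam = 0.01

u_hat, x_hat = admm_restore(H, y, Delta_inv, lam)
x_ridge = np.linalg.solve(H.T @ H + lam * Delta_inv, H.T @ y)  # unconstrained reference
```

Because the unconstrained minimizer is already positive in this toy problem, the ADMM output should agree with `x_ridge`, which gives a simple sanity check of the updates.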

V Non-Linear Interpolation Using Gaussian Process Regression

In order to obtain a meaningful image from the deconvolved intensities, we consider non-linear interpolation based on Gaussian processes (GP) [36], since it can provide confidence intervals for each interpolated pixel. A classic choice consists of considering a zero-mean GP with an arbitrary covariance matrix. Here, we choose this covariance matrix to be ${\boldsymbol{\Delta}}^{\prime}={\boldsymbol{\Delta}}/\gamma^{2}$ . Precisely, we interpolate using the prior distribution previously defined in (8). If $d_{n,n^{\prime}}$ is very small, then ${\boldsymbol{\Delta}}^{\prime}(n,n^{\prime})$ approaches its maximum $1/\gamma^{2}$ . If $n$ is distant from $n^{\prime}$ , we have instead ${\boldsymbol{\Delta}}^{\prime}(n,n^{\prime})\approx 0$ , i.e., the two points are considered to be a priori independent. For example, during interpolation at a new location $n_{*}$ , distant cores will have a negligible effect. The amount of spatial correlation depends on the parameters $\ell$ and $\kappa$ , which are estimated as described in Section III-B1.

Let ${\boldsymbol{\Delta}}^{\prime}({\mathbf{z}},{\mathbf{z}})\in\mathbb{R}^{N_{1}\times N_{1}}$ denote the covariance matrix of the observed cores, where ${\mathbf{z}}=[z_{1},\ldots,z_{N_{1}}]^{T}$ contains the positions of all the observed cores (whose estimated intensities are gathered into ${\mathbf{x}}$ ). For a new spatial location $z_{*}$ at which we want to predict the intensity $x_{*}$ , the GP can be extended as follows

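$$\begin{bmatrix}{\mathbf{x}}\\ x_{*}\end{bmatrix}\sim\mathcal{N}\left({\mathbf{0}},\begin{bmatrix}{\boldsymbol{\Delta}}^{\prime}({\mathbf{z}},{\mathbf{z}})&{\boldsymbol{\Delta}}^{\prime}({\mathbf{z}},z_{*})\\ {\boldsymbol{\Delta}}^{\prime}(z_{*},{\mathbf{z}})&{\boldsymbol{\Delta}}^{\prime}(z_{*},z_{*})\end{bmatrix}\right)\tag{39}$$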

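where ${\boldsymbol{\Delta}}^{\prime}({\mathbf{z}},z_{*})={\boldsymbol{\Delta}}^{\prime}(z_{*},{\mathbf{z}})^{T}\in\mathbb{R}^{N_{1}}$ . Eq. (39) shows that the conditional distribution of each predicted intensity given the previously estimated intensities follows a Gaussian distribution $x_{*}|{\mathbf{x}}\sim\mathcal{N}\left({\boldsymbol{\mu}},{\boldsymbol{\Sigma}}\right)$ whose mean and variance are given by

$$\begin{aligned}{\boldsymbol{\mu}}&={\boldsymbol{\Delta}}^{\prime}(z_{*},{\mathbf{z}})\,{\boldsymbol{\Delta}}^{\prime}({\mathbf{z}},{\mathbf{z}})^{-1}{\mathbf{x}},\\ {\boldsymbol{\Sigma}}&={\boldsymbol{\Delta}}^{\prime}(z_{*},z_{*})-{\boldsymbol{\Delta}}^{\prime}(z_{*},{\mathbf{z}})\,{\boldsymbol{\Delta}}^{\prime}({\mathbf{z}},{\mathbf{z}})^{-1}{\boldsymbol{\Delta}}^{\prime}({\mathbf{z}},z_{*}).\end{aligned}\tag{40}$$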

By setting ${\mathbf{x}}=\hat{{\mathbf{x}}}$ , the mean in (40) is finally used to estimate each interpolated intensity, while the variance is used to provide additional information (measure of uncertainty) about the interpolated intensity values.
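
The interpolation step can be sketched with the standard GP conditional mean and variance. The covariance function used below, $\exp(-(d/\ell)^{\kappa})/\gamma^{2}$ , is an assumed form chosen only to reproduce the behaviour described in the text (maximum $1/\gamma^{2}$ at zero distance, near zero for distant points); the function names, the small `jitter` term and the four toy core positions are likewise hypothetical.

```python
import numpy as np

def kernel(za, zb, ell, kappa, gamma2):
    """Assumed covariance: exp(-(d / ell)**kappa) / gamma2, d = Euclidean distance."""
    d = np.linalg.norm(za[:, None, :] - zb[None, :, :], axis=-1)
    return np.exp(-(d / ell) ** kappa) / gamma2

def gp_interpolate(z, x_hat, z_star, ell=2.0, kappa=2.0, gamma2=1.0, jitter=1e-10):
    """Predictive mean and variance at the new locations z_star."""
    K = kernel(z, z, ell, kappa, gamma2) + jitter * np.eye(len(z))
    K_star = kernel(z_star, z, ell, kappa, gamma2)          # shape (M, N_1)
    mean = K_star @ np.linalg.solve(K, x_hat)
    prior_var = 1.0 / gamma2                                 # covariance at zero distance
    var = prior_var - np.einsum("ij,ji->i", K_star, np.linalg.solve(K, K_star.T))
    return mean, var

# Four hypothetical cores on a square and their deconvolved intensities.
z = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
x_hat = np.array([1.0, 2.0, 3.0, 4.0])
# Query points: on a core, between the cores, and far from every core.
z_star = np.array([[0.0, 0.0], [2.0, 2.0], [100.0, 100.0]])

mean, var = gp_interpolate(z, x_hat, z_star)
```

At a core location the prediction reproduces the estimated intensity with almost no uncertainty, while far from all cores the prediction falls back to the zero prior mean with variance $1/\gamma^{2}$ ; intermediate pixels lie in between.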

The Matlab implementations of this paper are provided at https://sites.google.com/site/akeldaly/publications.

VI Simulations Using Synthetic Data

VI-A Data creation

The performance of the proposed methods is investigated by reconstructing a standard test image. A subsampled version of this image is obtained by considering the sampling pattern of an actual endomicroscopy system, as illustrated in Fig. 6. This figure provides an example of a homogeneous region imaged through an Alveoflex (Mauna Kea Technologies, France) fiber bundle [46], [47]. Such an image is used for calibration and to identify the number and positions of the fiber cores. The MATLAB “vision.BlobAnalysis” tool was used to detect the central fibre core pixels.

Fig. 7 shows the original Lena image (left) and an example of system output (right) after applying the model in Eq. (3). This image is formed by creating a binary mask in which a value of 1 is assigned to the central pixel of each core in Fig. 6(b), and zero otherwise. This mask is then multiplied point by point by the Lena image in Fig. 7(a) in order to obtain the subsampled image. The model in Eq. (3) is then applied to obtain an image that simulates the system’s output, which is shown in Fig. 7(b). This image is created using subsampled intensities corresponding to 1.29% of the original Lena image. For simulated data, we considered a Gaussian spatial blurring kernel of fixed size $\sigma^{2}_{\mathbf{C}}=2$ in all the simulations.
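
The masking step of the data creation can be sketched as follows. The image and the core coordinates are small hypothetical stand-ins (a $6\times 6$ array and three core centres), and the subsequent application of the model in Eq. (3) is not included.

```python
import numpy as np

def subsample_at_cores(image, core_rows, core_cols):
    """Keep the image only at the central pixel of each core, zero elsewhere."""
    mask = np.zeros(image.shape, dtype=bool)
    mask[core_rows, core_cols] = True
    return image * mask, mask

image = np.arange(36, dtype=float).reshape(6, 6)   # stand-in for the test image
core_rows = np.array([1, 1, 4])                    # hypothetical core centres
core_cols = np.array([1, 4, 2])

subsampled, mask = subsample_at_cores(image, core_rows, core_cols)
sampled_fraction = mask.mean()                     # share of pixels retained
```

The retained fraction plays the same role as the percentage of sampled pixels quoted in the text, here for the toy grid only.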

VI-B Performance analysis

The performance measure adopted in this work to assess the quality of the deconvolved fiber cores is the root mean square error (RMSE), which is computed from the intensities at the core locations as

$$\text{RMSE}=\sqrt{\frac{1}{N_{1}}\lVert{\mathbf{x}}-\hat{{\mathbf{x}}}\rVert_{2}^{2}}\tag{41}$$

where ${\mathbf{x}}$ and $\hat{{\mathbf{x}}}$ are vectors of the subsampled reference Lena image and its deconvolved version respectively, and $N_{1}$ is the number of fibre cores.
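
As a small worked example, the RMSE over the core intensities can be computed as below; the four reference and estimated intensities are hypothetical values chosen to make the arithmetic easy to follow.

```python
import numpy as np

def rmse(x_ref, x_est):
    """Root mean square error between reference and estimated core intensities."""
    x_ref = np.asarray(x_ref, dtype=float)
    x_est = np.asarray(x_est, dtype=float)
    return float(np.sqrt(np.mean((x_ref - x_est) ** 2)))

x_ref = [10.0, 20.0, 30.0, 40.0]
x_est = [12.0, 18.0, 33.0, 40.0]
error = rmse(x_ref, x_est)   # squared errors 4, 4, 9, 0 -> sqrt(17 / 4)
```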

For synthetic data, in order to check the performance of the algorithm with different cross coupling effects, different values of $\alpha_{\mathbf{H}}$ and $\beta_{\mathbf{H}}$ in (4) can be considered. However, this can be simplified by considering a 2D Gaussian kernel defined by (42)

[TABLE]

since it involves only one variable to change, namely $\sigma^{2}_{\mathbf{H}}$ (representing a squared distance, in pixels). This is equivalent to setting $\beta_{\mathbf{H}}=2\text{ and }\sigma^{2}_{\mathbf{H}}=\alpha^{2}_{\mathbf{H}}/2$ . Note that this simplification is considered only for synthetic data in order to assess the influence of the kernel width. The generalized Gaussian cross coupling kernel ${\mathbf{H}}$ defined in (4) will be considered for real data.

The three methods showed similar results in terms of RMSE and interpolated images, so only the results of the VB method are shown in the following. Fig. 8 shows examples of interpolated intensities after deconvolution using GP in the noise-free case ( $\sigma^{2}_{N}=0$ ) and the noisy case ( $\sigma^{2}_{N}=10$ ) for different values of $\sigma^{2}_{\mathbf{H}}$ , with the corresponding confidence interval images. We can observe that the structure of the Lena image can be recovered in both cases. Moreover, in the confidence interval images, we can observe that the confidence in the interpolated intensities decreases as we move away from the central cores.
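
The relation between the two kernels can be checked numerically. The sketch assumes that the generalized Gaussian in (4) has the unnormalized form $\exp(-(d/\alpha_{\mathbf{H}})^{\beta_{\mathbf{H}}})$ , which is not reproduced in this section, and reads the equivalence as $\beta_{\mathbf{H}}=2$ with $\sigma^{2}_{\mathbf{H}}=\alpha^{2}_{\mathbf{H}}/2$ ; the function names are hypothetical.

```python
import math

def generalized_gaussian(d, alpha, beta):
    """Assumed unnormalized form of the generalized Gaussian kernel."""
    return math.exp(-((abs(d) / alpha) ** beta))

def gaussian(d, sigma2):
    """Unnormalized Gaussian kernel with variance sigma2 (squared pixels)."""
    return math.exp(-(d ** 2) / (2.0 * sigma2))

sigma2 = 10.0
alpha = math.sqrt(2.0 * sigma2)          # alpha^2 = 2 * sigma^2
max_gap = max(
    abs(generalized_gaussian(d, alpha, 2.0) - gaussian(d, sigma2))
    for d in range(0, 15)
)
```

Under these assumptions the two profiles coincide up to rounding, so varying $\sigma^{2}_{\mathbf{H}}$ alone is enough to sweep the kernel width in the synthetic experiments.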

In order to measure the performance of the algorithms, we consider different noise variances ( $\sigma_{N}^{2}$ ) as well as different cross coupling effects ( $\sigma_{\mathbf{H}}^{2}$ ). Fig. 9 shows the RMSE (in log-scale) before and after deconvolution versus $\sigma^{2}_{\mathbf{H}}$ at $\sigma^{2}_{N}=10$ . We can observe that all of the methods are very effective since the RMSE after deconvolution is always lower than that before deconvolution. Moreover, the gain increases with cross coupling.

In order to analyze the effect of noise variance and cross coupling separately, we fix one of them and change the other as shown in Fig. 10. In this figure, we show plots of RMSEs after deconvolution for different $\sigma^{2}_{N}$ at fixed $\sigma^{2}_{\mathbf{H}}$ and vice versa. In Fig. 10(a), we can observe that there is roughly a linear relationship between RMSE and $\sigma^{2}_{N}$ at fixed $\sigma^{2}_{\mathbf{H}}$ . Moreover, the behaviour at $\sigma^{2}_{\mathbf{H}}=1,5,10\text{ and }15$ is almost the same. In Fig. 10(b), we can observe that RMSE is fairly constant as $\sigma^{2}_{\mathbf{H}}$ increases at constant $\sigma^{2}_{N}$ . Furthermore, it starts to increase as $\sigma^{2}_{N}$ increases but still remains constant when changing $\sigma^{2}_{\mathbf{H}}$ .

For the MCMC method, we used $N_{\text{MC}}=1500$ iterations, including a burn-in period of $N_{\text{bi}}=500$ iterations, in all of the simulations in this paper, including those on the real datasets; these values were determined visually from preliminary runs. For the ADMM method, different regularization parameter values are tested and we select the one corresponding to the lowest RMSE.
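
The selection of the regularization parameter can be sketched as a small grid search. This is only illustrative: `select_lambda` and `ridge_restore` are hypothetical names, the closed-form ridge solution stands in for the ADMM solver, and the five candidate values are arbitrary. Note that this selection rule needs the reference intensities, so it applies to synthetic data only.

```python
import numpy as np

def select_lambda(restore, x_ref, candidates):
    """Return (lambda, RMSE, estimate) for the candidate with the lowest RMSE."""
    best = None
    for lam in candidates:
        x_hat = restore(lam)
        err = float(np.sqrt(np.mean((x_ref - x_hat) ** 2)))
        if best is None or err < best[1]:
            best = (lam, err, x_hat)
    return best

# Toy problem (hypothetical): mild coupling, smooth signal, additive noise.
rng = np.random.default_rng(3)
n = 30
H = 0.8 * np.eye(n) + 0.1 * np.eye(n, k=1) + 0.1 * np.eye(n, k=-1)
x_ref = 2.5 + 0.5 * np.sin(np.linspace(0.0, 2.0 * np.pi, n))
y = H @ x_ref + 0.1 * rng.standard_normal(n)

def ridge_restore(lam):
    """Stand-in solver: closed-form minimizer of the quadratic objective."""
    return np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ y)

candidates = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
best_lam, best_rmse, x_best = select_lambda(ridge_restore, x_ref, candidates)
```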

VI-C Comparison

In this section, we compare the three proposed methods for deconvolution and restoration of OEM images. The comparison is conducted in terms of RMSE before and after deconvolution, as well as in terms of computation time.

Fig. 11 compares RMSEs after deconvolution versus different $\sigma^{2}_{N}$ as well as different $\sigma^{2}_{\mathbf{H}}$ . We can observe that for all of the methods, as $\sigma^{2}_{N}$ increases at constant $\sigma^{2}_{\mathbf{H}}$ , RMSE increases. On the other hand, at fixed $\sigma^{2}_{N}$ , RMSE seems to be roughly constant for $\sigma^{2}_{\mathbf{H}}=1,5,\text{ and }10$ , then, it starts to increase as $\sigma^{2}_{\mathbf{H}}$ increases. It is clear that all the methods behave similarly in terms of RMSE.

Table I shows the average computation time (in seconds) of the three proposed methods. The experiments were conducted on an Acer laptop with a 2.0 GHz Core i3 processor and 8 GB of RAM. The MCMC method is clearly the most computationally expensive, followed by the ADMM method, while the VB method is the fastest. Despite its relatively high computation time, the MCMC method is parameter-free, in contrast to the ADMM-based method, in which the regularization parameter $\lambda$ must be chosen carefully. The VB approach offers the best trade-off of the three: it provides a similar RMSE at a lower computational cost and, moreover, it is fully automatic in the sense that it estimates the hyperparameters associated with the parameters, as described in Section IV-B.

Although the MCMC and VB algorithms can estimate the noise variance and model hyperparameters, in practice these parameters (specifically $\sigma^{2}$ and $\gamma^{2}$ ) are very difficult to estimate accurately due to the similarity between ${\mathbf{H}}^{T}{\mathbf{H}}$ and ${\boldsymbol{\Delta}}^{-1}$ in (15b) and (27b). Therefore, we have to make an informed choice about one of these parameters, specifically through the choice of the hyperparameters $\alpha$ , $\alpha_{0}$ and $\beta_{0}$ in (9) and (10). In Fig. 10(b), we observe that the RMSEs are in practice close to the true noise standard deviation, and hence the noise variance can be inferred.

VI-D Robustness

To test the robustness of the proposed methods, we create the data using a specific $\sigma^{2}_{H}$ and we deconvolve using different values. Following this strategy, we create the data using $\sigma^{2}_{H}=10$ and we deconvolve using $\sigma^{2}_{H}=6,8,10,12,\text{ and }14$ . The three estimation approaches showed similar results.

Fig. 12 shows plots of RMSE after deconvolution versus $\sigma^{2}_{N}$ at fixed $\sigma^{2}_{\mathbf{H}}$ and vice versa. In Fig. 12(a), we can observe that the noise variance has no effect on the deconvolution in the tested interval as RMSE is constant at fixed $\sigma^{2}_{\mathbf{H}}$ . In Fig. 12(b), there is an approximately linear relationship between RMSE and $\sigma^{2}_{\mathbf{H}}$ at constant $\sigma^{2}_{N}$ . Furthermore, lower values of $\sigma^{2}_{H}$ than the one we created the data with (i.e., $\sigma^{2}_{\mathbf{H}}=6\text{ and }8$ ) yield lower RMSE than higher ones (i.e., $\sigma^{2}_{\mathbf{H}}=12\text{ and }14$ ). In other words, it is slightly better to underestimate $\sigma^{2}_{\mathbf{H}}$ than to overestimate it.

We observe that deconvolution using the value we created the data with ( $\sigma^{2}_{\mathbf{H}}=10$ ) yields the minimum RMSE. Moreover, RMSE after deconvolution is always lower than that before deconvolution except for $\sigma^{2}_{\mathbf{H}}=14$ at which it is higher.

VII Simulations Using Real Data

The performance of the proposed methods has been evaluated on two real datasets: the 1951 USAF resolution test chart and ex vivo human lung tissue. Both were collected using the OEM system of [7] with monochrome detection (Grasshopper3 camera GS3-U3-23S6M-C, Point Grey Research, Canada) and 470 nm LED illumination (M470L3, Thorlabs Ltd, UK) for lung autofluorescence excitation. Excised human lung tissue was placed in a well plate. Human tissue was used with regional ethics committee (REC: 13/ES/0126) approval and was retrieved from the periphery of specimens taken from lung cancer resections. In order to adjust the cross coupling kernel parameters $\alpha_{\mathbf{H}}\text{ and }\beta_{\mathbf{H}}$ , a study was performed to measure, analyze and quantify inter-core coupling within coherent fibre bundles [10]. This study showed how light spreads over the neighbouring cores and provided a statistical analysis of the coupling percentage in neighbouring cores. It showed that around 61% of the transmitted light remains in the central core, around 34% in the first neighbouring cores, around 4% in the second neighbouring cores, and less than 1% in the third, fourth and fifth neighbouring cores. This leads to fixing $\alpha_{\mathbf{H}}=4$ (in pixels) and $\beta_{\mathbf{H}}=0.8$ .

VII-A 1951 USAF resolution test chart

The 1951 USAF chart is a resolution test pattern set by the US Air Force in 1951. It is widely accepted for testing the resolution of optical imaging systems such as microscopes, cameras and image scanners [48]. Fig. 13(a) shows the original USAF resolution test chart used in the project. The resulting image obtained through the fiber bundle is shown in Fig. 13(b); it is of size $760\times 760$ and is composed of 7,776 fiber cores ( $1.34\%$ of the image).

A non-linear GP-based interpolation of the central core intensities of the image in Fig. 13(b) is presented in Fig. 14(a), with the corresponding confidence interval image in Fig. 14(c). We can observe the blurring caused by the cross coupling effect as well as by the sparsity of the data.

The outputs of the MCMC, VB, and ADMM algorithms are very similar. Thus, we show only the results of the VB method. Fig. 14(b) shows an example of one of the output images, with the corresponding confidence intervals in Fig. 14(d). The set of thicker strips (top left corner of the image) is now better resolved and the overlap between them is reduced. The small set of strips at the bottom could not be resolved, which gives an indication of the resolving power of this endomicroscopy system. Regions of high uncertainty (which appear as dark red blobs) correspond to locations where there may be no cores or where the cores are dead. This, together with the irregular core sampling, explains why some strips appear slightly fragmented.

VII-B Ex vivo human lung tissues

Fig. 15(a) shows the output image of the OEM system. The image size is $1000\times 800$ and the image is composed of 13,343 fiber cores ( $1.66\%$ of the image). A non-linear GP-based interpolation of the central core intensities is presented in Fig. 15(b). As for the USAF resolution test chart, we aim at reducing the cross coupling effect as well as obtaining a better resolved image.

As with the USAF resolution test chart, the outputs of the MCMC, VB, and ADMM algorithms are very similar, so we only show the results of the VB method. Fig. 15(c) shows an example of interpolated deconvolved samples using GP. The lung structure is now better resolved and sharper than before deconvolution. Moreover, confidence intervals are shown in Fig. 15(d). We can observe that as we move away from the central cores, the confidence of the interpolated intensities decreases, and vice versa.

Table II provides the computation time for the 1951 USAF resolution test chart and the ex vivo lung tissue image. The VB method remains the fastest despite the change in image size.

VIII Conclusion and Future Work

This paper introduced a hierarchical Bayesian model and three estimation algorithms for the deconvolution of optical endomicroscopy images. The deconvolution accounts and compensates for fibre core cross coupling, which causes major image degradation in this type of imaging. The resulting joint posterior distribution was used to approximate the Bayesian estimators. First, a Markov chain Monte Carlo procedure based on a Gibbs sampler was used to sample the posterior distribution of interest and to approximate the MMSE estimators of the unknown parameters using the generated samples. Second, a variational Bayes approach was used to approximate the joint posterior distribution by minimizing the Kullback-Leibler divergence. Third, an approach based on the alternating direction method of multipliers was used to approximate the maximum a posteriori estimators. The three algorithms showed similar estimation performance while offering different characteristics. The MCMC and VB based approaches are fully automatic in the sense that they can jointly estimate the hyperparameters associated with the priors. However, the MCMC based approach showed high computational complexity, which could be overcome by the VB and ADMM approaches. Although the ADMM approach has low computational complexity, it is semi-supervised in the sense that the hyperparameters associated with the priors need to be chosen carefully by the user. A non-linear interpolation approach based on Gaussian processes was considered to restore the full images from the samples and so provide a meaningful image for interpretation. In the future, we will consider temporal information while deconvolving. Accounting for the different core sizes is also an interesting route currently under investigation.

Acknowledgement

This work was supported in part by the Engineering and Physical Sciences Research Council (EPSRC, United Kingdom) Interdisciplinary Research Collaboration grant EP/K03197X/1 and by the Royal Academy of Engineering under the Research Fellowship scheme (RF201617/16/31). We would like to thank the reviewers for their helpful comments, which improved the quality of the manuscript.

Bibliography

[1] J. Chastre and J.-Y. Fagon, “Ventilator-associated pneumonia,” American Journal of Respiratory and Critical Care Medicine, vol. 165, no. 7, pp. 867–903, 2002.
[2] P. Johnston, D. F. McAuley, and C. M. O’Kane, “Novel pulmonary biomarkers in the diagnosis of VAP,” Thorax, vol. 65, no. 3, pp. 190–192, 2010.
[3] V. S. Baselski and R. G. Wunderink, “Bronchoscopic diagnosis of pneumonia,” Clinical Microbiology Reviews, vol. 7, no. 4, pp. 533–558, 1994.
[4] M. Pierce, D. Yu, and R. Richards-Kortum, “High-resolution fiber-optic microendoscopy for in situ cellular imaging,” Journal of Visualized Experiments: JoVE, no. 47, 2011.
[5] D. Shin, M. C. Pierce, A. M. Gillenwater, M. D. Williams, and R. R. Richards-Kortum, “A fiber-optic fluorescence microscope using a consumer-grade digital camera for in vivo cellular imaging,” PLoS One, vol. 5, no. 6, p. e11218, 2010.
[6] X. Hong, V. K. Nagarajan, D. H. Mugler, and B. Yu, “Smartphone microendoscopy for high resolution fluorescence imaging,” Journal of Innovative Optical Health Sciences, vol. 9, no. 05, p. 1650046, 2016.
[7] N. Krstajić, A. R. Akram, T. R. Choudhary, N. McDonald, M. G. Tanner, E. Pedretti, P. A. Dalgarno, E. Scholefield, J. M. Girkin, A. Moore et al., “Two-color widefield fluorescence microendoscopy enables multiplexed molecular imaging in the alveolar space of human lung tissue,” Journal of Biomedical Optics, vol. 21, no. 4, p. 046009, 2016.
[8] H. A. Wood, K. Harrington, J. M. Stone, T. A. Birks, and J. C. Knight, “Quantitative characterization of endoscopic imaging fibers,” Optics Express, vol. 25, no. 3, pp. 1985–1992, 2017.
