Framelet Pooling Aided Deep Learning Network : The Method to Process High Dimensional Medical Data

Chang Min Hyun; Kang Cheol Kim; Hyun Cheol Cho; Jae Kyu Choi; Jin Keun Seo

arXiv:1907.10834·cs.LG·July 29, 2019

TL;DR

This paper introduces a framelet-pooling method that transforms high-dimensional medical data into low-dimensional components, significantly reducing computational costs while maintaining learning performance.

Contribution

The paper presents a novel framelet-pooling approach that mitigates computational complexity in deep learning for high-dimensional medical data, enabling efficient processing.

Findings

01. Reduces computational costs by decomposing large data into smaller tasks
02. Maintains comparable accuracy to standard methods
03. Significantly decreases neural network complexity

Abstract

Machine learning-based analysis of medical images often faces several hurdles, such as the lack of training data, the curse of dimensionality, and generalization issues. One of the main difficulties is the computational cost of handling the large matrices that represent medical images. The purpose of this paper is to introduce a framelet-pooling aided deep learning method that mitigates the computational burden caused by large dimensionality. By transforming high dimensional data into low dimensional components with filter banks that preserve detailed information, the proposed method aims to significantly reduce the complexity of the neural network and the computational cost of the learning process. Various experiments show that our method is comparable to the standard unreduced learning method, while reducing computational burdens by…

Tables

Table 1: Average computational time per epoch in the undersampled MRI problem.

# of training data	$\mathbb{U}^{(0)}$-NET	$\mathbb{U}^{(1)}$-NET (Haar)	$\mathbb{U}^{(1)}$-NET (Db4)	$\mathbb{U}^{(1)}$-NET (PL)	$\mathbb{U}^{(2)}$-NET (Haar)	$\mathbb{U}^{(2)}$-NET (Db4)	$\mathbb{U}^{(2)}$-NET (PL)
1500	11.79803	4.66432	4.43048	8.32320	3.63646	3.75758	12.98898
1000	7.65065	3.43165	3.25852	5.61659	2.44117	2.53221	8.312133
500	3.76303	1.70458	1.75337	2.65404	1.27335	1.31174	4.67596
100	0.71829	0.33526	0.33408	0.54698	0.22354	0.25795	0.94070

Table 3: Average computational time per epoch in the sparse-view CT problem.

# of training data	$\mathbb{U}^{(0)}$-NET	$\mathbb{U}^{(1)}$-NET (Haar)	$\mathbb{U}^{(1)}$-NET (Db4)	$\mathbb{U}^{(1)}$-NET (PL)	$\mathbb{U}^{(2)}$-NET (Haar)	$\mathbb{U}^{(2)}$-NET (Db4)
1500	39.47637	17.29248	18.34227	31.42237	12.09981	11.75551
1000	26.42527	11.80329	12.34189	20.91642	8.27648	8.21492
500	13.00667	6.00089	6.06389	10.54422	4.10279	4.03506
100	2.46522	1.04074	1.12465	1.95324	0.75294	0.81197

Table 5: Average computational time per epoch in the undersampled MRI problem when using the proposed method with the PL framelet and 1000 training data (N=1000).

Feature depth	$\mathbb{U}^{(0)}$-NET	$\mathbb{U}^{(1)}$-NET	$\mathbb{U}^{(2)}$-NET
16	7.65065	5.61659	8.312133
32	12.78046	6.186188	8.634091
64	26.20805	8.805362	8.979816

Equations

\mathbf{P}^{\sharp}=\mathcal{S}\mathbf{P}

\mathbf{P}=\mathscr{F}\mathbf{y}=\int_{\mathbb{R}^{2}}\mathbf{y}(z)e^{-2\pi iz\cdot\xi}\,dz

\mathbf{y}=\mathscr{F}^{-1}\mathbf{P}

\mathbf{y}^{\sharp}=\mathscr{F}^{-1}\mathcal{S}^{*}\mathbf{P}^{\sharp}

\mathbf{P}=\mathscr{R}\mathbf{y}=\int_{L_{\theta,s}}\mathbf{y}(z)\,d\ell_{z}

\mathbf{y}=\mathscr{R}^{-1}\mathbf{P}

\mathbf{y}^{\sharp}=\mathscr{R}^{-1}\mathcal{S}^{*}\mathbf{P}^{\sharp}

f=\underset{f\in\mathbb{DL}_{\tiny\mbox{net}}}{\mbox{argmin}}\sum_{i=1}^{N}\mathscr{L}(f(\mathbf{x}^{(i)}),\mathbf{y}^{(i)})

\widehat{\phi}(\xi)=\widehat{\textbf{q}}_{0}(2^{-1}\xi)\widehat{\phi}(2^{-1}\xi),\quad\forall\xi\in\mathbb{R}^{2}

\widehat{\psi}_{\alpha}(\xi)=\widehat{\textbf{q}}_{\alpha}(2^{-1}\xi)\widehat{\phi}(2^{-1}\xi),\quad\forall\xi\in[0,\pi]^{2},\ 1\leq\alpha\leq r

\sum_{\alpha=0}^{r}|\widehat{\textbf{q}}_{\alpha}(\xi)|^{2}=1,\quad\sum_{\alpha=0}^{r}\widehat{\textbf{q}}_{\alpha}(\xi)\overline{\widehat{\textbf{q}}_{\alpha}(\xi+\nu)}=0,\quad\forall\xi\in[0,\pi]^{2},\ \forall\nu\in\{0,\pi\}^{2}\setminus\{0\}

\mathpzc{W}^{(1)}=[\mathpzc{W}_{0,0}^{T},\mathpzc{W}_{0,1}^{T},\dots,\mathpzc{W}_{0,r}^{T}]^{T}

\mathpzc{W}_{0,\alpha}\mathbf{x}=\downarrow(\mathbf{x}\circledast\textbf{q}_{\alpha}(-\cdot)),\quad\forall\mathbf{x}\in\mathbb{R}^{d^{2}}

\mathpzc{W}^{(2)}=[(\mathpzc{W}_{1,0}\mathpzc{W}_{0,0})^{T},\dots,(\mathpzc{W}_{1,r}\mathpzc{W}_{0,0})^{T},\dots,(\mathpzc{W}_{1,r}\mathpzc{W}_{0,r})^{T}]^{T}

\mathpzc{W}_{1,\alpha}\tilde{\mathbf{x}}=\downarrow(\tilde{\mathbf{x}}\circledast\textbf{q}_{\alpha}(-\cdot)),\quad\forall\tilde{\mathbf{x}}\in\mathbb{R}^{d^{2}/2^{2}}

f=\underset{f\in\mathbb{DL}_{\tiny\mbox{net}}}{\mbox{argmin}}\sum_{i=1}^{N}\mathscr{L}(f(\mathpzc{W}^{(k_{1})}\mathbf{x}^{(i)}),\mathcal{W}^{(k_{2})}\mathbf{y}^{(i)})

\mathbf{x}_{\mbox{\tiny MR}}^{(i)}=\mathscr{F}^{-1}\mathcal{S}^{*}\underbrace{\mathcal{S}\mathscr{F}\mathbf{y}_{\mbox{\tiny MR}}^{(i)}}_{\mbox{\footnotesize$\textbf{P}^{\sharp}$}}

\{\mathpzc{W}^{(k)}\mathbf{x}_{\mbox{\tiny MR}}^{(i)},\mathpzc{W}^{(k)}\mathbf{y}_{\mbox{\tiny MR}}^{(i)}\}_{i=1}^{N}

\mathbf{x}_{\mbox{\tiny CT}}^{(i)}=\mathscr{R}^{-1}\mathcal{S}^{*}\underbrace{\mathcal{S}\mathscr{R}\mathbf{y}_{\mbox{\tiny CT}}^{(i)}}_{\mbox{\footnotesize$\textbf{P}^{\sharp}$}}

\{\mathpzc{W}^{(k)}\mathbf{x}_{\mbox{\tiny CT}}^{(i)},\mathpzc{W}^{(k)}\mathbf{y}_{\mbox{\tiny CT}}^{(i)}\}_{i=1}^{N}

\mathbf{h}^{(1)}=\mbox{ReLU}(\textbf{C}^{(1)}\circledast_{1}\mathpzc{W}^{(k)}\mathbf{x})


Full text

Framelet Pooling Aided Deep Learning Network : The Method to Process High Dimensional Medical Data

Chang Min Hyun†, Kang Cheol Kim†, Hyun Cheol Cho†, Jae Kyu Choi‡ (to whom correspondence should be addressed) and Jin Keun Seo

†Department of Computational Science and Engineering, Yonsei University, Seoul, Korea

‡School of Mathematical Sciences, Tongji University, Shanghai, 200092, China

Abstract

Machine learning-based analysis of medical images often faces several hurdles, such as the lack of training data, the curse of dimensionality, and generalization issues. One of the main difficulties is the computational cost of handling the large matrices that represent medical images. The purpose of this paper is to introduce a framelet-pooling aided deep learning method that mitigates the computational burden caused by large dimensionality. By transforming high dimensional data into low dimensional components with filter banks that preserve detailed information, the proposed method aims to significantly reduce the complexity of the neural network and the computational cost of the learning process. Various experiments show that our method is comparable to the standard unreduced learning method, while reducing computational burdens by decomposing large-sized learning tasks into several small-scale learning tasks.

1 Introduction

Medical imaging is experiencing a paradigm shift due to the rapid advance of deep learning techniques. Through sophisticated “disentangled representation learning” from training data, these techniques appear to deliver superior performance in various medical imaging problems, including undersampled magnetic resonance imaging (MRI), sparse-view computed tomography (CT), artifact reduction, organ segmentation, and automated disease detection. In particular, U-net [Ronneberger2015], a kind of convolutional neural network, seems to show a remarkable capability for learning image representations. However, there are some hurdles to overcome, one of which comes from the high dimensionality, i.e. the high resolution or the large size, of medical images. This paper addresses a way to resolve this issue through a so-called framelet pooling aided deep learning network.

Machine learning performance is closely related to the number, the quality, and the pixel dimensionality of the sampled data. For ease of explanation, consider the simple problem of learning an unknown function $f:[0,1]^{d}\to[0,1]$ from given samples $(\mathbf{x},y)$, where $\mathbf{x}$ is an input grayscale image lying in $[0,1]^{d}$ and $y=f(\mathbf{x})$ is the corresponding output in the interval $[0,1]$. One can then ask how many training samples are needed to approximate $f$ within a given tolerance $\epsilon>0$. It is well known that for a Lipschitz continuous function $f$, we need to sample $O(\epsilon^{-d})$ points [mallat2016]. In addition, the author of [barron1994] observed that the estimation error of $f$ by neural networks with one hidden layer is given by $O(\frac{c_{f}}{\mathfrak{m}})+O\left(\frac{\mathfrak{m}d}{\mathfrak{n}_{\mbox{\tiny data}}}\log\mathfrak{n}_{\mbox{\tiny data}}\right)$, where $\mathfrak{n}_{\mbox{\tiny data}}$ is the number of training data, $\mathfrak{m}$ is the number of neurons in the hidden layer, and $c_{f}$ is a constant depending on the regularity of $f$. This means that in the case of $d=512^{2}$ (i.e. considering $512\times 512$ images) and $\mathfrak{m}=d$, we need roughly $\mathfrak{n}_{\mbox{\tiny data}}=O(10^{12})$ training data to achieve an error of $O(10^{-1})$. Such a large number of required training data makes the problem intractable, especially when the data lie in a high dimensional space. This phenomenon is referred to as the curse of dimensionality in the approximation sense. Even though the effect of dimensionality on deep networks is weaker than on shallow ones [bruna2013, pascanu2013, brainMIT2016] in the approximation sense, deep learning requires a huge amount of computation for the training process. Thus, deep networks with high dimensional data also experience the curse of dimensionality in terms of computational burden.

In the literature, framelets are known to be effective in capturing key information of images. This is due to the multiscale structure of framelet systems and the presence of both low pass and high pass filters in the filter banks, which are desirable for sparsely approximating images without loss of information [bin2017]. In this work, we propose a framelet-based deep learning method to reduce the computational burden of dealing with high dimensional data in the learning process. This method, called framelet pooling, is based on the decomposition of a $d$-dimensional input-output pair $(\mathbf{x},\mathbf{y})$ into several $d/2^{2k}$-dimensional pairs $\{(\mathpzc{W}_{k,\alpha}\mathbf{x},\mathcal{W}_{k,\alpha}\mathbf{y}):\alpha=1,\cdots,r\}$, where $\mathpzc{W}_{k,\alpha}$ and $\mathcal{W}_{k,\alpha}$ are $d/2^{2k}\times d$ matrices corresponding to the $k$th level framelet packet transform [mallat2009]. Instead of learning the pair of high dimensional original data $(\mathbf{x},\mathbf{y})$, the proposed method learns the much lower dimensional pairs $(\mathpzc{W}_{k,\alpha}\mathbf{x},\mathcal{W}_{k,\alpha}\mathbf{y})$ in a parallel fashion, so that we can achieve computational efficiency in dealing with large images.

As an application of our proposed method, we deal with the undersampled MRI [Hyun2018] and the sparse-view CT problem [jin2017], where huge memory problems may arise in recovering high resolution images. Experiments on undersampled MRI and sparse-view CT show that our framelet pooling aided reduced method provides very similar performance to the standard unreduced method, while reducing the computation time greatly by reducing the dimension of inputs and learning parameters in neural networks.

2 Method

Both the undersampled MRI and the sparse-view CT problems aim to find a reconstruction function $f$ that maps undersampled data $\mathbf{P}^{\sharp}$ (violating the Nyquist criterion) to a clinically meaningful tomographic image $\mathbf{y}$. Here, the undersampled data $\mathbf{P}^{\sharp}$ can be expressed as a subsampling of the fully-sampled data $\mathbf{P}$ (satisfying the Nyquist criterion),

$$\mathbf{P}^{\sharp}=\mathcal{S}\mathbf{P} \tag{1}$$

where $\mathcal{S}$ is a subsampling operator. Standard MRI and CT use the fully-sampled data $\mathbf{P}$ to provide tomographic images, where the reconstruction functions $f$ in MRI and CT are the inverse Fourier transform and the inverse Radon transform, respectively. However, when we use the undersampled data $\mathbf{P}^{\sharp}$, these standard methods do not work because the Nyquist criterion is no longer satisfied (see Fig. 1 and Fig. 2). For the sake of clarity, we briefly state the mathematical framework of undersampled MRI and sparse-view CT in the following subsections.

2.1 Undersampled MRI

Let $\mathbf{y}(z)$ be the distribution of nuclear spin density at the position $z=(z_{1},z_{2})$. The measured k-space data $\mathbf{P}$ is governed by the Fourier relation

$$\mathbf{P}=\mathscr{F}\mathbf{y}=\int_{\mathbb{R}^{2}}\mathbf{y}(z)e^{-2\pi iz\cdot\xi}\,dz \tag{2}$$

where $\xi=(\xi_{1},\xi_{2})$ [Nishimura2010]. Therefore, with the fully-sampled data $\mathbf{P}$, the image $\mathbf{y}$ can be reconstructed by applying the inverse Fourier transform to the measured data,

$$\mathbf{y}=\mathscr{F}^{-1}\mathbf{P} \tag{3}$$

Note that the direct inversion method (3) can also be applied to the undersampled data $\mathbf{P}^{\sharp}$,

$$\mathbf{y}^{\sharp}=\mathscr{F}^{-1}\mathcal{S}^{*}\mathbf{P}^{\sharp} \tag{4}$$

Here, $\mathcal{S}^{*}$ is the adjoint operator of $\mathcal{S}$ in the $\ell^{2}$ space. However, the image $\mathbf{y}^{\sharp}$ obtained from (4) contains aliasing artifacts because $\mathbf{P}^{\sharp}$ violates the Nyquist criterion (see Fig. 1).

2.2 Sparse-view CT

In CT, the tomographic image $\mathbf{y}(z)$ can be regarded as the distribution of linear attenuation coefficients at the position $z=(z_{1},z_{2})$. For CT data acquisition, X-ray beams are transmitted in various directions $\theta:=(\cos\varphi,\sin\varphi),\quad 0\leq\varphi\leq 2\pi$. Under the assumption of monochromatic X-ray generation, the projection data $\mathbf{P}$ in the direction $\theta$ is dictated by the following Radon transform

$$\mathbf{P}=\mathscr{R}\mathbf{y}=\int_{L_{\theta,s}}\mathbf{y}(z)\,d\ell_{z} \tag{5}$$

where $L_{\theta,s}$ is the projection line $L_{\theta,s}:=\{z\in\mathbb{R}^{2}\,:\,\theta\cdot z=s\}$ [seo2013]. With the fully-sampled data $\mathbf{P}$ satisfying the Nyquist criterion, $\mathbf{y}$ can be reconstructed by the inverse Radon transform

$$\mathbf{y}=\mathscr{R}^{-1}\mathbf{P} \tag{6}$$

For the undersampled data $\mathbf{P}^{\sharp}$, which is measured with a low sampling frequency along the projection view, we can apply the direct inversion formula (6) after filling the unmeasured parts of the undersampled data with zeros,

$$\mathbf{y}^{\sharp}=\mathscr{R}^{-1}\mathcal{S}^{*}\mathbf{P}^{\sharp} \tag{7}$$

However, the reconstructed image $\mathbf{y}^{\sharp}$ contains streaking artifacts, which result from the violation of the Nyquist criterion. Fig. 2 shows schematic and visual descriptions of the sparse-view CT problem.

2.3 Main result: Undersampled reconstruction using framelet and deep learning

The objective of the undersampled reconstruction problem is to develop a deartifacting map $f$, which converts $\mathbf{y}^{\sharp}\in\mathbb{R}^{d^{2}}$ (the artifacted image) to $\mathbf{y}\in\mathbb{R}^{d^{2}}$ (the artifact-removed image), with $d^{2}$ being the pixel dimension of the reconstructed image. In particular, deep learning techniques such as U-net infer $f$ by minimizing the training data fidelity:

$$f=\underset{f\in\mathbb{DL}_{\tiny\mbox{net}}}{\mbox{argmin}}\sum_{i=1}^{N}\mathscr{L}(f(\mathbf{x}^{(i)}),\mathbf{y}^{(i)}) \tag{8}$$

using a set of training data $(\mathbf{x}^{(i)},\mathbf{y}^{(i)})_{i=1}^{N}$. Here, $N$ is the number of training data, $\mathbf{x}^{(i)}$ denotes the artifacted image (written in place of $(\mathbf{y}^{\sharp})^{(i)}$), $\mathbb{DL}_{\tiny\mbox{net}}$ is the set of all learnable functions from a user-defined deep learning network architecture, and $\mathscr{L}$ is a user-defined loss function that measures the discrepancy between the deep learning output $f(\mathbf{x}^{(i)})$ and the label $\mathbf{y}^{(i)}$. However, as the pixel dimension of the input increases, the total computational complexity of the training process increases considerably. To address this curse-of-dimensionality issue, we propose the framelet pooling aided deep learning method to learn the deartifacting map $f$ indirectly.

For the sake of clarity, we first provide a brief introduction to framelets. Let $\phi\in\mbox{L}^{2}(\mathbb{R}^{2})$ be a refinable function, which satisfies

$$\widehat{\phi}(\xi)=\widehat{\textbf{q}}_{0}(2^{-1}\xi)\widehat{\phi}(2^{-1}\xi),\quad\forall\xi\in\mathbb{R}^{2}$$

where $\widehat{\cdot}$ denotes the Fourier transform operator. Then, the framelet functions $\Psi=\{\psi_{\alpha}:1\leq\alpha\leq r\}\subseteq\mbox{L}^{2}(\mathbb{R}^{2})$ are generated by

$$\widehat{\psi}_{\alpha}(\xi)=\widehat{\textbf{q}}_{\alpha}(2^{-1}\xi)\widehat{\phi}(2^{-1}\xi),\quad\forall\xi\in[0,\pi]^{2},\ 1\leq\alpha\leq r$$

where $\textbf{q}_{\alpha}$ satisfies the unitary extension principle [ron1997, bin2012],

$$\sum_{\alpha=0}^{r}|\widehat{\textbf{q}}_{\alpha}(\xi)|^{2}=1,\quad\sum_{\alpha=0}^{r}\widehat{\textbf{q}}_{\alpha}(\xi)\overline{\widehat{\textbf{q}}_{\alpha}(\xi+\nu)}=0,\quad\forall\xi\in[0,\pi]^{2},\ \forall\nu\in\{0,\pi\}^{2}\setminus\{0\}.$$

Then the corresponding affine system $X(\Psi)=\{\psi_{\alpha,n,\textbf{k}}=2^{n}\psi_{\alpha}(2^{n}\cdot-\textbf{k}):1\leq\alpha\leq r,n\in\mathbb{Z},\textbf{k}\in\mathbb{Z}^{2}\}$ forms a tight frame for $\mbox{L}^{2}(\mathbb{R}^{2})$, and the filter banks $\{\textbf{q}_{\alpha}\}_{\alpha\in\{0,1,\cdots,r\}}$ form a tight frame on $\ell^{2}(\mathbb{Z}^{2})$. This means that these filter banks project high dimensional data into low dimensional spaces without any information loss. In other words, an invertible decomposition procedure, called framelet decomposition, can be defined from these filter banks.
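As an illustration, the two conditions of the unitary extension principle can be checked numerically for concrete filters. The sketch below is a minimal 1D check, assuming the usual Haar filters and the standard piecewise linear B-spline framelet filters; these filter values, the tap offsets, and the helper names `symbol` and `uep_residuals` are assumptions made for illustration and are not taken from the paper. Here $\widehat{\textbf{q}}(\xi)=\sum_{k}\textbf{q}[k]e^{-ik\xi}$.

```python
import numpy as np

# 1D filter banks with their tap offsets (assumed standard choices).
HAAR = {
    "offsets": np.array([0, 1]),
    "filters": np.array([[0.5, 0.5], [0.5, -0.5]]),
}
PL = {
    "offsets": np.array([-1, 0, 1]),
    "filters": np.array([
        [0.25, 0.5, 0.25],
        [-0.25, 0.5, -0.25],
        [np.sqrt(2) / 4, 0.0, -np.sqrt(2) / 4],
    ]),
}

def symbol(q, offsets, xi):
    """Fourier series q_hat(xi) = sum_k q[k] * exp(-i * k * xi)."""
    return np.exp(-1j * np.outer(xi, offsets)) @ q

def uep_residuals(bank, n_points=257):
    """Largest deviation from the two UEP conditions on a grid over [0, pi]."""
    xi = np.linspace(0.0, np.pi, n_points)
    hats = np.stack([symbol(q, bank["offsets"], xi) for q in bank["filters"]])
    shifted = np.stack([symbol(q, bank["offsets"], xi + np.pi) for q in bank["filters"]])
    partition = np.sum(np.abs(hats) ** 2, axis=0) - 1.0   # sum |q_hat|^2 - 1
    alias = np.sum(hats * np.conj(shifted), axis=0)        # aliasing term, nu = pi
    return np.max(np.abs(partition)), np.max(np.abs(alias))

for name, bank in (("Haar", HAAR), ("PL", PL)):
    print(name, uep_residuals(bank))
```

Both residuals should vanish up to rounding error. Two-dimensional banks built as tensor products of such 1D filters inherit the property, which is consistent with the 4 (Haar) and 9 (PL) filters reported in the experiments.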

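Now, we define the first level framelet decomposition operator $\mathpzc{W}^{(1)}$ by

$$\mathpzc{W}^{(1)}=[\mathpzc{W}_{0,0}^{T},\mathpzc{W}_{0,1}^{T},\dots,\mathpzc{W}_{0,r}^{T}]^{T}$$

where $\mathpzc{W}_{0,\alpha}$ is the $d^{2}/2^{2}\times d^{2}$ matrix given by

$$\mathpzc{W}_{0,\alpha}\mathbf{x}=\downarrow(\mathbf{x}\circledast\textbf{q}_{\alpha}(-\cdot)),\quad\forall\mathbf{x}\in\mathbb{R}^{d^{2}}.$$

Here, $\downarrow$ stands for the 2 dimensional down-sampling operator and $\circledast$ is the convolution operator with stride 1. Likewise, we can define the second level framelet decomposition $\mathpzc{W}^{(2)}$ by

$$\mathpzc{W}^{(2)}=[(\mathpzc{W}_{1,0}\mathpzc{W}_{0,0})^{T},\dots,(\mathpzc{W}_{1,r}\mathpzc{W}_{0,0})^{T},\dots,(\mathpzc{W}_{1,r}\mathpzc{W}_{0,r})^{T}]^{T}$$

where $\mathpzc{W}_{1,\alpha}$ is the $d^{2}/2^{4}\times d^{2}/2^{2}$ matrix given by

$$\mathpzc{W}_{1,\alpha}\tilde{\mathbf{x}}=\downarrow(\tilde{\mathbf{x}}\circledast\textbf{q}_{\alpha}(-\cdot)),\quad\forall\tilde{\mathbf{x}}\in\mathbb{R}^{d^{2}/2^{2}}.$$

We can continue the above process to define the $k$th level framelet decomposition operator $\mathpzc{W}^{(k)}$. Fig. 3 illustrates two examples of framelet decompositions using the Daubechies wavelet (db4) [Daubechies1988] and the piecewise linear B-spline frame [Shen2013].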
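The decomposition just described (filter with $\textbf{q}_{\alpha}(-\cdot)$, then down-sample by 2 along each axis) can be sketched in a few lines of NumPy. This is a minimal illustration, assuming periodic boundary handling, tensor-product 2D filters, and the same assumed 1D Haar and piecewise linear filters as a typical framelet toolbox would use; the function names are hypothetical and the paper's actual implementation may differ. With filters normalized so that $\sum_{\alpha}|\widehat{\textbf{q}}_{\alpha}|^{2}=1$, the adjoint recovers the image up to a factor $2^{2}=4$, which the sketch applies explicitly.

```python
import numpy as np

# (tap offsets, 1D filters); assumed standard Haar and piecewise linear banks.
HAAR = (np.array([0, 1]), np.array([[0.5, 0.5], [0.5, -0.5]]))
PL = (np.array([-1, 0, 1]),
      np.array([[0.25, 0.5, 0.25],
                [-0.25, 0.5, -0.25],
                [np.sqrt(2) / 4, 0.0, -np.sqrt(2) / 4]]))

def _filter(x, q, offsets, axis, adjoint=False):
    """Periodic filtering along one axis.
    Analysis: y[n] = sum_k q[k] x[n + k]; adjoint: y[n] = sum_k q[k] x[n - k]."""
    sign = 1 if adjoint else -1
    return sum(c * np.roll(x, sign * int(k), axis=axis) for c, k in zip(q, offsets))

def decompose(x, bank):
    """One level: one half-size subband per 2D tensor-product filter."""
    offsets, filters = bank
    bands = []
    for qa in filters:            # filter applied along axis 0
        for qb in filters:        # filter applied along axis 1
            y = _filter(_filter(x, qa, offsets, 0), qb, offsets, 1)
            bands.append(y[::2, ::2])          # down-sampling by 2 per axis
    return bands

def reconstruct(bands, bank):
    """Inverse of `decompose`: 4 times the adjoint (zero-upsample, then filter)."""
    offsets, filters = bank
    n0, n1 = bands[0].shape
    x = np.zeros((2 * n0, 2 * n1))
    idx = 0
    for qa in filters:
        for qb in filters:
            up = np.zeros_like(x)
            up[::2, ::2] = bands[idx]
            up = _filter(up, qa, offsets, 0, adjoint=True)
            x += _filter(up, qb, offsets, 1, adjoint=True)
            idx += 1
    return 4.0 * x

def decompose_packet(x, bank, level):
    """k-th level packet: re-apply `decompose` to every subband."""
    bands = [x]
    for _ in range(level):
        bands = [b for band in bands for b in decompose(band, bank)]
    return bands
```

For a 16 by 16 input, `decompose(x, HAAR)` yields 4 subbands of size 8 by 8 and `decompose(x, PL)` yields 9, and in both cases `reconstruct` returns the input up to rounding error.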

Now, we are ready to explain our proposed deep learning network. Let $\mathpzc{W}$ and $\mathcal{W}$ be framelet decomposition operators. The proposed framelet pooling deep learning network aims to infer the relation between $\mathpzc{W}^{(k_{1})}\mathbf{x}$ and $\mathcal{W}^{(k_{2})}\mathbf{y}$ in the following least-squares minimization sense:

$$f=\underset{f\in\mathbb{DL}_{\tiny\mbox{net}}}{\mbox{argmin}}\sum_{i=1}^{N}\mathscr{L}(f(\mathpzc{W}^{(k_{1})}\mathbf{x}^{(i)}),\mathcal{W}^{(k_{2})}\mathbf{y}^{(i)}) \tag{14}$$

Here, each $(\mathpzc{W}^{(k_{1})}\mathbf{x}^{(i)})_{\alpha_{1}}$ and $(\mathcal{W}^{(k_{2})}\mathbf{y}^{(i)})_{\alpha_{2}}$ is an image with pixel dimension $d^{2}/2^{2k_{1}}$ and $d^{2}/2^{2k_{2}}$, respectively. For example, if the second level Daubechies 4 tap wavelet decomposition $\mathcal{W}^{(2)}$ is taken for both $\mathpzc{W}^{(k_{1})}$ and $\mathcal{W}^{(k_{2})}$ in (14), the proposed deep learning method tries to find the function $f$ satisfying $f(\mathcal{W}^{(2)}\mathbf{x})=\mathcal{W}^{(2)}\mathbf{y}$ in the sense of (14), as shown in Fig. 4.

Compared to the direct deep learning scheme (8), the framelet-pooling aided deep learning method (14) is expected to mitigate the total computational complexity and time caused by high dimensional data in the learning process. In this paper, we test only the case in which training inputs and labels are decomposed using the same framelet decomposition $\mathpzc{W}^{(k)}$. However, our method is not restricted to this specific case.
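To make the framelet-domain objective concrete, the following minimal sketch evaluates the $\ell^{2}$ instance of the training loss for a first level Haar decomposition applied to both inputs and labels. The Haar transform is written with array slicing, `f` is a placeholder for any network acting on the stacked subbands, and all function names are hypothetical; the filter normalization (each 1D filter has taps of magnitude 1/2) is an assumption.

```python
import numpy as np

def haar_level1(x):
    """First level 2D Haar decomposition as a (4, H/2, W/2) stack of subbands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return np.stack([
        (a + b + c + d) / 4,   # low-pass subband
        (a - b + c - d) / 4,   # high-pass along columns
        (a + b - c - d) / 4,   # high-pass along rows
        (a - b - c + d) / 4,   # high-pass in both directions
    ])

def framelet_domain_loss(f, xs, ys, transform=haar_level1):
    """Sum over i of || f(W x_i) - W y_i ||^2, with the same W for inputs and labels."""
    return sum(np.sum((f(transform(x)) - transform(y)) ** 2) for x, y in zip(xs, ys))

# Example with a placeholder "network" that returns its input unchanged.
rng = np.random.default_rng(0)
xs = [rng.random((8, 8)) for _ in range(3)]
ys = [rng.random((8, 8)) for _ in range(3)]
loss = framelet_domain_loss(lambda w: w, xs, ys)
```

Under this normalization the Haar subbands satisfy $\|\mathpzc{W}\mathbf{z}\|^{2}=\|\mathbf{z}\|^{2}/4$, so for the identity placeholder the framelet-domain loss equals one quarter of the image-domain squared error. The network therefore sees four images of a quarter of the original size instead of one full-size image.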

3 Experiments and Results

3.1 Experimental Set-up for Undersampled MRI

Let $\{\mathbf{y}_{\mbox{\tiny MR}}^{(i)}\in\mathbb{R}^{256\times 256}\}_{i=1}^{N}$ denote the set of MR images reconstructed with Nyquist sampling. Using $\{\mathbf{y}_{\mbox{\tiny MR}}^{(i)}\}$, we compute the training inputs $\{\mathbf{x}_{\mbox{\tiny MR}}^{(i)}\in\mathbb{R}^{256\times 256}\}_{i=1}^{N}$ by

$$\mathbf{x}_{\mbox{\tiny MR}}^{(i)}=\mathscr{F}^{-1}\mathcal{S}^{*}\underbrace{\mathcal{S}\mathscr{F}\mathbf{y}_{\mbox{\tiny MR}}^{(i)}}_{\mbox{\footnotesize$\textbf{P}^{\sharp}$}} \tag{15}$$

where $\mathscr{F}$ is the 2D discrete Fourier transform, $\mathscr{F}^{-1}$ is the 2D discrete inverse Fourier transform, and $\mathcal{S}$ is a user-chosen subsampling operator. In our experiments, we use MR images $\mathbf{y}_{\mbox{\tiny MR}}^{(i)}$ obtained from a T2-weighted turbo spin-echo pulse sequence with 4408 ms repetition time, 100 ms echo time, and 10.8 ms echo spacing [Loizou2011]. The Fourier transform and its inverse are computed via fft2 and ifft2 in the Python package numpy.fft. Finally, for the sampling strategy, we choose uniform subsampling with factor 4 plus 12 additional low-frequency lines, out of 256 lines in total [Hyun2018].
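The input-generation step can be sketched with numpy.fft as follows. This is a minimal illustration, assuming that subsampling acts on the rows of k-space and that the 12 extra low-frequency lines form a block centred on the zero frequency; both choices, and the names `line_mask` and `zero_filled_input`, are assumptions rather than details confirmed by the paper.

```python
import numpy as np

def line_mask(n_lines=256, factor=4, n_low=12):
    """Boolean mask over k-space lines in centred (fftshift) ordering.
    The placement of the low-frequency block is an assumption."""
    mask = np.zeros(n_lines, dtype=bool)
    mask[::factor] = True                                   # uniform subsampling
    centre = n_lines // 2
    mask[centre - n_low // 2:centre + n_low // 2] = True    # extra low-frequency lines
    return mask

def zero_filled_input(y, mask):
    """x = F^{-1} S^* S F y, with unmeasured lines set to zero."""
    k_full = np.fft.fftshift(np.fft.fft2(y))    # fully sampled k-space, zero frequency centred
    k_zero_filled = k_full * mask[:, None]      # keep measured rows only
    return np.fft.ifft2(np.fft.ifftshift(k_zero_filled))

rng = np.random.default_rng(0)
y = rng.random((256, 256))                      # stand-in for an MR image
mask = line_mask()
x = zero_filled_input(y, mask)                  # complex-valued aliased image
```

With these settings the mask keeps 73 of the 256 lines, since 3 of the 12 central lines coincide with uniformly sampled ones. The result is complex in general, so a magnitude or real part would have to be taken before feeding it to a real-valued network.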

In order to test our proposed method, we decompose the dataset using the $k$ level framelet decomposition $\mathpzc{W}^{(k)}$ with various filter banks. We obtain

$$\{\mathpzc{W}^{(k)}\mathbf{x}_{\mbox{\tiny MR}}^{(i)},\mathpzc{W}^{(k)}\mathbf{y}_{\mbox{\tiny MR}}^{(i)}\}_{i=1}^{N} \tag{16}$$

where both $\mathpzc{W}^{(k)}\mathbf{x}_{\mbox{\tiny MR}}^{(i)}$ and $\mathpzc{W}^{(k)}\mathbf{y}_{\mbox{\tiny MR}}^{(i)}$ contain $r^{k}$ images of size $256/2^{k}\times 256/2^{k}$. Here, $k$ is the decomposition level and $r$ is the number of filters $\textbf{q}_{\alpha}$.

3.2 Experimental Set-up for Sparse-view CT

Let $\{\mathbf{y}^{(i)}_{\mbox{\tiny CT}}\in\mathbb{R}^{512\times 512}\}_{i=1}^{N}$ be a set of CT images reconstructed with Nyquist sampling. The corresponding deep learning training inputs are computed as follows:

$$\mathbf{x}_{\mbox{\tiny CT}}^{(i)}=\mathscr{R}^{-1}\mathcal{S}^{*}\underbrace{\mathcal{S}\mathscr{R}\mathbf{y}_{\mbox{\tiny CT}}^{(i)}}_{\mbox{\footnotesize$\textbf{P}^{\sharp}$}} \tag{17}$$

where $\mathscr{R}$ is the discrete Radon transform, $\mathscr{R}^{-1}$ is the filtered back-projection algorithm, and $\mathcal{S}$ is a user-defined sampling operator. In our implementation, we use the projection algorithm radon and the filtered back-projection algorithm iradon in the Python package skimage.transform to compute $\mathscr{R}$ and its inverse $\mathscr{R}^{-1}$, respectively. Uniform subsampling with factor $6$ in terms of projection views is used for $\mathcal{S}$ in (17).
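A sparse-view input of this kind can be sketched with skimage.transform as below. The number of projection views in the full sampling is not stated here, so the value of 180 views over 180 degrees is purely an assumption, as are the use of `circle=False` and the function name `sparse_view_input`.

```python
import numpy as np
from skimage.transform import radon, iradon

def sparse_view_input(y, n_views=180, factor=6):
    """x = R^{-1} S^* S R y: keep every `factor`-th view, zero-fill the rest."""
    theta = np.linspace(0.0, 180.0, n_views, endpoint=False)   # assumed full view set
    sino = radon(y, theta=theta, circle=False)                  # shape (detectors, views)
    sino_zero_filled = np.zeros_like(sino)
    sino_zero_filled[:, ::factor] = sino[:, ::factor]           # unmeasured views stay zero
    return iradon(sino_zero_filled, theta=theta, circle=False,
                  output_size=y.shape[0])

# Stand-in phantom: a centred disk on a 64 x 64 grid.
yy, xx = np.mgrid[:64, :64]
y = ((yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2).astype(float)
x = sparse_view_input(y)
```

Because the zero-filled views still enter the back-projection average, the intensities of `x` come out scaled down relative to `y`. A common alternative is to pass only the measured views and their angles to `iradon`, which avoids this scaling while producing the same streaking pattern.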

Applying the same process used to generate the dataset (16) for the undersampled MRI experiments, we obtain the following decomposed dataset for the sparse-view CT problem:

$$\{\mathpzc{W}^{(k)}\mathbf{x}_{\mbox{\tiny CT}}^{(i)},\mathpzc{W}^{(k)}\mathbf{y}_{\mbox{\tiny CT}}^{(i)}\}_{i=1}^{N}$$

where $\mathpzc{W}^{(k)}$ is a $k$ level framelet decomposition.

In all our experiments, we use first and second level framelet decompositions ($k=1,2$) with three different framelets: the Haar wavelet (Haar), the Daubechies 4 tap wavelet (Db4), and the piecewise linear B-spline framelet (PL).

3.3 Network Configuration

To test our proposed method, we adapt the U-net architecture [Ronneberger2015], as shown in Fig. 5, where the first half of the network is the contracting path and the last half is the expansive path. At the first layer of the U-net in Fig. 5, the input $\mathpzc{W}^{(k)}\mathbf{x}$ is convolved with the set of convolution filters $\textbf{C}^{(1)}$ to generate a set of feature maps $\mathbf{h}^{(1)}$, given by

$$\mathbf{h}^{(1)}=\mbox{ReLU}(\textbf{C}^{(1)}\circledast_{1}\mathpzc{W}^{(k)}\mathbf{x})$$

where ReLU is the rectified linear unit $\mbox{ReLU}(x)=\max\{x,0\}$ and $\circledast_{1}$ stands for convolution with stride 1. We repeat this process to get $\mathbf{h}^{(2)}=\mbox{ReLU}(\textbf{C}^{(2)}\circledast_{1}\mathbf{h}^{(1)})$ and apply max pooling to get $\mathbf{h}^{(3)}$. Through this contracting path, we obtain low dimensional feature maps by applying either convolution or max pooling. In the expansive path, we use $2\times 2$ average unpooling instead of max pooling to restore the size of the output. To restore details in the image, the upsampled output is concatenated with the corresponding feature from the contracting path. At the last layer, a $1\times 1$ convolution is used to combine the features into a single integrated feature [Ronneberger2015].
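The first layer can be illustrated with a small NumPy sketch. It assumes that the framelet subbands are stacked as input channels, that the filters are $3\times 3$ with zero padding so the spatial size is preserved, and that the feature depth is 16; the stacking and padding conventions and the function names are assumptions, since the exact layout is defined by the architecture figure.

```python
import numpy as np

def conv2d_same(x, kernels):
    """Stride-1 convolution with zero padding.
    x: (C_in, H, W); kernels: (C_out, C_in, 3, 3); returns (C_out, H, W)."""
    c_out, c_in, kh, kw = kernels.shape
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((c_out, h, w))
    for i in range(kh):
        for j in range(kw):
            # Cross-correlation, as used by most deep learning frameworks.
            out += np.einsum("oc,chw->ohw", kernels[:, :, i, j], xp[:, i:i + h, j:j + w])
    return out

def first_layer(subbands, kernels):
    """h1 = ReLU(C1 conv W x) for a stack of subbands W x."""
    return np.maximum(conv2d_same(subbands, kernels), 0.0)

rng = np.random.default_rng(0)
subbands = rng.standard_normal((4, 128, 128))        # e.g. 4 Haar subbands of a 256 x 256 image
kernels = 0.01 * rng.standard_normal((16, 4, 3, 3))  # 16 feature maps
h1 = first_layer(subbands, kernels)
```

Here `h1` has shape `(16, 128, 128)`: the number of feature maps matches the direct network, but each map has a quarter of the pixels, which is where the saving in convolution cost comes from.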

The U-net in the top row of Fig. 5 will be denoted by $\mathbb{U}^{(0)}\mbox{\footnotesize-NET}$. The U-net in the middle row, denoted by $\mathbb{U}^{(1)}\mbox{\footnotesize-NET}$, is the reduced network obtained by eliminating two $3\times 3$ convolution layers and one pooling/unpooling layer in the first and last parts of $\mathbb{U}^{(0)}\mbox{\footnotesize-NET}$. Similarly, $\mathbb{U}^{(2)}\mbox{\footnotesize-NET}$ is the reduced network obtained by eliminating $3\times 3$ convolution layers and a pooling/unpooling layer in the first and last parts of $\mathbb{U}^{(1)}\mbox{\footnotesize-NET}$. Thus, this process can be viewed as replacing operations with unknown, trainable parameters by framelet operations with known, fixed parameters. In our experiments, $\mathbb{U}^{(0)}\mbox{\footnotesize-NET}$ is used to learn $f$ in the sense of direct learning (8). The reduced $\mathbb{U}^{(k)}\mbox{\footnotesize-NET}$ ($k=1,2$) is trained with the $k$ level framelet decomposed dataset in the sense of (14).

3.4 Experimental Result

All training processes are performed on a computer system with two Intel(R) Xeon(R) CPU E5-2630 v4 2.20GHz processors, 128GB DDR4 RAM, and four NVIDIA GTX-1080ti GPUs. We initialize all weights from a zero-centered normal distribution with standard deviation 0.01, under the Tensorflow environment [Google]. We use the $\ell^{2}$ loss for the loss function $\mathscr{L}$. The loss function is minimized using the Adam optimizer, and batch normalization is used for fast convergence [Kingma2014, Ioffe2015]. For training stability, a small learning rate of $10^{-6}$ is used. The network is trained until the training loss appears to have converged sufficiently.

Fig. 5 and Fig. 6 show reconstruction results from $\mathbb{U}^{(0)}\mbox{\footnotesize-NET}$, $\mathbb{U}^{(1)}\mbox{\footnotesize-NET}$, and $\mathbb{U}^{(2)}\mbox{\footnotesize-NET}$. The three models show similar reconstruction performance, regardless of the underlying problem and the original data dimension. Quantitative evaluations and comparisons for the undersampled MRI problem are summarized in Tables 1 and 2. For the sparse-view CT application, evaluations and comparisons are given in Tables 3 and 4. Tables 1 and 3 compare the average computational time per epoch among $\mathbb{U}^{(0)}\mbox{\footnotesize-NET}$, $\mathbb{U}^{(1)}\mbox{\footnotesize-NET}$, and $\mathbb{U}^{(2)}\mbox{\footnotesize-NET}$. The average computational time is computed by dividing the total computational time by the total number of epochs. Tables 2 and 4 contain test error evaluations and comparisons using three different metrics: mean square error (MSE), peak signal to noise ratio (PSNR), and structural similarity (SSIM) [wang2004].
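For reference, two of the three metrics are simple enough to write out directly. The sketch below uses the common definitions $\mbox{MSE}=\mbox{mean}((\mathbf{y}-\hat{\mathbf{y}})^{2})$ and $\mbox{PSNR}=10\log_{10}(\mbox{peak}^{2}/\mbox{MSE})$; taking the peak as the maximum of the reference image is an assumption, since the exact convention behind the reported numbers is not stated.

```python
import numpy as np

def mse(reference, estimate):
    """Mean square error between a reference image and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.mean((reference - estimate) ** 2))

def psnr(reference, estimate, peak=None):
    """Peak signal to noise ratio in dB; `peak` defaults to max(reference)."""
    peak = float(np.max(reference)) if peak is None else float(peak)
    err = mse(reference, estimate)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak ** 2 / err))

reference = np.ones((4, 4))
estimate = 0.9 * reference
print(mse(reference, estimate), psnr(reference, estimate))
```

SSIM involves local means, variances, and covariances over sliding windows [wang2004]; an implementation is available as `structural_similarity` in `skimage.metrics`.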

These experimental results support the fact that the proposed method reduces the total computational time efficiently and provides competitive results compared to the direct learning algorithm using high dimensional images. Namely, our reduced method provides very similar performance to the standard unreduced method ( $\mathbb{U}^{(0)}\mbox{\footnotesize-NET}$ ), while reducing the computation time greatly by reducing the input dimension.

We also test our proposed method with three different framelets and compare their performance, as shown in Tables 2 and 4 for the quantitative evaluation and Tables 1 and 3 for the computational time. The experimental results indicate that the Haar and Db4 wavelets reduce the computational time more efficiently than the PL framelet, but the PL framelet exhibits better performance than the Haar and Db4 wavelets. Compared to Haar and Db4, which consist of $4$ filters, the PL framelet has $9$ filters (i.e. the number of filters equals the size of the filters), which can increase the computational time. However, it should be noted that Haar and Db4 are orthonormal bases while the PL framelet is a redundant tight frame system. This means that, thanks to the redundancy, it is likely that the error generated by the nonlinear deep learning process can lie in the nontrivial null space of the reconstruction operator, which can make the PL framelet yield better results than the orthonormal bases (Haar and Db4) [bin2017]. Lastly, we would like to mention that the computational time increases in the case of $\mathbb{U}^{(2)}\mbox{\footnotesize-NET}$ with the PL framelet in the undersampled MRI problem, compared to the original network $\mathbb{U}^{(0)}\mbox{\footnotesize-NET}$. We can observe that the reduction of computational time depends on the feature depth of the network. In order to reduce the total computational complexity of the experiments as much as possible, our networks are set to have a feature depth of 16, as described in Fig. 5. However, when the feature depth increases, $\mathbb{U}^{(2)}\mbox{\footnotesize-NET}$ with the PL framelet also reduces the computational time, as shown in Table 5.
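The effect of the filter count on the input size can be made explicit with a little arithmetic. The sketch below assumes a full packet decomposition in which every subband is decomposed again at each level, so that a bank with $r$ filters produces $r^{k}$ subbands of side $256/2^{k}$; the function name `packet_size` is hypothetical.

```python
def packet_size(side, n_filters, level):
    """Number of subbands, subband side length, and total size relative to the input."""
    n_bands = n_filters ** level
    band_side = side // 2 ** level
    redundancy = n_bands * band_side ** 2 / side ** 2
    return n_bands, band_side, redundancy

for name, r in (("Haar/Db4", 4), ("PL", 9)):
    for k in (1, 2):
        print(name, k, packet_size(256, r, k))
```

Under this assumption, the orthonormal banks keep the total number of coefficients equal to the number of pixels at every level, whereas the PL framelet carries 2.25 times as many coefficients at level 1 and about 5.06 times as many at level 2 (81 subbands of size 64 by 64). This growth in input channels may partly explain why $\mathbb{U}^{(2)}\mbox{\footnotesize-NET}$ with the PL framelet is not faster than the direct network at feature depth 16.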

4 Conclusion and Discussion

In this paper, we proposed the framelet pooling aided deep learning network to reduce computational burdens in the training process. The proposed method decomposes large-scale learning tasks into several small-scale learning tasks through the framelet packet transformation so that we can handle large-scale medical imaging in a limited computing environment. Experimental results on undersampled MRI and sparse-view CT reconstruction problems show that our framelet pooling method is at least comparable to the standard deep learning based method, but is able to reduce total computational time in the training process significantly. Hence, we expect that our method is not limited to the two dimensional medical imaging problem. It seems possible that the framelet pooling method can be extended to deep learning problems with large-scale 3 dimensional medical imaging, which inevitably suffers from high computational complexity due to the high dimensionality of dataset.

In the experiments, we can see that the choice of filter banks indeed affects the performance of the proposed method. The use of tight frame can increase the reconstruction accuracy thanks to rich representation under the redundant system, but the computational time reduction ability can be marginal due to the increasing number of convolutions. In contrast, the orthogonal wavelet representation provides high computational time reduction by only using $4$ filters, but generates less accurate results. Hence, the future work will focus on the construction of framelet transformation which is both adaptive to a given task [Gai2014] and computationally efficient. It would also be interesting to provide a theoretical analysis on the approximation property of our deep learning network.

Acknowledgments

Hyun, Kim, Cho and Seo are supported by Samsung Science & Technology Foundation SSTF-BA1402-01. Hyun is supported by the National Research Foundation of Korea grant NRF-2018H1A2A1062505, and Kim is supported by NRF grant NRF-2017R1E1A1A03070653.

Bibliography

[1] I. Daubechies (1988). Orthonormal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics 41, 909–996.
[2] K. Hornik (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks 4(2), 251–257.
[3] A. R. Barron (1994). Approximation and estimation bounds for artificial neural networks. Machine Learning 14(1), 115–133.
[4] A. Ron and Z. Shen (1997). Affine System in $L^{2}(\mathbb{R}^{d})$: The analysis of the analysis operator. Journal of Fourier Analysis and Applications 148, 408–447.
[5] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004). Image quality assessment: from error visibility to structural similarity. IEEE Trans. on Image Processing 13, 600–612.
[6] S. Mallat (2009). A wavelet tour of signal processing. Elsevier.
[7] D. G. Nishimura (2010). Principles of magnetic resonance imaging. Stanford Univ.
[8] C. P. Loizou, V. Murray, M. S. Pattichis, I. Seimenis, M. Pantziaris, and C. S. Pattichis (2011). Multi-scale amplitude modulation-frequency modulation (AM-FM) texture analysis of multiple sclerosis in brain MRI images. IEEE Trans. Inform. Tech. Biomed. 15(1), 119–129.