TL;DR
This paper introduces an unsupervised CNN method that fuses MRI-PET images using SSIM as a loss function, enhancing visualization and quantitative assessment of fused medical images for better diagnosis.
Contribution
A novel end-to-end unsupervised CNN approach utilizing SSIM for effective multimodal brain image fusion and visualization.
Findings
Improved visual perception of fused images.
Favorable quantitative assessment compared to previous methods.
Effective visualization of input contributions.
| Metrics | GF | NSCT-PCDC | LP-SR | NSCT-RPCNN | NSST-PAPCNN | LP-CNN | Proposed |
|---|---|---|---|---|---|---|---|
| NCIE | 0.8169 | 0.8080 | 0.8092 | 0.8132 | 0.8102 | 0.8076 | 0.8104 |
| Xydeas | 0.7555 | 0.5457 | 0.6501 | 0.6702 | 0.6685 | 0.5665 | 0.5707 |
| FMI | 0.9224 | 0.8754 | 0.8969 | 0.8941 | 0.8997 | 0.8958 | 0.8885 |
| SSIM | 0.8260 | 0.7992 | 0.7837 | 0.8492 | 0.8318 | 0.7176 | 0.8610 |
| VIFF | 0.2776 | 0.3415 | 0.5990 | 0.5430 | 0.6001 | 0.5326 | 0.6005 |
| Time (s) | 13.43 | 221.07 | 75.69 | 775.31 | 521.36 | 481.73 | 0.37 |
Structural Similarity based Anatomical and Functional Brain Imaging Fusion
Nishant Kumar (🖂) 1
Nico Hoffmann 2
Martin Oelschlägel 3
Edmund Koch 3
Matthias Kirsch 4
Stefan Gumhold 1
1 Computer Graphics and Visualisation, Technische Universität Dresden, 01062 Dresden, Germany
Email: [email protected]
2 Institute of Radiation Physics, Helmholtz-Zentrum Dresden-Rossendorf, 01328 Dresden, Germany
3 Clinical Sensoring and Monitoring, Technische Universität Dresden, 01307 Dresden, Germany
4 Department of Neurosurgery, University Hospital Carl Gustav Carus, 01307 Dresden, Germany
Abstract
Multimodal medical image fusion combines contrasting features from two or more input imaging modalities to represent the fused information in a single image. One of the pivotal clinical applications of medical image fusion is the merging of anatomical and functional modalities for fast diagnosis of malignant tissues. In this paper, we present a novel end-to-end unsupervised learning based Convolutional Neural Network (CNN) for fusing the high and low frequency components of MRI-PET grayscale image pairs, publicly available at ADNI, by exploiting the Structural Similarity Index (SSIM) as the loss function during training. We then apply color coding for the visualization of the fused image by quantifying the contribution of each input image in terms of the partial derivatives of the fused image. We find that our fusion and visualization approach results in better visual perception of the fused image, while also comparing favorably to previous methods when applying various quantitative assessment metrics.
Keywords:
Medical Image Fusion · MRI-PET · Convolutional Neural Networks (CNN) · Structural Similarity Index (SSIM)
1 Introduction
Rapid advances in sensor technology have improved medical prognosis, surgical navigation and treatment. For example, anatomical modalities such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) reveal structural information of the brain, such as the location of a tumor as well as white and gray matter, while modalities such as Positron Emission Tomography (PET) provide functional information such as glucose metabolism. Hybrid PET-CT acquisition hardware provides fast and accurate attenuation correction and helps in combining anatomical and functional information. However, it exposes patients to high levels of X-ray and ionizing radiation. Integrated MRI-PET scanners provide high tissue contrast at a significantly lower radiation dose, but developing robust hybrid MRI-PET hardware is challenging because of the compatibility issues of PET detectors in the high magnetic field environment of MRI. Post-hoc fusion of MRI-PET image pairs overcomes the challenges of fully integrated MRI-PET scanners and helps medical personnel to better diagnose brain abnormalities such as glioma and Alzheimer’s disease [1, 2].
Most past image fusion methods follow a three-step approach to the fusion problem. First, the source images are transformed into a particular domain using approaches such as multi-scale decomposition [3, 4, 5, 6, 7], sparse representation [8, 9], a mixture of multi-scale decomposition and sparse representation [10], or Intensity-Hue-Saturation [11], among others. Then, the transformed coefficients are combined using a predefined, coefficient grouping based fusion strategy such as max selection or weighted averaging. Finally, the fused image is reconstructed by applying the inverse of the transformation. However, the intricacy of these methods makes them computationally inefficient and therefore unrealistic for a real-time setup [12]. CNN based medical image fusion [13] has been actively studied in the past. However, these methods train the network on natural images because large sets of preregistered medical image pairs are unavailable. The acquisition of natural images differs from that of PET images: PET accumulates nuclear tracers, and its resolution depends on positron range, photon collinearity and the width of the detector element, which results in a smooth, low resolution acquisition without clear interfaces between certain tissues. High resolution MRI such as T1-MPRAGE, on the other hand, is acquired in the spatial frequency domain by varying the sequence of RF pulses. Hence, the aspects of the human visual system that are tuned to process natural images are not equally useful for MRI-PET images, which makes the selection of a proper objective assessment metric challenging [14]. Secondly, there is no ground truth in a fusion problem, which makes the proper selection of the loss layer critical.
Therefore, we propose a fast grayscale anatomical and functional medical image fusion approach in an end-to-end unsupervised learning network trained on publicly available medical image pairs. Additionally, the fusion result is visualized based on the contribution of the input images to the fused output image. The computational efficiency of our combined fusion and visualisation framework gives it the potential for real-time clinical application in the future.
2 Methods
2.1 Fusion framework
The fusion architecture in Fig. 1 takes two grayscale input images, one anatomical and one functional, and generates a grayscale fused image. The network consists of three stages, namely feature extraction, fusion and reconstruction, designed to preserve most of the details from the input modalities. We train the parameters of the feature extraction and reconstruction layers by maximising the structural similarity and minimising the Euclidean distance between the fused image and the input images.
2.1.1 Feature Extraction:
In the first feature extraction layer, we perform two different convolution operations on each of the input images to decompose it into high and low frequency feature maps. Since blurry PET images have more low frequency content than sharp MRI images, we use a larger kernel for the anatomical input to capture low frequency (LF) features in a larger window, and a smaller kernel for the functional input to capture its LF features efficiently. For the high frequency (HF) layer, we use a small kernel for the anatomical input to better capture sharp local features such as edges and corners in smaller neighborhoods, and a separate kernel size for the functional input because it contains fewer edges. We add two more hidden HF layers with an increasing number of channels to preserve the deep high frequency features at the boundary regions.
2.1.2 Fusion and Reconstruction:
HF features contain detailed information about texture and edges that has a direct impact on the edge distortion of the fused image. Therefore, proper selection of the fusion strategy for HF features is crucial for robust fusion results. A max pooling strategy extracts edges from the feature maps, whereas average pooling is efficient in preserving textures. We exploit the advantages of both methods and propose max-average pooling as the fusion rule for the HF features. We use a weighted averaging strategy as the fusion rule for the LF features, which contain global information about the inputs. Our reconstruction strategy contains three hidden layers, and the activation function at the last layer is chosen for its steeper gradients compared to a sigmoid function, which makes backpropagation effective. Given the high frequency features of the two inputs at each channel in the third hidden HF layer, the low frequency features of the two inputs at each channel in the first hidden LF layer, and the feature map generated from the second reconstruction layer, the outputs of the first fusion layer and the second fusion layer are:
[TABLE]
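The two fusion rules described above can be illustrated with a minimal NumPy sketch. The exact formulas are not recoverable from this text, so the way the maximum and the average are combined, the function names, and the `weight_a` parameter are all assumptions made for illustration, not the authors' definitions.

```python
import numpy as np

def fuse_high_frequency(hf_a, hf_b):
    """Assumed max-average rule: mean of the elementwise max and the elementwise average."""
    hf_max = np.maximum(hf_a, hf_b)   # favours strong edge responses
    hf_avg = 0.5 * (hf_a + hf_b)      # favours texture preservation
    return 0.5 * (hf_max + hf_avg)

def fuse_low_frequency(lf_a, lf_b, weight_a=0.5):
    """Weighted averaging of two LF feature maps (weight_a is a hypothetical parameter)."""
    return weight_a * lf_a + (1.0 - weight_a) * lf_b

# Toy feature maps standing in for one channel of each input.
hf_anatomical = np.array([[1.0, 4.0]])
hf_functional = np.array([[3.0, 2.0]])
fused_hf = fuse_high_frequency(hf_anatomical, hf_functional)
fused_lf = fuse_low_frequency(hf_anatomical, hf_functional)
```

In this sketch the HF rule leans towards the stronger response at each position while still retaining part of the weaker one, which is one plausible reading of combining max and average pooling.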
2.1.3 Loss function:
The fused image in the medical domain is normally evaluated by a human observer whose sensitivity to noise depends on the local luminance, contrast and structural properties of the image. Therefore, we adopt the structural similarity index (SSIM [15]) as the human perceptive loss function, defined as:
[TABLE]
where the two arguments are the images being compared and the sum runs over the local windows in the image. In our paper, the three exponents are all set to 1, which gives equal importance to the luminance, structure and contrast comparisons of the image contents at each local window, with three stabilising constants, given as:
[TABLE]
where the means and standard deviations of the two image contents are computed using a Gaussian filter with a fixed standard deviation, and the remaining term is their correlation coefficient. When using only SSIM as the loss function, we empirically observed a shift in the brightness of the fused image, since a smaller Gaussian standard deviation preserves edges and contrast better than luminance in the flat areas of the image. Therefore, in addition to SSIM, we employ a pixel-level Euclidean loss, which preserves luminance better. With the two source images and the final fused image as arguments, we express our steerable total loss function as:
[TABLE]
where the two sub-losses are the SSIM loss and the pixel-level loss, and a weighting parameter controls the contribution of each of the sub-losses.
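As a rough illustration of a steerable loss of this kind, the sketch below computes a windowed SSIM in the standard simplified form from [15] (all exponents equal to 1) and combines it with a mean squared pixel error. Several choices are assumptions made here rather than details taken from the paper: uniform 7x7 windows replace the Gaussian filter, the constants assume images scaled to [0, 1], and the convex combination with the hypothetical weight `lam` is only one plausible form of the total loss.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ssim(x, y, window=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Mean SSIM over all sliding windows, for 2D images scaled to [0, 1]."""
    wx = sliding_window_view(x, (window, window))
    wy = sliding_window_view(y, (window, window))
    mu_x = wx.mean(axis=(-2, -1))
    mu_y = wy.mean(axis=(-2, -1))
    var_x = wx.var(axis=(-2, -1))
    var_y = wy.var(axis=(-2, -1))
    cov_xy = (wx * wy).mean(axis=(-2, -1)) - mu_x * mu_y
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(np.mean(numerator / denominator))

def total_loss(src_a, src_b, fused, lam=0.5):
    """Assumed form: lam * SSIM loss + (1 - lam) * pixel-level loss."""
    loss_ssim = (1.0 - ssim(src_a, fused)) + (1.0 - ssim(src_b, fused))
    loss_pixel = np.mean((fused - src_a) ** 2) + np.mean((fused - src_b) ** 2)
    return lam * loss_ssim + (1.0 - lam) * loss_pixel

rng = np.random.default_rng(0)
image = rng.random((16, 16))
loss_identical = total_loss(image, image, image)
loss_inverted = total_loss(image, image, 1.0 - image)
```

Setting `lam` to 1 in this sketch leaves only the SSIM term, and setting it to 0 leaves only the pixel-level term, which are the two corner cases a steerable loss has to cover.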
2.2 Visualization framework
We visualised the functional and anatomical information in the fused grayscale image by first calculating the partial derivative of each pixel of the fused image with respect to each of the input images. Given the dimensions of the anatomical input, the functional input and the fused image, the gradients of the fused image with respect to the anatomical and functional inputs are:
[TABLE]
We then color coded the functional gradient image and performed a Hue Saturation Value (HSV) transformation on both images. The Hue and Saturation channels of the color-coded functional gradient image and the Value channel of the anatomical gradient image were stacked and inverse transformed to obtain the fused colored image. A scaling factor is multiplied with the saturation channel of the functional gradient image to prevent the occlusion of anatomical information.
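The channel stacking step can be sketched with the Python standard library `colorsys` module and NumPy. This is an illustrative sketch only: the inputs are assumed to be already normalised to [0, 1], the functional image is assumed to be already color coded as RGB, and `fuse_colors` and `sat_factor` are hypothetical names for the stacking step and the saturation scaling factor.

```python
import colorsys
import numpy as np

def fuse_colors(anat_gray, func_rgb, sat_factor=0.5):
    """Stack hue/saturation of func_rgb (H, W, 3) with value from anat_gray (H, W)."""
    to_hsv = np.vectorize(colorsys.rgb_to_hsv)
    to_rgb = np.vectorize(colorsys.hsv_to_rgb)
    # Hue and saturation come from the color-coded functional image.
    hue, sat, _ = to_hsv(func_rgb[..., 0], func_rgb[..., 1], func_rgb[..., 2])
    # Value comes from the anatomical image; saturation is scaled down.
    red, green, blue = to_rgb(hue, sat_factor * sat, anat_gray)
    return np.stack([red, green, blue], axis=-1)

anatomical = np.linspace(0.0, 1.0, 16).reshape(4, 4)
functional = np.zeros((4, 4, 3))
functional[..., 0] = 1.0  # pure red as a stand-in for a color-coded image

gray_only = fuse_colors(anatomical, functional, sat_factor=0.0)
full_color = fuse_colors(np.ones((4, 4)), functional, sat_factor=1.0)
```

With `sat_factor` at 0 the output reduces to the anatomical grayscale image, and larger values let the functional colors progressively cover it, which matches the occlusion trade-off described above.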
3 Experiments and results
3.1 Training
3.1.1 Data acquisition:
We obtained 500 MRI-PET image pairs publicly available from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [16], from subjects of both genders aged between 55 and 90 years. All images were analyzed as axial slices with a voxel size of 1.0 x 1.0 x 1.0. The MRI images were skull-stripped T1-weighted N3m MPRAGE sequences, while the PET-FDG images of the same subject were co-registered, averaged and standardized to a uniform voxel size and resolution. We aligned the MRI-PET image pairs using the affine transformation tool of the 3D Slicer registration library.
3.1.2 Initialisation of hyperparameters:
The kernel filters of our fusion network are initialised from truncated normal distributions with a standard deviation of 0.01, while the biases are initialised to zero. The stride in each layer is 1 with no padding during convolution, since every down-sampling layer would erase detailed information in the input images that is crucial for medical image fusion. We employ batch normalization and Leaky ReLU activation with slope 0.2 to avoid the issue of vanishing gradients. The network is trained for 200 epochs with a batch size of 1 and the loss weight varied in [0, 1] on a single GeForce GTX 1080 Ti GPU. The Adam optimizer is used during the backpropagation step with a learning rate of 0.002. Our approach has been implemented in Python 2.7 and TensorFlow 1.10.1 on a Linux Ubuntu 17.10 x86_64 system with 12 Intel Core i7-8700K CPU @ 3.70 GHz and 64 GB RAM.
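For readers unfamiliar with these settings, the following NumPy sketch illustrates a truncated normal initialiser and the Leaky ReLU activation. It assumes the common convention of redrawing samples that fall more than two standard deviations from the mean; the kernel shape in the example is hypothetical and not taken from the paper.

```python
import numpy as np

def truncated_normal(shape, stddev=0.01, seed=0):
    """Sample N(0, stddev^2), redrawing values beyond two standard deviations."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, stddev, size=shape)
    outside = np.abs(samples) > 2 * stddev
    while outside.any():
        samples[outside] = rng.normal(0.0, stddev, size=int(outside.sum()))
        outside = np.abs(samples) > 2 * stddev
    return samples

def leaky_relu(x, slope=0.2):
    """Pass positive inputs unchanged and scale negative inputs by `slope`."""
    return np.where(x > 0, x, slope * x)

# Hypothetical kernel shape: height, width, input channels, output channels.
kernel = truncated_normal((5, 5, 1, 16))
bias = np.zeros(16)
activated = leaky_relu(np.array([-1.0, 0.0, 2.0]))
```

The non-zero slope for negative inputs keeps a small gradient flowing where a plain ReLU would output zero, which is the property the text relies on to counter vanishing gradients.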
3.1.3 Loss curve analysis:
Fig. 3 shows the loss curves of the two sub-losses for the training data at different values of the loss weight. The curves converge rapidly for all values except the one at which one sub-loss plays a more important role than the other in the total loss function. Note that the SSIM loss is more sensitive to smaller errors, such as luminance variations in flat texture-less regions, while the pixel-level loss is more sensitive to larger errors irrespective of the underlying regions within the image. This property delays convergence for visually perceptive results at edges as well as in flat regions of the fused image.
3.2 Testing
We performed cross-validation on our trained model with a disjoint test dataset containing 100 MRI-PET image pairs of 100 unique subjects from the ADNI and Harvard Whole Brain Atlas [17] databases. The 90 MR-T1 and PET-FDG image pairs obtained from ADNI were mutually exclusive from the training image pairs. In order to test our method on datasets distinct from ADNI, the remaining 10 pre-registered image pairs were a combination of MR-T1 and PET-FDG or MR-T2 and PET-FDG images obtained from the Harvard Whole Brain Atlas [17], with subjects suffering from either glioma or Alzheimer’s disease.
3.3 Evaluation settings
The visualisation results of the test images were evaluated at 10 parameter values on five objective assessment metrics, namely nonlinear correlation information entropy [18], the Xydeas metric [19], feature mutual information [20], the structural similarity metric [15] and the human perceptive visual information fidelity [21], where higher values mean better performance. The evaluation resulted in the highest scores for three of these metrics with the two tunable parameters set to 0.8 and 0.6. We then used six recent medical image fusion methods for quantitative comparisons in a MATLAB R2018a environment, namely guided filtering (GF) [7], the nonsubsampled contourlet transform methods (NSCT-PCDC) [3] and (NSCT-RPCNN) [22], a combination of multi-scale transform and sparse representation (LP-SR) [10], the nonsubsampled shearlet transform (NSST-PAPCNN) [6] and convolutional neural networks (LP-CNN) [13]. Our code is publicly available at: https://github.com/nish03/FunFuseAn/.
3.4 Comparison to the state of the art
3.4.1 Visual results:
The first set of Fig. 2 shows a negligible contribution of PET features in the image fused by GF, while NSCT-PCDC, NSST-PAPCNN, LP-SR and NSCT-RPCNN have an uneven distribution of structural edges and contrast, leading to splotchy visual artifacts. The results from LP-CNN are better than those of the other methods, but like them it fails to preserve the edges from the functional modality, i.e. PET. Our method conserves structural information better in both image pairs and is robust in preserving the edges (see the PET features in the red box). The second set of Fig. 2 reveals that the luminance of the proposed fusion results increases with greater parameter values, leading to brightness artifacts at the corner cases of 0 and 1. The third set of Fig. 2 shows the proposed visualisation results controlled by the saturation parameter, where a shift in the occlusion of the anatomical information can be observed for different parameter values.
3.4.2 Objective assessment:
Table 1 summarizes the average scores over the 100 test image pairs computed for the different fusion methods along with our proposed method at the parameter values 0.8 and 0.6. For all the mentioned metrics, a method with a higher score performs better than a method with a lower score. The results show that our method performs best on the structural similarity and visual information fidelity metrics. This is consistent with the fact that the neural network optimizes the structural similarity loss function and subsequently improves the structural information in the fused image. Overall, the competitive scores reflect the robustness of our method for human perceptive fusion results.
3.4.3 Computational Efficiency:
We evaluated the total runtime of each of the methods for 100 test images in the MATLAB R2018a environment. Table 1 shows that our fusion and visualisation method achieved the best timings, since the network parameters are optimized during the training phase and, with a fixed batch size, only one forward propagation through the fusion network is required to generate the fused images. Therefore, our fusion network could also be utilized in a real-time neurosurgical intervention setup, where a continuous feed of live images in the form of a time series would generate a fused output video stream with very low time delay.
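To put the timings into perspective, the short calculation below converts the Time row of Table 1 into per-image runtimes and speed-up factors. It assumes that the reported values are total seconds over the 100 test images, as the text states; the variable names are chosen here for illustration.

```python
# Total runtime in seconds, copied from the Time row of Table 1.
total_runtime_s = {
    "GF": 13.43,
    "NSCT-PCDC": 221.07,
    "LP-SR": 75.69,
    "NSCT-RPCNN": 775.31,
    "NSST-PAPCNN": 521.36,
    "LP-CNN": 481.73,
    "Proposed": 0.37,
}
num_images = 100  # assumed number of images covered by each total

# Average runtime per image in milliseconds.
per_image_ms = {name: 1000.0 * t / num_images for name, t in total_runtime_s.items()}

# How many times slower each method is than the proposed one.
speedup = {name: t / total_runtime_s["Proposed"] for name, t in total_runtime_s.items()}

# Fastest competing method, excluding the proposed one.
fastest_baseline = min(
    (name for name in total_runtime_s if name != "Proposed"),
    key=total_runtime_s.get,
)
```

Under that assumption the proposed method needs a few milliseconds per image pair, while the fastest baseline in the table is more than an order of magnitude slower.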
4 Conclusion and Discussion
We presented a novel image fusion and visualisation framework which is highly suitable for diagnosing malignant brain conditions. The end-to-end learning based fusion model utilised the structural similarity loss to construct artifact-free fusion images, and the gradient based visualisation delineated the anatomical features of MRI from the functional features of PET in the fused image. The extensive evaluation of our approach showed significant improvements in human perceptive results compared to past methods. In the future, our method could be extended to other combinations of anatomical and functional imaging modalities by changing the fusion architecture, especially the feature extraction layers. Additionally, we plan to immersively visualise the proposed results in an augmented reality based real-time preoperative setup, thereby enabling medical experts to make robust clinical decisions.
Acknowledgements
This work was supported by the European Social Fund (project no. 100312752) and the Saxonian Ministry of Science and Art.
References
- [1] James, A.P., Dasarathy, B.V.: Medical Image Fusion: A Survey of the State of the Art. Information Fusion. 19, 4–19 (2014)
- [2] Nensa, F., Beiderwellen, K., Heusch, P., Wetter, A.: Clinical applications of PET/MRI: Current status and future perspectives. Diagnostic and Interventional Radiology. 20(5), 438–447 (2014)
- [3] Bhatnagar, G., Wu, Q.M.J., Liu, Z.: Directive Contrast Based Multimodal Medical Image Fusion in NSCT Domain. IEEE Transactions on Multimedia. 15(5), 1014–1024 (2013)
- [4] Du, J., Li, W., Xiao, B., Nawaz, Q.: Union Laplacian pyramid with multiple features for medical image fusion. Neurocomputing. 194, 326–339 (2016)
- [5] Du, J., Li, W., Xiao, B.: Anatomical-Functional Image Fusion by Information of Interest in Local Laplacian Filtering Domain. IEEE Transactions on Image Processing. 26(12), 5855–5866 (2017)
- [6] Yin, M., Liu, X., Liu, Y., Chen, X.: Medical Image Fusion With Parameter-Adaptive Pulse Coupled Neural Network in Nonsubsampled Shearlet Transform Domain. IEEE Transactions on Instrumentation and Measurement. 68(1), 49–64 (2019)
- [7] Li, S., Kang, X., Hu, J.: Image Fusion With Guided Filtering. IEEE Transactions on Image Processing. 22(7), 2864–2875 (2013)
- [8] Li, H., He, X., Tao, D., Tang, Y., Wang, R.: Joint medical image fusion, denoising and enhancement via discriminative low-rank sparse dictionaries learning. Pattern Recognition. 79, 130–146 (2018)
