Distributed deep learning for robust multi-site segmentation of CT imaging after traumatic brain injury

Samuel Remedios; Snehashis Roy; Justin Blaber; Camilo Bermudez; Vishwesh Nath; Mayur B. Patel; John A. Butman; Bennett A. Landman; Dzung L. Pham

arXiv:1903.04207·cs.CV·March 12, 2019


TL;DR

This study demonstrates that multi-site training of neural networks on disparate CT datasets improves brain hematoma segmentation accuracy without sharing patient data, enhancing model generalization and performance.

Contribution

The paper introduces a multi-site learning approach for CT segmentation that preserves patient privacy while improving model accuracy over single-site models.

Findings

01

Multi-site model achieved Dice score of 0.64.

02

Correlation of automated and manual hematoma volumes was 0.87.

03

Multi-site training improved the average Dice score by 8% and the volume correlation by 5% over the single-site models.


Tables

Table 1: Distribution of CT image volumes between training and test sets for both sites.

Training Location	# Training Images	# Testing Images
VUMC	$10$	$8$
NIH	$17$	$10$
Total	$\mathbf{27}$	$\mathbf{18}$

Table 2: Average Dice coefficients and Pearson correlation coefficients for the three training strategies over the NIH and VUMC datasets. The average result over both datasets is shown to illustrate each model’s general ability. An asterisk indicates significant improvements in Dice coefficient ($p<0.05$) between the MSL and each of the NIH SSL and VUMC SSL models as evaluated by the Wilcoxon signed-rank test, and bold text indicates the highest Pearson correlation coefficient between automatic and manual segmented hematoma volumes.

	NIH Data		VUMC Data		Average of NIH and VUMC data
	Dice	Correlation	Dice	Correlation	Dice	Correlation
Inter-Rater	$0.687$	n/a	n/a	n/a	n/a	n/a
NIH SSL	$0.512$	$0.913$	$0.690$	$0.752$	$0.601$	$0.832$
VUMC SSL	$0.407$	$0.859$	$0.745$	$0.754$	$0.576$	$0.807$
MSL	${0.552}^{*}$	$\mathbf{0.943}$	$0.725$	$\mathbf{0.791}$	${0.63}^{*}$	$\mathbf{0.867}$



Full text


Samuel Remedios

Center for Neuroscience and Regenerative Medicine, Henry Jackson Foundation

Radiology and Imaging Sciences, Clinical Center, National Institutes of Health

Department of Computer Science, Middle Tennessee State University

Department of Electrical Engineering, Vanderbilt University

Snehashis Roy

Center for Neuroscience and Regenerative Medicine, Henry Jackson Foundation

Radiology and Imaging Sciences, Clinical Center, National Institutes of Health

Justin Blaber

Department of Electrical Engineering, Vanderbilt University

Camilo Bermudez

Department of Biomedical Engineering, Vanderbilt University

Vishwesh Nath

Department of Computer Science, Vanderbilt University

Mayur B. Patel

Departments of Surgery, Neurosurgery, Hearing & Speech Sciences; Center for Health Services Research, Vanderbilt Brain Institute; Critical Illness, Brain Dysfunction, and Survivorship Center, Vanderbilt University Medical Center; VA Tennessee Valley Healthcare System, Department of Veterans Affairs Medical Center

John A. Butman

Radiology and Imaging Sciences, Clinical Center, National Institutes of Health

Bennett A. Landman

Department of Electrical Engineering, Vanderbilt University

Department of Biomedical Engineering, Vanderbilt University

Department of Computer Science, Vanderbilt University

Dzung L. Pham

Center for Neuroscience and Regenerative Medicine, Henry Jackson Foundation

Radiology and Imaging Sciences, Clinical Center, National Institutes of Health

Abstract

Machine learning models are becoming commonplace in the domain of medical imaging, and with these methods comes an ever-increasing need for more data. However, to preserve patient anonymity it is frequently impractical or prohibited to transfer protected health information (PHI) between institutions. Additionally, due to the nature of some studies, there may not be a large public dataset available on which to train models. To address this conundrum, we analyze the efficacy of transferring the model itself in lieu of data between different sites. By doing so we accomplish two goals: 1) the model gains access to training on a larger dataset that it could not normally obtain and 2) the model better generalizes, having trained on data from separate locations. In this paper, we implement multi-site learning with disparate datasets from the National Institutes of Health (NIH) and Vanderbilt University Medical Center (VUMC) without compromising PHI. Three neural networks are trained to convergence on a computed tomography (CT) brain hematoma segmentation task: one only with NIH data, one only with VUMC data, and one multi-site model alternating between NIH and VUMC data. Resultant lesion masks with the multi-site model attain an average Dice similarity coefficient of $0.64$ and the automatically segmented hematoma volumes correlate to those done manually with a Pearson correlation coefficient of $0.87$, corresponding to an $8\%$ and $5\%$ improvement, respectively, over the single-site model counterparts.

keywords:

multi-site, distributed, deep learning, neural network, computed tomography (CT), hematoma, lesion, segmentation

1 Introduction

Deep learning has recently become a key approach for computer vision and medical imaging problems. Neural networks have been used to skull-strip CT scans[1], segment magnetic resonance images[2], locate and segment blood vessels[3], and segment brain regions[4] and lesions[5]. A wide variety of models and architectures have been implemented to solve these tasks, and pre-trained models prepared for general use cases also exist[6]. Regardless of the particular task for which a model is designed or selected, machine learning methods generally benefit from the inclusion of more data for training and validating the model[7]. Traditionally, multi-site data are acquired by transferring them to a centralized location where the desired model trains; however, it is frequently prohibited or difficult to acquire HIPAA-compliant health data transfer permits[8]. These data restrictions are vital, though, as protected health information (PHI) policies enforce respect for patient privacy and anonymity[9],[10],[11]. Herein lies a contradiction: machine learning models benefit greatly from a wealth of data, yet datasets related to healthcare cannot be shared between sites easily.

To address this problem, we propose to transfer the models themselves between sites in lieu of a dataset transfer. The concept of distributed learning is not new to machine learning; one example is Google’s implementation of Federated Learning, through which models are averaged between mobile phones[12]. However, this approach does not have the goal of gaining accuracy or generalizability, and instead is a decentralized framework geared towards mobile devices and their limited computing power. Another distributed learning technique is transfer learning[13], which aims to apply useful features learned from one task to give training on some other task a head start. Different still is the concept of asynchronous stochastic gradient descent[14], wherein copies of a model are trained on separate splits of the training data and their learned weights are aggregated once training is complete.

A recent study investigated whether a model can perform better if it accesses data from different sites[15]. The authors simulated a multi-site scenario by splitting an open-source dataset into groups and applying different transformations and noise to each group, with the goal of making the data appear different. They then investigated different multi-site training approaches, comparing transfer learning to different patterns of passing partially trained models.

In this paper, we expand upon this work by using empirical multi-site data, separately acquired from the NIH and VUMC. Because of differences in the acquisition at each site, as well as in delineation protocols, improved performance due to the combined training data gained by multi-site learning is not guaranteed. Thus, we employ the aforementioned paper’s cyclic weight transfer as our training paradigm and forgo the uni-directional transfer learning approach.

Our specific contributions are the presentation of an extensible framework through which multiple sites can train the same model using private data, and the validation of the efficacy of two different training schemes on the segmentation of hematoma in traumatic brain injury (TBI) CT scans. In the latter contribution, we consider single-site learning at each of the two sites (NIH and VUMC), and multi-site learning between both sites.

Here, we target segmentation of hemorrhages and hematomas in patients with TBI (see Figure 1). Hemorrhages refer to active bleeding, while a hematoma is any collection or swelling of clotted blood outside of the blood vessels, the cause of which could be severe trauma or disease. The identification and segmentation of blood is an important consideration for diagnosis, prediction of patient recovery, and for examining correlations with long-term neurologic disabilities[16] such as cognitive impairment[17]. Improving the efficacy of hematoma segmentation will therefore assist developments in understanding and treating TBI.

2 Method

2.1 Data

CT images from $27$ acute TBI patients presenting with intracranial hematomas were acquired as part of a research study by the Center for Neuroscience and Regenerative Medicine (CNRM) and NIH. At VUMC, $18$ CT images of TBI patients were obtained in de-identified form. The resolution of all scans from both sites was approximately $0.5\times 0.5\times 5.0$ mm$^3$. All scans were converted from DICOM to NIFTI and subsequently transformed into Hounsfield units. For training, $10$ scans were used at the VUMC site and $17$ at the NIH; the remaining $8$ and $10$, respectively, were set aside as the test dataset. Images from both the NIH and VUMC had a variety of hematoma types, sizes, and locations; however, the VUMC data had a larger average hematoma volume of $41,000$ mm$^3$ compared with $13,700$ mm$^3$ in the NIH dataset. For preprocessing, all CT image volumes underwent skull-stripping by CT_BET [18] and were rigidly transformed to a common orientation. To address the low number of training images, we collected $1,000$ 2D patches of size $255\times 255$ from each CT volume, $20\%$ of which were used as a validation set for hyperparameter tuning. Since voxel intensities were in Hounsfield units, no intensity normalization or scaling was applied. Additionally, because the images have low through-plane resolution ($5.0$ mm) compared to the in-plane resolution ($0.5$ mm), only 2D segmentations were considered. Manual segmentations were performed by independent raters at the two sites and reviewed independently by a neuroradiologist; the numbers of training and testing images are given in Table 1.
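
The patch sampling and validation split described above can be sketched as follows. This is a minimal illustration, assuming patches are drawn uniformly at random from axial slices and that the 20% validation split is taken over the pooled patches; the paper does not specify either choice. The function names, the nested-list layout `volume[z][y][x]`, and the toy dimensions are hypothetical.

```python
import random


def sample_patches(volume, n_patches, patch_size, seed=0):
    """Draw random square 2D patches from the axial slices of volume[z][y][x]."""
    rng = random.Random(seed)
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    patches = []
    for _ in range(n_patches):
        z = rng.randrange(depth)
        y0 = rng.randrange(height - patch_size + 1)
        x0 = rng.randrange(width - patch_size + 1)
        rows = volume[z][y0:y0 + patch_size]
        patches.append([row[x0:x0 + patch_size] for row in rows])
    return patches


def split_train_validation(patches, validation_fraction=0.2, seed=0):
    """Shuffle the patches and hold out a fraction for validation."""
    rng = random.Random(seed)
    shuffled = patches[:]
    rng.shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Toy volume (3 slices of 8 x 8 voxels) standing in for a CT scan in Hounsfield units.
toy_volume = [[[z * 100 + y * 10 + x for x in range(8)] for y in range(8)]
              for z in range(3)]
toy_patches = sample_patches(toy_volume, n_patches=10, patch_size=5)
train_patches, validation_patches = split_train_validation(toy_patches)
```

With the values reported in this section, the corresponding call would use `n_patches=1000` and `patch_size=255` for each CT volume.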

2.2 Model Architecture

An Inception Net-based architecture has previously performed well on hematoma segmentation from magnetic resonance images[20]; as such, we utilize a similar 2D architecture with arbitrary-sized inputs, permitting 2D patch-wise training and full-slice automatic segmentation. This architecture is illustrated in Figure 2. Training continued to convergence, defined as no improvement in loss of at least $1\times 10^{-4}$ over $10$ epochs on the validation patch set. The learning rate was set at $1\times 10^{-4}$ with the Adam[21] optimizer and the continuous Dice coefficient[22] (cDC) as the loss. Resultant binary segmentation masks were generated by thresholding the probability masks at $0.5$.
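
The convergence criterion and the final thresholding step can be sketched as below. This is an illustrative reading of the text, not the authors' code: it assumes that "no improvement" means the best validation loss of the last 10 epochs does not beat the earlier best by at least $1\times 10^{-4}$, and that a probability exactly equal to 0.5 counts as hematoma. Both function names are hypothetical.

```python
def has_converged(validation_losses, min_delta=1e-4, patience=10):
    """Return True if the last `patience` epochs did not improve on the
    earlier best validation loss by at least `min_delta`."""
    if len(validation_losses) <= patience:
        return False
    best_before = min(validation_losses[:-patience])
    best_recent = min(validation_losses[-patience:])
    return best_recent > best_before - min_delta


def threshold_mask(probabilities, cutoff=0.5):
    """Turn a 2D probability map into a binary mask (1 = hematoma)."""
    return [[1 if p >= cutoff else 0 for p in row] for row in probabilities]


# A loss curve that plateaus, and one that still improves in its final epoch.
plateau = [1.0, 0.5] + [0.49995] * 10
improving = [1.0, 0.5] + [0.49995] * 9 + [0.4]
stopped = has_converged(plateau)
still_training = not has_converged(improving)
toy_mask = threshold_mask([[0.2, 0.5], [0.9, 0.49]])
```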

2.3 Framework Implementation

To implement multi-site learning using cyclic weight transfer, we established a server which both the NIH and VUMC could securely access. On this server we mounted a single directory where the neural network weights were kept. Identical Python scripts at both institutions allowed the model to be loaded, trained, and saved via secure shell access to this tertiary server without opening up public connections to either institution’s data [23].

In particular, in our implementation, the data at each site are never accessible to investigators outside that institution.
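
One way to picture the shared-directory handoff is the sketch below, in which only model weights pass through the shared location and each site's data stay local. Everything here is an assumption made for illustration: the JSON file format, the `last_site` turn-taking rule, the file name `weights.json`, and all function names are hypothetical, and a temporary directory stands in for the directory mounted on the shared server.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the shared training state, or None if no site has saved one yet."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return json.load(handle)


def save_state(path, state):
    """Write to a temporary file, then rename, so readers never see a partial file."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def take_turn(path, site, train_one_epoch, initial_weights):
    """Train one local epoch if it is this site's turn; return whether it trained."""
    state = load_state(path)
    if state is None:
        state = {"weights": initial_weights, "epoch": 0, "last_site": None}
    if state["last_site"] == site:
        return False  # the other site has not trained since our last epoch
    state["weights"] = train_one_epoch(state["weights"])  # uses local data only
    state["epoch"] += 1
    state["last_site"] = site
    save_state(path, state)
    return True


def toy_epoch(weights):
    """Stand-in for one epoch of training on a site's private data."""
    return [w + 1.0 for w in weights]


with tempfile.TemporaryDirectory() as shared_dir:
    weights_path = os.path.join(shared_dir, "weights.json")
    turns = [
        take_turn(weights_path, "NIH", toy_epoch, [0.0]),
        take_turn(weights_path, "NIH", toy_epoch, [0.0]),   # skipped: not NIH's turn
        take_turn(weights_path, "VUMC", toy_epoch, [0.0]),
    ]
    final_state = load_state(weights_path)
```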

2.4 Training Strategies

Single-Site Learning. As a baseline, the NIH and VUMC sites each performed single-site learning (SSL) to convergence on their respective datasets. Once converged, each of the NIH SSL and VUMC SSL models was evaluated on both the NIH and VUMC test sets. Concretely, NIH SSL was trained on the NIH training dataset and VUMC SSL on the VUMC training dataset, and each was tested on both the NIH and VUMC testing datasets.

Multi-Site Learning. Multi-site learning (MSL) involved training the same model architecture from initialization (i.e., no transfer learning) for one epoch at one institution, then passing the model to the next institution for the subsequent epoch. Thus, MSL would train for one epoch on the NIH training dataset, then one epoch on the VUMC training dataset, then one epoch on the NIH training dataset, and so on until convergence. As with the NIH SSL and VUMC SSL models, the MSL model was evaluated over both the NIH and VUMC testing datasets.
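
The alternating schedule can be illustrated with a deliberately tiny stand-in model. The sketch below assumes a one-parameter linear model fitted by stochastic gradient descent in place of the segmentation network, synthetic data in place of the two sites' CT patches, and a fixed number of epochs in place of the convergence check; all names and values are hypothetical.

```python
def train_one_epoch(w, data, lr=0.05):
    """One pass of SGD on squared error for the toy model y = w * x."""
    for x, y in data:
        w -= lr * 2.0 * (w * x - y) * x
    return w


def cyclic_weight_transfer(site_data, site_order, n_epochs, w0=0.0):
    """Alternate single epochs between sites, carrying the weights along."""
    w = w0
    schedule = []
    for epoch in range(n_epochs):
        site = site_order[epoch % len(site_order)]
        w = train_one_epoch(w, site_data[site])  # only this site's data is touched
        schedule.append(site)
    return w, schedule


# Two synthetic "sites" that share the relation y = 2x but cover different x ranges.
site_data = {
    "NIH": [(k / 5.0, 2.0 * k / 5.0) for k in range(1, 6)],
    "VUMC": [(1.0 + k / 5.0, 2.0 * (1.0 + k / 5.0)) for k in range(1, 6)],
}
w_msl, schedule = cyclic_weight_transfer(site_data, ["NIH", "VUMC"], n_epochs=20)
```

Single-site learning corresponds to calling the same loop with a single entry in `site_order`.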

3 Results

After training, we have three distinct sets of weights for our model: NIH SSL, VUMC SSL, and MSL. Each of these was evaluated over both the NIH and VUMC testing datasets. We validated all weight sets with two quantitative metrics: the Dice coefficient and hematoma volume correlation between the automatic and manual segmentations. Further explanation of these measurements follows.

3.1 Qualitative Evaluation

The automatic segmentations of test CT slices in Fig. 3 allow for qualitative comparisons between the different training sites. As expected, the model trained at a given site shows fewer false positives on that site’s data than the model trained at the other site. However, in these scenarios the MSL model generally predicts fewer hematoma voxels. Yellow arrows indicate false positives, including those near the blood-brain barrier as well as those that are absent from the MSL segmentations.

3.2 Quantitative Evaluation

Separate from the cDC loss function, the traditional Dice coefficient was employed to judge the accuracy of the automatic masks. Figure 3 displays example segmentation results from four different patients while Table 2 shows the overall averages for all models.
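
For reference, the traditional Dice coefficient between a binary automatic mask $A$ and a manual mask $B$ is $2|A\cap B|/(|A|+|B|)$. A minimal sketch on flattened masks follows; the function name is hypothetical, and returning 1.0 when both masks are empty is a convention chosen here rather than one stated in the paper.

```python
def dice_coefficient(predicted, manual):
    """Dice overlap between two flat binary masks of equal length."""
    intersection = sum(1 for p, m in zip(predicted, manual) if p and m)
    total = sum(1 for p in predicted if p) + sum(1 for m in manual if m)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return 2.0 * intersection / total


# Two predicted voxels, two manual voxels, one voxel in common.
example_dice = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])  # 2 * 1 / (2 + 2) = 0.5
```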

To compare the efficacy of the MSL model against the two SSL models, we used the Wilcoxon signed-rank test over the corresponding Dice scores. Our findings, illustrated in Figures 4 and 5, show significant improvement between the MSL model and both the NIH SSL (p=0.009) and VUMC SSL (p=0.005) models with respect to the NIH test dataset, and a significant improvement between the MSL model and the NIH SSL (p=0.01) over the VUMC test dataset. The VUMC SSL model outperformed the MSL model on the VUMC test data, but not significantly (p=0.337).
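
The paired comparison can be sketched with an exact two-sided Wilcoxon signed-rank test written out by hand. This is a small-sample illustration that enumerates every sign assignment; in practice a library routine such as `scipy.stats.wilcoxon` would normally be used. The per-patient Dice scores below are invented for the example and are not the paper's data.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W+, exact two-sided p-value) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    ranks = average_ranks([round(abs(d), 12) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2.0
    observed = abs(w_plus - center)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):  # every possible sign pattern
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            hits += 1
    return w_plus, hits / 2 ** len(ranks)


# Hypothetical Dice scores for five test patients under two models.
msl_scores = [0.55, 0.60, 0.65, 0.70, 0.75]
ssl_scores = [0.50, 0.50, 0.50, 0.50, 0.50]
w_plus, p_value = wilcoxon_signed_rank(msl_scores, ssl_scores)
```

With only five pairs the smallest attainable two-sided p-value is $2/2^5 = 0.0625$, which is why the test needs a reasonable number of test cases to reach $p<0.05$.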

Two considerations are made regarding low Dice scores. First, specifically regarding the disparity of average Dice scores between the NIH and VUMC visible in Figures 4 and 5, data from the NIH had a lower average hematoma volume than VUMC data ($13,700$ mm$^3$ for NIH data versus $41,000$ mm$^3$ for VUMC data), and Dice coefficients between two segmentations are known to be dependent on the volumes of the objects being considered. Second, regarding overall average Dice scores for both institutions, some 2D image slices near the top and bottom of the brain as well as along the blood-brain barrier suffer from increased false positives. These are shown in Figure 3, marked by yellow arrows.
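
The volume dependence noted above can be made concrete with a small calculation. The sketch assumes an idealized case in which the true lesion is segmented perfectly and a fixed number of false-positive voxels is added; the count of 2,000 spurious voxels is invented for illustration, while the lesion volumes and voxel size are the approximate figures reported in this paper.

```python
VOXEL_VOLUME_MM3 = 0.5 * 0.5 * 5.0  # approximate voxel size reported for both sites


def dice_with_false_positives(lesion_voxels, false_positive_voxels):
    """Dice when the whole lesion is found plus some extra false-positive voxels."""
    predicted_voxels = lesion_voxels + false_positive_voxels
    return 2.0 * lesion_voxels / (lesion_voxels + predicted_voxels)


nih_lesion_voxels = round(13700 / VOXEL_VOLUME_MM3)   # average NIH hematoma volume
vumc_lesion_voxels = round(41000 / VOXEL_VOLUME_MM3)  # average VUMC hematoma volume
spurious_voxels = 2000                                # hypothetical false positives

dice_small_lesion = dice_with_false_positives(nih_lesion_voxels, spurious_voxels)
dice_large_lesion = dice_with_false_positives(vumc_lesion_voxels, spurious_voxels)
```

The same absolute number of false positives lowers the Dice score more for the smaller lesion, so `dice_small_lesion` comes out below `dice_large_lesion`.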

As an alternate means to evaluate the accuracy of the automatic segmentation, we calculated the Pearson correlation coefficient between the total volume in the segmentation and manual masks, as provided in Table 2. Although the automatic segmentations contain some small false positives which reduce the Dice coefficient, overall, the volume correlations remain high.
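
The volume comparison can be sketched as follows: each mask is converted to a volume by multiplying its voxel count by the voxel volume, and the Pearson correlation is computed across patients. The voxel counts are invented for illustration, the function names are hypothetical, and the voxel size is the approximate value reported in Section 2.1.

```python
import math

VOXEL_VOLUME_MM3 = 0.5 * 0.5 * 5.0  # approximate voxel size from Section 2.1


def mask_volume_mm3(voxel_count):
    """Hematoma volume implied by the number of voxels labeled as hematoma."""
    return voxel_count * VOXEL_VOLUME_MM3


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical hematoma voxel counts for four patients.
manual_volumes = [mask_volume_mm3(n) for n in [8000, 12000, 30000, 41000]]
automatic_volumes = [mask_volume_mm3(n) for n in [9000, 11000, 33000, 39000]]
volume_correlation = pearson_correlation(manual_volumes, automatic_volumes)
```

Because small scattered false positives add little total volume, this measure can stay high even when the Dice coefficient is reduced.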

4 Discussion

To the best of our knowledge, this is the first application of multi-site distributed learning to clinical imaging data from different institutions. In this paper, we have presented and validated a technique to train a convolutional neural network in a distributed manner over disparate data housed at different institutions. While the multi-site model outperformed its single-site counterparts, our main contribution is a general framework that allows a neural model to train over more data than it would normally have access to while still preserving PHI. We show that for this task, multi-site learning did not detract from the network’s ability to learn, and as expected, performance improved with more data availability. Additionally, our implementation for transferring the weights between sites automatically is straightforward, publicly available, and generally applicable to other epoch-based training scenarios. Future work includes exploring alternate neural architectures such as U-net and evaluating the generalizability of the MSL model compared with the SSL models using more than two sites.

5 Acknowledgements

Support for this work included funding from the Intramural Research Program of the NIH Clinical Center and the Department of Defense in the Center for Neuroscience and Regenerative Medicine, and NIH grants 1R01EB017230-01A1 (Landman) and 1R01GM120484-01A1 (Patel), as well as NSF 1452485 (Landman). The VUMC dataset was obtained from ImageVU, a research resource supported by the VICTR CTSA award (ULTR000445 from NCATS/NIH), Vanderbilt University Medical Center institutional funding and Patient-Centered Outcomes Research Institute (PCORI; contract CDRN-1306-04869). This work received support from the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University, Nashville, TN, and was supported in part by ViSE/VICTR VR3029. We also extend gratitude to NVIDIA for their support by means of the NVIDIA hardware grant.

Bibliography


[1] Akkus, Z., Kostandy, P. M., Philbrick, K. A., and Erickson, B. J., “Extraction of brain tissue from CT head images using fully convolutional neural networks,” in [Medical Imaging 2018: Image Processing], 10574, 1057420, International Society for Optics and Photonics (2018).
[2] Pereira, S., Pinto, A., Alves, V., and Silva, C. A., “Brain tumor segmentation using convolutional neural networks in MRI images,” IEEE Transactions on Medical Imaging 35(5), 1240–1251 (2016).
[3] Liskowski, P. and Krawiec, K., “Segmenting retinal blood vessels with deep neural networks,” IEEE Transactions on Medical Imaging 35(11), 2369–2380 (2016).
[4] de Brebisson, A. and Montana, G., “Deep neural networks for anatomical brain segmentation,” in [Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops], 20–28 (2015).
[5] Kamnitsas, K., Chen, L., Ledig, C., Rueckert, D., and Glocker, B., “Multi-scale 3D convolutional neural networks for lesion segmentation in brain MRI,” Ischemic Stroke Lesion Segmentation 13, 46 (2015).
[6] Lin, M., Chen, Q., and Yan, S., “Network in network,” arXiv preprint arXiv:1312.4400 (2013).
[7] Halevy, A., Norvig, P., and Pereira, F., “The unreasonable effectiveness of data,” IEEE Intelligent Systems 24(2), 8–12 (2009).
[8] “NIH data sharing policy and implementation guidance.” https://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm (2003). Accessed: 2018-07-27.