$\nu$-net: Deep Learning for Generalized Biventricular Cardiac Mass and   Function Parameters

Hinrich B Winther; Christian Hundt; Bertil Schmidt; Christoph Czerner; Johann Bauersachs; Frank Wacker; Jens Vogel-Claussen

arXiv:1706.04397·cs.CV·June 15, 2017


TL;DR

This paper presents $\nu$-net, a deep learning model that automates high-quality segmentation of cardiac ventricles in MRI images, enabling reliable extraction of cardiac function parameters across diverse datasets.

Contribution

Introduction of $\nu$-net, a deep neural network for fully automated cardiac MRI segmentation with a simple adaptation method for different segmentation styles.

Findings

01

High ICCs for ventricular ejection fraction and mass parameters.

02

State-of-the-art dice coefficient performance.

03

Effective adaptation procedure for different segmentation philosophies.


Tables

Table 1: Comparison of the RV and LV (V) segmentation performance of the epi- and endocardium (endo) and ventricular mass (VM) based on the Dice similarity coefficient (DSC). All values are denoted as mean $\pm$ std.

Hannover Medical School
	method	V	DSC (epi)	DSC (endo)
	proposed	LV	95 $\pm$ 2 %	92 $\pm$ 4 %
	proposed	RV	90 $\pm$ 4 %	88 $\pm$ 6 %
MICCAI 2009 LV Segmentation Challenge
	proposed (ad-hoc)	LV	93 $\pm$ 3 %	84 $\pm$ 7 %
	proposed (with retraining)	LV	95 $\pm$ 3 %	94 $\pm$ 3 %
	[25]	LV	93 $\pm$ 2 %	88 $\pm$ 3 %
	[5]	LV	n/a	94 $\pm$ 2 %
	[4]	LV	n/a	90 $\pm$ 4 %
	[6]	LV	96 $\pm$ 1 %	92 $\pm$ 3 %
	[12]	LV	94 $\pm$ 2 %	90 $\pm$ 5 %
	[26]	LV	n/a	88 $\pm$ 3 %
	[13]	LV	94 $\pm$ 2 %	89 $\pm$ 3 %
	[14]	LV	94 $\pm$ 2 %	88 $\pm$ 3 %
	[15]	LV	93 $\pm$ 2 %	89 $\pm$ 4 %
Right Ventricular Segmentation Challenge
	proposed (ad-hoc)	RV	86 $\pm$ 6 %	85 $\pm$ 7 %
	[27]	RV	n/a	81 $\pm$ 21 %
	[6]	RV	86 $\pm$ 11 %	84 $\pm$ 21 %
	[28]	RV	80 $\pm$ 22 %	76 $\pm$ 25 %
	[29]	RV	63 $\pm$ 35 %	59 $\pm$ 34 %
	[30]	RV	63 $\pm$ 27 %	58 $\pm$ 31 %

Table 2: Comparison of the intraclass correlation coefficient (ICC) for the clinical parameters EF (%), EDV (ml/m$^2$), ESV (ml/m$^2$), SV (ml/m$^2$), and VM (g/m$^2$) between the ground truth and the fully automatically measured volumes. Human inter-observer ICC (denoted as human) has been determined by Caudron et al. [31]. Coefficients surpassing human performance have been highlighted. LVSC and RVSC are reported as ad-hoc performance.

parameter	MHH (RV)	MHH (LV)	DSBCC (LV)	LVSC (LV)	RVSC (RV)	human (RV)	human (LV)
EF	0.960	0.983	0.794	0.945	0.867	0.800	0.953
EDV	0.958	0.985	0.935	0.966	0.924	0.892	0.987
ESV	0.920	0.953	0.918	0.962	0.958	0.917	0.992
SV	0.923	0.978	0.898	0.907	0.841	0.814	0.867
VM	0.832	0.948	n/a	0.941	0.825	0.54	0.848

Table 3: Descriptive statistical analysis of the results based on the Hannover Medical School data set. All values have been computed at the end-systolic and end-diastolic phase. All values are denoted as mean $\pm$ std.

label	accuracy	specificity	precision	recall	dice	overlap
LV endocardial	99.9 $\pm$ 0.0 %	99.9 $\pm$ 0.0 %	91.1 $\pm$ 6.6 %	92.3 $\pm$ 5.1 %	91.5 $\pm$ 4.3 %	84.6 $\pm$ 6.9 %
LV mass	99.8 $\pm$ 0.1 %	99.9 $\pm$ 0.0 %	89.6 $\pm$ 4.3 %	88.3 $\pm$ 4.0 %	88.8 $\pm$ 2.9 %	80.1 $\pm$ 4.6 %
LV epicardial	99.8 $\pm$ 0.1 %	99.9 $\pm$ 0.1 %	94.8 $\pm$ 3.4 %	94.3 $\pm$ 2.7 %	94.5 $\pm$ 2.0 %	89.6 $\pm$ 3.4 %
RV endocardial	99.9 $\pm$ 0.1 %	99.9 $\pm$ 0.0 %	89.4 $\pm$ 6.2 %	86.7 $\pm$ 7.9 %	87.7 $\pm$ 5.5 %	78.5 $\pm$ 8.1 %
RV mass	99.8 $\pm$ 0.1 %	99.9 $\pm$ 0.0 %	80.4 $\pm$ 8.2 %	72.4 $\pm$ 7.1 %	75.8 $\pm$ 5.9 %	61.4 $\pm$ 7.5 %
RV epicardial	99.8 $\pm$ 0.1 %	99.9 $\pm$ 0.0 %	92.9 $\pm$ 4.1 %	87.2 $\pm$ 5.9 %	89.8 $\pm$ 3.8 %	81.8 $\pm$ 6.0 %

Equations

$$\mathrm{EF}\,[\%] = \frac{V^{ED}_{endo} - V^{ES}_{endo}}{V^{ED}_{endo}} \cdot 100$$

$$\mathrm{VM} = \rho \cdot (V_{epi} - V_{endo})$$

$$\mathrm{DSC}(X, Y) = 2\,\frac{|X \cap Y|}{|X| + |Y|}.$$

$$\begin{pmatrix} i \\ j \end{pmatrix} \mapsto \begin{pmatrix} i + f^{(+)}(i,j) \\ j + f^{(-)}(i,j) \end{pmatrix},$$

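$$f^{(\pm)}(i,j) := a_{0}^{(\pm)} + a_{i}^{(\pm)} \cdot i + a_{j}^{(\pm)} \cdot j + a_{ii}^{(\pm)} \cdot i^{2} + a_{ij}^{(\pm)} \cdot i \cdot j + a_{jj}^{(\pm)} \cdot j^{2}$$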


Taxonomy

Topics: Cardiac Imaging and Diagnostics · Radiomics and Machine Learning in Medical Imaging · Machine Learning in Materials Science

Full text

$\nu$-net: Deep Learning for Generalized Biventricular Cardiac Mass and Function Parameters

Abstract

Background: Cardiac MRI derived biventricular mass and function parameters, such as end-systolic volume (ESV), end-diastolic volume (EDV), ejection fraction (EF), stroke volume (SV), and ventricular mass (VM) are clinically well established. Image segmentation can be challenging and time-consuming, due to the complex anatomy of the human heart.

Objectives: This study introduces $\nu$-net (/njuːnɛt/), a deep learning approach allowing for fully automated high-quality segmentation of right (RV) and left ventricular (LV) endocardium and epicardium for extraction of cardiac function parameters.

Methods: A set consisting of 253 manually segmented cases has been used to train a deep neural network. Subsequently, the network has been evaluated on 4 different multicenter data sets with a total of over 1000 cases.

Results: For LV EF the intraclass correlation coefficient (ICC) is 98, 95, and 80 % (95 %), and for RV EF 96, and 87 % (80 %) on the respective data sets (human expert ICCs reported in parentheses). The LV VM ICC is 95, and 94 % (84 %), and the RV VM ICC is 83, and 83 % (54 %). This study proposes a simple adjustment procedure, allowing for the adaptation to distinct segmentation philosophies. $\nu$-net exhibits state-of-the-art performance in terms of Dice coefficient.

Conclusions: Biventricular mass and function parameters can be determined reliably and in high quality by applying a deep neural network for cardiac MRI segmentation, especially in the anatomically complex right ventricle. Adaptation to individual segmentation styles by applying a simple adjustment procedure is viable, allowing for the processing of novel data without time-consuming additional training.

Keywords: cardiac image segmentation, deep learning, biventricular clinical parameters


1 Introduction

The World Health Organization identifies ischaemic heart diseases as the leading cause of death [1]. Imaging technologies, such as magnetic resonance imaging (MRI), yield clinically well established parameters, including end-systolic volume (ESV), end-diastolic volume (EDV), ejection fraction (EF) and stroke volume (SV) as well as ventricular mass (VM). In order to determine these clinical biventricular cardiac mass and function parameters, usually a skilled physician with expertise in cardiac MRI has to segment the image data. This task is typically performed in a time span of approximately 15-20 minutes per case.

Automated image segmentation, especially of 2D short axis cardiac cine MRI stacks, is a highly competitive research field. Active contour models [2, 3] and machine learning approaches [4, 5, 6] are among the most successful methods.

One major challenge for the design of robust classifiers for automated cardiac image segmentation is the lack of manually annotated training data (ground truth). Hence, models with a high number of free parameters, such as deep neural networks, tend to overfit to the characteristics of the assembled data. Image segmentation quality for similar image morphologies is typically sufficient; however, the quality might rapidly degrade for differing image characteristics. This can be caused by varying image acquisition techniques, varying experimental protocols, and illnesses that alter image morphology, such as cardiomyopathies. Additionally, image segmentation philosophies have been shown to have a major influence on the resulting biventricular mass and function parameters. The inclusion or exclusion of trabeculations and papillary muscles affects the left and right ventricular mass as well as the endocardial volume [7, 8, 9, 10]. No convention has been universally accepted for analyzing trabeculation and papillary muscle mass [11]. Moreover, the number of realizable medical images is orders of magnitude larger than the number of assembled samples. Many studies limit themselves to a single data source at a time for training and validation [5, 12, 13, 14, 15]. This might impair the generalization potential of the corresponding models.

A recent paper by Tran et al. [6] applied transfer learning to adapt a pre-trained model to novel data sets. Unfortunately, a major drawback of this approach is the time-consuming retraining of the neural network.

Our study investigates the generalization potential of cardiac image segmentation. In detail, we have composed a diverse data set consisting of images with highly varying characteristics and further applied non-linear augmentation techniques to artificially increase the number of training samples by orders-of-magnitude. We further demonstrate how to determine fully automated high quality estimates of clinical parameters, such as end-systolic volume (ESV), end-diastolic volume (EDV), ejection fraction (EF), stroke volume (SV), and ventricular mass (VM).

2 Materials and Methods

2.1 Evaluation Measures

In this study we evaluate the performance of the proposed cardiac segmentation approach by determining the clinical gold standard parameters EF, ESV, EDV as well as VM of the left and right ventricle. Furthermore, we compare the automatically computed image segmentation with the ground truth in terms of similarity measures, such as overlap, and Dice similarity coefficient (DSC).

2.1.1 Evaluation of Biventricular Cardiac Mass and Function

The performance of the computed segmentation is determined by calculating EF, ESV and EDV. In order to determine the EF in a 2D short axis cine MRI stack, the ESV and EDV are typically measured by segmenting the corresponding volumes as described by [8] (Simpson’s method). The EF is the fraction of the EDV, which is ejected with every cardiac cycle, i.e. the normalized difference of the corresponding volumes at end-systole and end-diastole:

$$\mathrm{EF}\,[\%] = \frac{V^{ED}_{endo} - V^{ES}_{endo}}{V^{ED}_{endo}} \cdot 100$$

where $V^{ES}_{endo}$ denotes the ESV and $V^{ED}_{endo}$ refers to the EDV.

The ventricular mass (VM) is calculated as the product of a constant conversion density factor $\rho$ and the myocardial volume, i.e. the difference between the epicardial volume $V_{epi}$ and the endocardial volume $V_{endo}$ of the right or the left ventricle, respectively:

$$\mathrm{VM} = \rho \cdot (V_{epi} - V_{endo})$$

where $\rho$ is a phenomenologically determined conversion factor of 1.05 g/cm$^3$. Note that the VM is usually determined at the end-diastole.
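The definitions above can be illustrated with a minimal sketch that derives the clinical parameters from binary segmentation masks by counting voxels. The function names, the mask layout (3D boolean arrays), and the convention that the epicardial mask encloses the endocardial cavity are assumptions made for illustration; SV is taken as EDV minus ESV, the usual definition, which the text does not spell out.

```python
import numpy as np

RHO_G_PER_ML = 1.05  # conversion factor from the text (1.05 g/cm^3 = 1.05 g/ml)


def volume_ml(mask, voxel_mm3):
    """Volume of a binary 3D mask in millilitres, obtained by counting voxels."""
    return float(np.count_nonzero(mask)) * voxel_mm3 / 1000.0


def clinical_parameters(endo_ed, endo_es, epi_ed, voxel_mm3):
    """EDV, ESV, SV, EF and VM for one ventricle.

    endo_ed / endo_es: endocardial masks at end-diastole / end-systole.
    epi_ed: epicardial mask at end-diastole (assumed to include the cavity).
    voxel_mm3: volume of a single voxel in cubic millimetres.
    """
    edv = volume_ml(endo_ed, voxel_mm3)
    esv = volume_ml(endo_es, voxel_mm3)
    sv = edv - esv                      # stroke volume (assumed definition)
    ef = 100.0 * sv / edv               # ejection fraction in percent
    vm = RHO_G_PER_ML * (volume_ml(epi_ed, voxel_mm3) - edv)
    return {"EDV": edv, "ESV": esv, "SV": sv, "EF": ef, "VM": vm}


def toy_mask(n_voxels, shape=(10, 10, 10)):
    """Hypothetical helper: a mask with exactly n_voxels foreground voxels."""
    mask = np.zeros(shape, dtype=bool)
    mask.flat[:n_voxels] = True
    return mask
```

For the MHH acquisition described later (1.4 x 1.4 mm in-plane, 8 mm slices, no gap), `voxel_mm3` would be 1.4 * 1.4 * 8 = 15.68.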

The clinical parameters are gathered for the manual ground truth and the automatically computed prediction by counting the corresponding voxels. Volumes and other derived quantities are compared in terms of Spearman’s rank correlation coefficient, the root mean square distance (RMSD), mean absolute percentage error (MAPE), mean error (ME) as well as intraclass correlation coefficient (ICC).
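As a rough illustration of the error measures named above, the following sketch computes RMSD, ME, and MAPE for paired volume measurements, plus Spearman's rho as the Pearson correlation of ranks. The sign convention of the error (prediction minus ground truth) and the no-ties simplification in the rank computation are assumptions; the ICC is omitted because the text does not state which ICC form was used.

```python
import numpy as np


def agreement(truth, pred):
    """RMSD, mean error and mean absolute percentage error of paired values."""
    truth = np.asarray(truth, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - truth  # assumed sign convention: prediction minus ground truth
    return {
        "RMSD": float(np.sqrt(np.mean(err ** 2))),
        "ME": float(np.mean(err)),
        "MAPE": float(100.0 * np.mean(np.abs(err) / np.abs(truth))),
    }


def spearman_rho(x, y):
    """Spearman's rho as Pearson correlation of ranks (assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rank_x, rank_y)[0, 1])
```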

2.1.2 Evaluation of Technical Parameters

In this study we present the expected performance metrics in order to evaluate the quality of the automated cardiac segmentation. These metrics include geometrical measures quantifying overlap based on the DSC and traditional Machine Learning metrics, such as accuracy, precision, recall, and specificity.

The DSC is twice the size of the intersection of two volumes divided by the sum of their sizes:

$$\mathrm{DSC}(X, Y) = 2\,\frac{|X \cap Y|}{|X| + |Y|}.$$

All performance measures are carried out on calculated 3D spatial volumes.
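A minimal sketch of these measures on 3D binary masks follows. It assumes that "overlap" denotes the Jaccard index (intersection over union), which is an interpretation rather than something the text states, and that both masks are non-empty so that no denominator vanishes.

```python
import numpy as np


def segmentation_metrics(truth, pred):
    """Voxel-wise agreement between a ground-truth and a predicted binary mask."""
    truth = np.asarray(truth, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    tp = np.count_nonzero(truth & pred)    # voxels labelled in both masks
    fp = np.count_nonzero(~truth & pred)   # predicted but not in ground truth
    fn = np.count_nonzero(truth & ~pred)   # missed ground-truth voxels
    tn = np.count_nonzero(~truth & ~pred)  # background in both masks
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),   # equals 2|X n Y| / (|X| + |Y|)
        "overlap": tp / (tp + fp + fn),        # Jaccard index (assumption)
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```

Because the background dominates a cardiac MRI volume, accuracy and specificity are close to 100 % for almost any plausible segmentation, so the Dice coefficient and the overlap are the more informative columns.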

2.2 Data Sets

The experiments were performed on four independent 2D short axis cine cardiac MRI data sets as depicted in Figure 1:

2.2.1 Hannover Medical School (MHH) Data Set

The data set consists of 193 training and 309 validation 2D steady state free precession (ssfp) short axis cine MRI stacks. The end-systolic and end-diastolic contours have been created by a senior radiologist (15 years of experience in cardiac MRI) and are accepted as ground truth for this study. Image acquisition was performed on three different 1.5 T MRI scanners of a single vendor (Siemens Healthineers, Erlangen, Germany) using an ssfp sequence with a slice thickness of 8 mm, no gap, and an acquisition matrix of 256 $\times$ 208 pixels with an in-plane spatial resolution of 1.4 $\times$ 1.4 mm$^2$.

2.2.2 Data Science Bowl Cardiac Challenge (DSBCC) Data

The data set is composed of 500 training, 440 testing, and 200 validation 2D short axis cine stacks. It has been compiled by the National Institutes of Health (NIH) and the Children’s National Medical Center (CNMC). It is at least one order-of-magnitude more extensive than any data set previously released. Each stack contains approximately 30 images over the cardiac cycle. The data set does not include a ground truth segmentation, instead, the end-systolic and end-diastolic volumes of the left-ventricle (LV) are provided. This data set has been subject of the Second Annual Data Science Bowl contest.

Our study design does not require the fine-grained separation of the data into a test, training, and validation set as provided. Therefore, the validation and test set have been merged in order to extend the validation set to 640 cases. The 2D short axis cine MRI stacks have been converted into 4D matrices, stored in the Neuroimaging Informatics Technology Initiative (NIFTI) format. This process failed for 14 of the training and 38 of the validation cases due to inconsistent image dimensions, resulting in 486 training and 602 validation NIFTI files.

The training set has been used twice. First, 60 cases with high visual diversity have been manually segmented and subsequently included in the training process of the neural network. Second, all 486 training cases have been utilized to fit a linear regression for the adjustment of the clinical parameters.

2.2.3 MICCAI 2009 LV Segmentation Challenge (LVSC) Data Set

The MICCAI 2009 LV Segmentation Challenge [16] data set has been published by the Sunnybrook Health Sciences Center (Canada). This data set has been utilized in two separate experiments:

In the first experiment, this data set is used exclusively for validation purposes in this study, i.e. none of the images is used for training the neural network. This decision has been made in order to explore the generalization potential of the network and is being referred to as the ad-hoc performance. The data set consists of 45 cases. Ground truth segmentation is available for the end-diastole and the end-systole. It is composed of 12 heart failure with infarction, 12 heart failure without infarction, 12 LV hypertrophy patients and 9 healthy subjects. The data was split into three parts: 15 for training, 15 for testing, and 15 for validation in an on-line contest. Data conversion has failed for one case (SC-HYP-37). As mentioned before, no training images of this data set have been used for this study. Therefore, all available images have been assembled into a single validation set consisting of 44 cases with end-systolic and end-diastolic segmentation, yielding a total of 88 2D image stacks.

In the second experiment, the data set has been split into 29 training and 15 validation images. A network, pretrained on the regime as depicted in Figure 1, has been retrained on this specific split, in order to evaluate the performance in contrast to the ad hoc results, and referred to as results with retraining.

Image acquisition was performed on a 1.5 T GE Signa MRI scanner from the atrioventricular ring to the apex in 6 to 12 2D short axis cine stacks with a slice thickness of 8 mm, a gap of 8 mm, and a field of view of 320 $\times$ 320 mm$^2$ with a matrix resolution of 256 $\times$ 256.

This data set is relevant since it has been extensively used in prior studies for training and evaluation. The end-systolic and end-diastolic contours are provided for the endo- and epicardial volume.

2.2.4 Right Ventricular Segmentation Challenge (RVSC) Data Set

The RVSC was held at the MICCAI 2012 conference. For this challenge, a data set consisting of 16 training and 32 validation 2D short axis cine MRI stacks was acquired. Contour data is only provided for the training set; the validation contours are withheld by the authors in order to ensure an independent validation.

For this study we utilize the training set for validation, as contour information is only available for the training set. However, it has to be stressed, that no image or segmentation data of the training set has been used in the training process of the neural network. All 16 training cases have been segmented at the end-systolic and end-diastolic phase, yielding a total of 32 2D image stacks. Throughout the rest of this paper we refer to the actual training set as the validation set of this study.

Image acquisition has been performed on a 1.5 T scanner (Symphony Tim, Siemens Medical Systems, Erlangen, Germany). Retrospectively gated balanced ssfp cine MRI sequences were performed for analysis with repeated breath-holds of 10-15 s. A total of 8-12 2D short axis cine planes were acquired from the base to the apex of the ventricles. The temporal resolution of the cine images is 20 images per cardiac cycle. All images have been zoomed and cropped to a 256 $\times$ 216 or 216 $\times$ 256 resolution, leaving the LV visible.

2.3 Network Topology

The topology of $\nu$-net (/njuːnɛt/), the neural network of this study, is depicted in Figure 2. It is derived from the U-Net architecture [17] and has been implemented in TensorFlow [18]. The input layer has been resized to 256 $\times$ 256 neurons, and the subsequent layers are consecutively downsampled using a 3 $\times$ 3 convolution with a stride of 2 $\times$ 2, in contrast to the max-pooling operation of the original U-Net architecture. The padding has been changed from valid to same, resulting in an output layer of the same size as the input layer. All activation functions have been changed from a rectified linear unit (ReLU) to a parametric rectified linear unit (PReLU) as proposed by He et al. [19].
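To make the effect of these changes concrete, the sketch below computes layer sizes under TensorFlow's documented padding rules and gives a NumPy version of the PReLU activation. It is an illustration rather than the network definition: the number of downsampling levels (four, as in the original U-Net) and the slope value are assumptions, since the text does not specify them, and in a real PReLU the slope is a learned parameter.

```python
import math

import numpy as np


def conv_output_size(n, kernel, stride, padding):
    """Spatial output size of a convolution, following TensorFlow's padding rules."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return math.ceil((n - kernel + 1) / stride)
    raise ValueError(f"unknown padding: {padding}")


def encoder_sizes(n=256, levels=4):
    """Feature-map sizes along the contracting path (levels=4 is an assumption)."""
    sizes = [n]
    for _ in range(levels):
        # 3x3 convolution with stride 2 replaces the max-pooling step
        sizes.append(conv_output_size(sizes[-1], kernel=3, stride=2, padding="same"))
    return sizes


def prelu(x, alpha):
    """Parametric ReLU: identity for positive inputs, slope alpha otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * x)
```

With same padding a stride-1 convolution keeps a 256-pixel edge at 256, whereas valid padding with a 3 $\times$ 3 kernel would shrink it to 254 per layer; this is why the output layer can match the input size.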

2.4 Network Training

The neural network was trained by minimizing binary cross-entropy as objective function. Backpropagation was used to compute the gradients of the cross-entropy loss. The model was initialized with random values sampled from a uniform distribution without scaling variance (uniform scaling) as proposed by Glorot and Bengio [20]. Adaptive Moment Estimation (Adam) [21] was chosen as stochastic optimization method. The initial learning rate of $10^{-3}$ was gradually reduced down to $10^{-6}$ during training.
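The objective and the learning-rate reduction can be sketched as follows. The pixel-wise binary cross-entropy is standard; the geometric (exponential) interpolation between the two learning rates is purely an assumption, because the text only states the start and end values and not the shape of the schedule.

```python
import numpy as np


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean pixel-wise binary cross-entropy between labels and probabilities."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def learning_rate(step, total_steps, lr_start=1e-3, lr_end=1e-6):
    """Geometric decay from lr_start to lr_end (assumed schedule shape)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return lr_start * (lr_end / lr_start) ** frac
```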

The training set is assembled from 3,519 2D images (2,894 Hannover Medical School and 625 Data Science Bowl Cardiac Challenge images). Data augmentation has been applied to artificially inflate the training set as described in Section 2.5. One complete training run of $\nu$-net takes about 24 to 36 hours.

2.5 Data Augmentation

Data augmentation is often used to artificially inflate the training data set [22, 23, 24]. This technique generates similar images and their corresponding segmentations from already existing data by applying local spatial transformations to them. The key idea behind this approach is that a slightly deformed heart should be identified by the neural network in a similar manner. Due to the image augmentation in the training phase, the algorithm provides good segmentation results regardless of the orientation, scale and parity of the input image. Translational equivariance is inherited from the convolutional network architecture, i.e. a transformed heart is mapped pixel-wise on the corresponding transformed segmentation in the spatial domain.

In order to fully utilize the computational resources of modern workstations equipped with multicore central processing units (CPUs) and Compute Unified Device Architecture (CUDA)-enabled graphics processing units (GPUs), we have developed an auxiliary library that facilitates parallel image augmentation on the CPU while training of the neural network is delegated to the GPU. The time needed for image augmentation can completely be hidden since concurrent augmentation of a batch of input images can be accomplished faster than the actual training step. Hence, augmentation and training are performed efficiently in an interleaved manner: the next batch of images is augmented on the CPU while the current batch is still being trained on the GPU.
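The interleaving described above is a producer-consumer pattern. The sketch below imitates it with a Python thread and a bounded queue; it is not the authors' C++/OpenMP library, and `augment` and `train_step` are hypothetical stand-ins for the CPU augmentation and the GPU training step.

```python
import queue
import threading


def augment(batch):
    """Hypothetical stand-in for the CPU-side augmentation of one batch."""
    return [x + 1 for x in batch]


def train_step(batch):
    """Hypothetical stand-in for one GPU training step; returns a dummy loss."""
    return sum(batch)


def run(batches, prefetch=2):
    """Augment batch k+1 in a background thread while batch k is being trained."""
    buffer = queue.Queue(maxsize=prefetch)  # bounded, so the producer cannot run far ahead
    done = object()                         # sentinel marking the end of the stream

    def producer():
        for batch in batches:
            buffer.put(augment(batch))      # blocks while the buffer is full
        buffer.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    losses = []
    while True:
        item = buffer.get()                 # blocks until an augmented batch is ready
        if item is done:
            break
        losses.append(train_step(item))
    worker.join()
    return losses
```

As long as augmenting a batch is faster than training on it, the consumer never waits on the queue, which is the sense in which the augmentation time is hidden.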

The library is implemented in the C++ programming language using the Open Multiprocessing (OpenMP) extension. It features convenient bindings for the Python programming language which seamlessly interact with the TensorFlow framework. Moreover, due to the aforementioned efficient interleaving, we can apply highly non-linear and computationally expensive local deformations to the input data as well as the ground truth segmentations. Besides traditional global transformations from the affine group $\mathop{\mathrm{Aff}}(\mathbb{R},n)=\mathop{\mathrm{GL}}(\mathbb{R},n)\ltimes\mathbb{R}^{n}$ such as scaling, shearing, rotation, mirroring, and translations, we allow for the pixel-wise deformation of the spatial pixel domain

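$$\begin{pmatrix} i \\ j \end{pmatrix} \mapsto \begin{pmatrix} i + f^{(+)}(i,j) \\ j + f^{(-)}(i,j) \end{pmatrix},$$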

where $f^{(\pm)}(i,j)$ are second degree multivariate polynomials in the pixel coordinates $i$ and $j$

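$$f^{(\pm)}(i,j) := a_{0}^{(\pm)} + a_{i}^{(\pm)} \cdot i + a_{j}^{(\pm)} \cdot j + a_{ii}^{(\pm)} \cdot i^{2} + a_{ij}^{(\pm)} \cdot i \cdot j + a_{jj}^{(\pm)} \cdot j^{2}$$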

and the coefficients $a^{(\pm)}$ are sampled from a uniform distribution over the closed interval $[-\epsilon,+\epsilon]$ . The hyper-parameter $\epsilon\geq 0$ controls the amount of deformation. The special case $\epsilon=0$ refers to no deformation. Fractional indices are mapped via bilinear interpolation. Figure 3 shows an exemplary augmentation of an MRI.
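A minimal NumPy sketch of such a polynomial deformation is given below. Several details are assumptions, as the text does not specify them: pixel coordinates are normalized to $[-1, 1]$ so that $\epsilon$ is independent of the image size, each polynomial has six coefficients (constant, two linear, three quadratic terms), the image is resampled at the displaced coordinates, and coordinates falling outside the image are clamped to the border.

```python
import numpy as np


def sample_coefficients(eps, rng):
    """Coefficients for f^(+) (row 0) and f^(-) (row 1), uniform in [-eps, +eps]."""
    return rng.uniform(-eps, eps, size=(2, 6))


def polynomial(coef, i, j):
    """Second-degree polynomial in i and j: 1, i, j, i^2, i*j, j^2."""
    return (coef[0] + coef[1] * i + coef[2] * j
            + coef[3] * i * i + coef[4] * i * j + coef[5] * j * j)


def bilinear(image, rows, cols):
    """Sample image at fractional (rows, cols) with bilinear interpolation."""
    h, w = image.shape
    rows = np.clip(rows, 0, h - 1)  # clamp to the border (assumption)
    cols = np.clip(cols, 0, w - 1)
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    fr = rows - r0
    fc = cols - c0
    top = (1 - fc) * image[r0, c0] + fc * image[r0, c1]
    bottom = (1 - fc) * image[r1, c0] + fc * image[r1, c1]
    return (1 - fr) * top + fr * bottom


def deform(image, coef):
    """Resample a 2D image at (i + f^(+)(i, j), j + f^(-)(i, j))."""
    h, w = image.shape
    i, j = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    rows = (i + polynomial(coef[0], i, j) + 1) * (h - 1) / 2  # back to pixel units
    cols = (j + polynomial(coef[1], i, j) + 1) * (w - 1) / 2
    return bilinear(image, rows, cols)
```

The same coefficients would be applied to an image and its segmentation so that both stay aligned; for label masks, the interpolated result would additionally need to be thresholded (or sampled with nearest-neighbour lookup) to remain binary.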

3 Results

This study explores the possibility of creating a general purpose cardiac image segmentation model, capable of reliably producing high quality segmentations, independent of aspects such as different image acquisition techniques, and diverse MRI protocols. For this purpose, the model was trained on a proprietary data set (MHH) as well as a small subsample of the DSBCC training set. The goal was to learn the specific concept of cardiac segmentation from the highly standardized MHH data set as well as abstracting a more general notion for different image morphologies from the heterogeneous DSBCC data set and, in turn, prevent overfitting on the characteristics of a specific data set, as depicted in Figure 4. The resulting intraclass correlation coefficients for all evaluated data sets are listed in Table 2.

The agreement of the predicted segmentations with the ground truth is high for the MHH data set with a Spearman’s rho of 0.98 for an overall of $4\cdot 309=1,236$ segmented volumes of the left and right endocardium as well as VM as depicted in Figure 5. This is supported by a DSC of about 90 % as depicted in Table 1 as well as a high ICC of 92-99 % for the ESV and EDV of both ventricles. A lower DSC of 78 % is obtained for the right VM. Further results, such as DSC, overlap, and accuracy are listed in Table 3. A selection of images of the MHH, LVSC, and RVSC data sets is illustrated in Figure 6.

The predicted segmentation can be used to directly compute the clinical parameters on MHH data by applying the Simpson’s method [8]. The same approach cannot be performed on MRIs stemming from different sources, because of different segmentation philosophies, resulting in differing clinical measurements for the same case. To compensate for this variation, a linear regression has been performed to adapt the predicted ESV and EDV of the DSBCC data set. This fit has been determined using standard linear regression, mapping the predicted volumes of the training set onto the ground truth scalar volumes, whilst omitting the vertical intercept. Neither the model in training, nor the linear regressor were fitted to any of the validation cases in order to eliminate the possibility of leakage. As depicted in Figure 7 the agreement with the ground truth is high with a Spearman’s rho of 0.96, an ICC of 92/94 % (ESV/EDV), and a MAPE of 14 %.
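The adjustment step amounts to a least-squares fit of a single scale factor, because the intercept is omitted. The sketch below shows this closed-form fit; the function names are hypothetical, and fitting ESV and EDV separately is an assumption.

```python
import numpy as np


def fit_scale(pred_train, truth_train):
    """Least-squares slope of truth = slope * pred (regression through the origin)."""
    x = np.asarray(pred_train, dtype=float)
    y = np.asarray(truth_train, dtype=float)
    return float(np.dot(x, y) / np.dot(x, x))


def adjust(pred, slope):
    """Apply the fitted scale factor to new predicted volumes."""
    return slope * np.asarray(pred, dtype=float)
```

In use, the slope would be fitted once on the predicted and reference volumes of the training cases, e.g. `slope_edv = fit_scale(pred_edv_train, truth_edv_train)`, and then applied unchanged to the validation predictions, so that no validation case influences the fit.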

The proposed method would have ranked among the top 20 competitors of the Second Annual Data Science Bowl with a continuous ranked probability score (CRPS) of 0.0154. Figure 8 depicts the weakest segmentation, based on the CRPS, for additional illustration.
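For orientation, the challenge score referred to above is the continuous ranked probability score: the mean squared difference between a predicted cumulative distribution over 600 volume thresholds (0 to 599 ml) and the step function of the true volume. The sketch below evaluates it; how a single predicted volume was turned into a distribution is not described in the text, so the step and Gaussian variants shown here, and the parameter `sigma_ml`, are assumptions.

```python
import math

import numpy as np

N_BINS = 600  # volume thresholds 0..599 ml used by the challenge metric


def step_cdf(volume_ml):
    """Cumulative distribution that jumps from 0 to 1 at the given volume."""
    thresholds = np.arange(N_BINS)
    return (thresholds >= volume_ml).astype(float)


def gaussian_cdf(volume_ml, sigma_ml):
    """Gaussian cumulative distribution around a point estimate (assumed form)."""
    thresholds = np.arange(N_BINS)
    z = (thresholds - volume_ml) / (sigma_ml * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))


def crps(pred_cdfs, true_volumes_ml):
    """Mean squared difference between predicted CDFs and true-volume step functions."""
    pred = np.asarray(pred_cdfs, dtype=float)
    truth = np.stack([step_cdf(v) for v in true_volumes_ml])
    return float(np.mean((pred - truth) ** 2))
```

A point prediction that is off by 6 ml and submitted as a step function scores 6/600 = 0.01, which gives a feeling for the scale of the reported value.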

The correlation of the left and right epi- and endocardial volumes of the LVSC and RVSC data sets is high with a Spearman’s rho of 0.95 as depicted in Figures 11 and 12 as well as ICC values of 96/97 % (ESV/EDV) for the left and 92/96 % for the right ventricle. A lower rho of 0.83 is obtained for the right epicardium. DSC values for a selection of relevant studies are presented in Table 1. RMSD, ME and MAPE were calculated after adjusting the resulting volumes with a linear regression, fitted on the validation set. This was necessary, as the validation sets are too small to perform a reasonable split. A random sample of images and the corresponding segmentations from the LVSC and RVSC data sets is illustrated in Figures 9 and 10.

4 Discussion

There are many philosophies on how to perform image segmentation of cardiac MRIs. These philosophies have a major influence on the resulting biventricular mass and function parameters. For example, the inclusion or exclusion of trabeculations and papillary muscles affects the left and right ventricular mass as well as the endocardial volume [7, 8, 9]. No convention has been universally accepted for analyzing trabeculation and papillary muscle mass [11].

Typically, these conventions are enforced on an institute basis. This poses a major challenge for the automated image segmentation with deep neural networks, because of the resulting difference of the clinical measurements. In order to learn the specific segmentation characteristics of a site, a specifically tailored model would have to be trained. This renders the whole process impractical.

However, most of the time, image segmentation is only a means to the end of determining the clinical parameters by measuring the ventricular volumes. These parameters include the ventricular mass (VM), end-systolic volume (ESV) as well as end-diastolic volume (EDV), which yield the ejection fraction (EF) and stroke volume (SV). These established clinical parameters are of great importance in the early detection of cardiac illnesses as well as in treatment monitoring.

The hypothesis of this study is that the clinical parameters can be reliably measured by adapting a pre-trained neural network to a new environment and applying one of the most basic statistical models, a linear regression. This result is unexpected since differences in segmentation guidelines are usually of local nature and do not necessarily need to exhibit a linear dependency on the final measurement. In order to substantiate this assumption, $\nu$-net was trained on the extensive MHH data set. $\nu$-net demonstrates state-of-the-art performance, as depicted in Figure 5. Furthermore, $\nu$-net was trained on a small, hand-picked subset of the DSBCC training set. This was done with the idea of transfer learning in mind, in order to convey an understanding of different image acquisition methods and varying image morphologies to the neural network.

$\nu$-net was benchmarked against the DSBCC validation set. As depicted in Figure 7, it was possible to adapt the results of the classifier for the specific data set by employing a linear regression, fitted on the training set.

The proposed method would have ranked among the top 20 competitors of the Second Annual Data Science Bowl. A strong correlation between predicted and ground truth volumes was observed with a Spearman’s rho of 0.965 and a MAPE of 14.4 %. Nevertheless, the final predicted volumes of the proposed solution could be fine-tuned using high-level machine learning regressors as performed by the winning solution. This study, however, explicitly omits sophisticated post-processing and the use of meta data (such as gender, age, height, and scanner geometry) in order to avoid over-fitting to a specific experimental setting. Such data-set-specific tuning could reasonably impair the neural network’s generalization potential.

The hypothesis is further substantiated by the results of the LVSC and RVSC data sets. As depicted in Figures 11 and 12, the segmentation results demonstrate a strong correlation with the manual segmentation, exhibiting a Spearman’s rho of approximately 0.95. Furthermore, the corresponding ICCs imply human-level performance in determining clinical parameters. $\nu$-net surpasses human level in predicting parameters for the right ventricle. Note that volume prediction was performed without training on a single image of these data sets.

$\nu$-net demonstrates a state-of-the-art ad-hoc segmentation performance in terms of DSC for the epi- and endocardium of the RVSC and the epicardium of the LVSC data set compared to [6] and [27] as well as other studies [5, 25, 4, 12, 13, 14, 15, 28, 29, 30]. This is remarkable, as $\nu$-net was not trained on any images of the aforementioned data sets. The slightly weaker ad-hoc DSC of 84 % for the endocardium of the LVSC could reflect different segmentation philosophies compared to the original MHH data set, resulting in a systematic error as shown in Figure 11. This hypothesis is supported by corresponding high ad-hoc ICC values for the ESV, EDV, EF, and SV. Furthermore, additional retraining results in state-of-the-art performance for the endocardial DSC.

Regarding the MHH data set, $\nu$-net achieves comparable or higher agreement with the ground truth than two human observers achieve with each other on average when measuring biventricular mass and function parameters [31]. Furthermore, $\nu$-net achieves performance comparable to that of a human observer on the LVSC and RVSC data sets. $\nu$-net outperforms a human observer by a wide margin, especially at the task of gauging the right ventricular endocardial volume and ventricular mass. The slightly lower ICC of the left endocardial volumes on the DSBCC data set is most likely due to the multi-center and multi-observer setting, resulting in an inherent heterogeneity of the data set. In order to improve the performance, the results would have to be evaluated for each observer independently.

One limitation of this study is the small size of openly available data sets. The LVSC and RVSC contain 61 cases with freely accessible contours. Furthermore, the aforementioned data sets include the segmentation of the LV or RV exclusively. Additionally, training and validating on a single center data set bears the risk of overfitting as demonstrated in Figure 4. Therefore, a large, fully labeled, multi-center, multi-reader data set would be advantageous. This data set could be used to train and evaluate future models.

5 Conclusion

This study demonstrates the reliability of automatically determining clinical cardiac parameters such as ESV, EDV, EF, SV and VM on 4 data sets. Especially in the RV, the neural network outperformed the human cardiac expert in the presented study, which likely enables more reliable RV mass and function measurements for improved clinical treatment monitoring in the future. Furthermore, it is demonstrated that the aforementioned parameters, resulting from an image segmentation by a pre-trained neural network, can be adjusted by performing a linear regression. This effectively eliminates the associated costs of introducing a neural network for determining clinical cardiac parameters in a new setting.
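The linear adjustment mentioned above can be illustrated with an ordinary least-squares fit of one parameter at a time. This is a minimal sketch under stated assumptions: that a small set of cases with reference values is available in the new setting, and that slope and intercept are fitted separately per parameter. The function names and the EDV values are hypothetical and do not come from the study.

```python
def fit_linear_adjustment(network_values, reference_values):
    """Least-squares slope and intercept mapping network output to reference."""
    n = len(network_values)
    if n != len(reference_values) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_x = sum(network_values) / n
    mean_y = sum(reference_values) / n
    sxx = sum((x - mean_x) ** 2 for x in network_values)
    if sxx == 0:
        raise ValueError("network values must not all be identical")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(network_values, reference_values)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_linear_adjustment(values, slope, intercept):
    """Map raw network-derived values into the local segmentation style."""
    return [slope * v + intercept for v in values]


# Hypothetical EDV values in ml: network output vs. local reference readings.
network_edv = [100.0, 150.0, 200.0]
reference_edv = [95.0, 140.0, 185.0]

slope, intercept = fit_linear_adjustment(network_edv, reference_edv)
adjusted = apply_linear_adjustment([120.0], slope, intercept)
print(round(slope, 3), round(intercept, 3), round(adjusted[0], 3))
```

A fit of this kind needs only paired parameter values, not new contours, which is why it is far cheaper than retraining the network on local segmentations.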

5.1 Perspectives

COMPETENCY IN MEDICAL KNOWLEDGE: Deep learning can reliably produce high-quality, fully automated cardiac segmentations for precise determination of clinically well-established biventricular mass and function parameters in a multi-center setting.

TRANSLATIONAL OUTLOOK: The presented neural network is ready to be used on large-scale, multi-center, multi-reader data sets for cost- and time-efficient analysis of cardiac mass and function parameters.

6 Acknowledgments

The authors thank all parties involved in the data acquisition process, especially Frank Schröder and Lars Kähler.

Appendix A Figures and Tables

Appendix B Bibliography

[1] World Health Organization “WHO | The top 10 causes of death”, 2015 URL: http://who.int/mediacentre/factsheets/fs310/en/
[2] Michael Kass, Andrew Witkin and Demetri Terzopoulos “Snakes: Active contour models” In International Journal of Computer Vision 1.4, 1988, pp. 321–331 DOI: 10.1007/BF00133570
[3] Stanley Osher and James A Sethian “Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations” In Journal of Computational Physics 79.1, 1988, pp. 12–49 DOI: 10.1016/0021-9991(88)90002-2
[4] Rudra P. K. Poudel, Pablo Lamata and Giovanni Montana “Recurrent Fully Convolutional Neural Networks for Multi-slice MRI Cardiac Segmentation” In Reconstruction, Segmentation, and Analysis of Medical Images Springer, Cham, 2016, pp. 83–94 DOI: 10.1007/978-3-319-52280-7_8
[5] M. R. Avendi, Arash Kheradvar and Hamid Jafarkhani “A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI” In Medical Image Analysis 30, 2016, pp. 108–119 DOI: 10.1016/j.media.2016.01.005
[6] Phi Vu Tran “A Fully Convolutional Neural Network for Cardiac Segmentation in Short-Axis MRI” In arXiv:1604.00494 [cs], 2016 arXiv: http://arxiv.org/abs/1604.00494
[7] Burkhard Sievers et al. “Impact of papillary muscles in ventricular volume and ejection fraction assessment by cardiovascular magnetic resonance” In Journal of Cardiovascular Magnetic Resonance: Official Journal of the Society for Cardiovascular Magnetic Resonance 6.1, 2004, pp. 9–16
[8] Michiel M. Winter et al. “Evaluating the systemic right ventricle by CMR: the importance of consistent and reproducible delineation of the cavity” In Journal of Cardiovascular Magnetic Resonance: Official Journal of the Society for Cardiovascular Magnetic Resonance 10, 2008, pp. 40 DOI: 10.1186/1532-429X-10-40
