Mask Mining for Improved Liver Lesion Segmentation

Karsten Roth; Jürgen Hesser; Tomasz Konopczyński

arXiv:1908.05062 · eess.IV · March 13, 2020

TL;DR

This paper introduces a novel error-aware training method for U-Net models that enhances liver and lesion segmentation accuracy in CT scans by focusing on reducing false positives and improving recall, demonstrated on LiTS data.

Contribution

The proposed method incorporates segmentation errors into the training process, enabling models to learn features that mitigate previous mistakes, which is a novel approach in liver lesion segmentation.

Findings

01 Up to 2-point increase in Dice score on the LiTS dataset

02 Effective across multiple U-Net architectures

03 Improves recall and reduces false positives

Abstract

We propose a novel procedure to improve liver and lesion segmentation from CT scans for U-Net based models. Our method extends standard segmentation pipelines to focus on higher target recall or reduction of noisy false-positive predictions, boosting overall segmentation performance. To achieve this, we incorporate segmentation errors into a new learning process appended to the main training setup, allowing the model to find features which explain away previous errors. We evaluate this on semantically distinct architectures: cascaded two- and three-dimensional as well as combined learning setups for multitask segmentation. On the liver and lesion segmentation data provided by the Liver Tumor Segmentation challenge (LiTS), we observe an increase in Dice score of up to 2 points.

Tables

Table 1: Quantitative evaluation of network performance before and after error inclusion (Inc.). We show volume-averaged Dice scores for liver and lesion segmentation on the test set and on fixed training and validation sets. We see clear improvements in Dice scores. In addition, error inclusion reduces seed-dependent variation in performance (measured over three runs).

Setup	Training Dice (Liver)	Training Dice (Lesion)	Validation Dice (Liver)	Validation Dice (Lesion)	Online Test Dice (Liver)	Online Test Dice (Lesion)
2D	$96.9 \pm 0.3$	$71.9 \pm 0.4$	$95.9 \pm 0.3$	$63.5 \pm 0.6$	$95.3 \pm 0.2$	$62.9 \pm 0.3$
2D + Inc.	$97.0 \pm 0.1$	$73.7 \pm 0.2$	$96.3 \pm 0.2$	$64.9 \pm 0.2$	$95.5 \pm 0.3$	$63.5 \pm 0.2$
3D	$92.2 \pm 1.4$	$63.0 \pm 0.8$	$91.4 \pm 0.9$	$56.8 \pm 2.0$	$91.2 \pm 1.0$	$55.5 \pm 0.9$
3D + Inc.	$94.2 \pm 0.3$	$66.1 \pm 0.4$	$91.8 \pm 0.6$	$57.7 \pm 0.4$	$92.0 \pm 0.4$	$56.5 \pm 0.2$
Cmb	$94.5 \pm 0.3$	$70.1 \pm 0.5$	$92.9 \pm 0.7$	$61.6 \pm 0.5$	$93.4 \pm 0.3$	$61.9 \pm 0.2$
Cmb + Inc.	$96.2 \pm 0.5$	$72.3 \pm 0.4$	$94.0 \pm 0.3$	$63.4 \pm 0.4$	$94.7 \pm 0.3$	$63.0 \pm 0.1$

Equations

$$O_{ijm}^{k}(x^{k}) = \left\lfloor \frac{\mathrm{argmax}_{c \in [0, \ldots, C-1]} \, \phi_{ijm,c}^{multi}(x^{k})}{2} \right\rfloor$$

Taxonomy

Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net

Full text

**Index Terms:** U-Net, Liver Lesion Segmentation, Medical Imaging, Data Mining

1 Introduction

Liver imaging nowadays is mostly done via Computed Tomography (CT) [1]. Providing fully-automatic segmentation of liver and lesion tissue from CT data can hence be a useful tool to help with diagnosis and treatment planning. Common approaches utilize U-Nets [2], e.g. [3, 1, 4]. However, training of neural networks can be a difficult endeavour. To improve on existing scores, computationally expensive re-runs without guarantee of improvement are often needed.

We thus suggest a novel pipeline to reliably boost network segmentation performance by including segmentation errors as novel training masks in a post-training step.

Prior work on the inclusion of segmentation errors into the training process includes [5, 6], who propose a Tversky-coefficient-based loss that generalizes the standard Dice coefficient loss with additional hyperparameters for penalizing false-positive or false-negative predictions during training. [7] utilize segmentation error types in a complex adversarial setup, where refinement networks are trained on top of the basic setup to remove these errors. While the former introduce new hyperparameters, the usage of adversarial networks in [7] limits the usable network architectures. In both cases, heavy tuning and reruns are required for different architectural setups, as these methods are linked directly to the main learning process. This holds especially true when moving to three-dimensional data, which is common for many medical segmentation tasks.

We therefore propose to use segmentation error types in a setup separate from the main training. Using the segmentation errors of the learned networks, we append a secondary training process with specific loss functions, providing a framework that helps networks explain away their own segmentation errors and thereby boosts segmentation performance (see fig. 1 for qualitative impressions). Previous work such as [8] has shown the benefit of explaining away undesired properties. Because this step is decoupled from the main training, our method stays independent of architecture and data choices, and allows for improved performance without costly reruns of the full setup.

Distinctly different networks and data types are tested to check that the method applies independently of architecture and data type. This includes 2D and 3D data utilised in different training styles, which are based around 2D and 3D U-Net [2, 9, 10] pipelines (fig. 3). Both training and evaluation are done on the Liver Tumor Segmentation (LiTS) dataset [1], showing consistent improvements in all setups.

2 Method

Fundamental to our proposed extension (fig. 2) is the generation of new training masks that alter the current network performance and allow the network to learn from its own errors.

2.1 Basic Setup

A segmentation pipeline of choice is first trained to convergence with any training procedure. Segmentation masks over the training data are then generated through single forward passes, which adds minimal computational burden. These masks are compared to the original ground truth to assign each pixel a new training class based on its segmentation error case: True Negative, False Positive, False Negative or True Positive. This gives four target classes instead of the two classes of the binary case. We then append four single-layer output channels, serving as error prediction layers, to the output layer. This introduces no relevant new parameters and ensures that all previously learned weights are kept when retraining on the novel masks. Due to the initial pretraining, convergence occurs much faster.
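The mask generation step can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `mine_error_mask` and the class indices (TN = 0, FP = 1, FN = 2, TP = 3) are assumptions, chosen so that integer division of the class index by two recovers the ground-truth label.

```python
import numpy as np

# Assumed class indices (the paper does not state an ordering); chosen so
# that class_index // 2 equals the ground-truth label.
TN, FP, FN, TP = 0, 1, 2, 3


def mine_error_mask(pred, gt):
    """Turn a binary prediction and a binary ground truth into a 4-class error mask."""
    pred = np.asarray(pred).astype(np.int64)
    gt = np.asarray(gt).astype(np.int64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    # gt=0,pred=0 -> TN(0); gt=0,pred=1 -> FP(1); gt=1,pred=0 -> FN(2); gt=1,pred=1 -> TP(3)
    return 2 * gt + pred


# Toy 2x2 example covering all four error cases.
pred = np.array([[0, 1], [0, 1]])
gt = np.array([[0, 0], [1, 1]])
error_mask = mine_error_mask(pred, gt)
```

The resulting `error_mask` would serve as the new multiclass training target for the appended output channels.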

2.2 Relevance of loss function

Assuming the majority of predicted pixels to be true positive or negative after training, we distinguish two approaches based on the choice of loss:

A pixel-weighted cross-entropy loss (pwce) (e.g. [2]) gives the strongest learning signal to high-frequency targets. As there is a strong imbalance towards true positive/negative predictions, retraining on error masks primarily reinforces these predictions while dropping noisy false positives. The retrained multiclass error case predictions are then grouped into true positive/false negative and true negative/false positive predictions to generate a final binary segmentation mask:

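$$O_{ijm}^{k}(x^{k}) = \left\lfloor \frac{\mathrm{argmax}_{c \in [0, \ldots, C-1]} \, \phi_{ijm,c}^{multi}(x^{k})}{2} \right\rfloor$$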
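The grouping of the four predicted error classes into a binary mask (true positive/false negative as foreground, true negative/false positive as background) amounts to an argmax over classes followed by integer division by two. A hedged NumPy sketch, assuming the class order TN = 0, FP = 1, FN = 2, TP = 3 and a channel-first score array (both are assumptions, and `group_to_binary` is a hypothetical name):

```python
import numpy as np


def group_to_binary(class_scores):
    """Collapse per-class scores of shape (C, ...) with C = 4 into a binary mask.

    Assumes the class order TN=0, FP=1, FN=2, TP=3, so that
    floor(argmax / 2) maps {TN, FP} -> 0 and {FN, TP} -> 1.
    """
    predicted_class = np.argmax(class_scores, axis=0)
    return predicted_class // 2


# Toy scores: each pixel of a 2x2 image strongly prefers one error class.
scores = np.zeros((4, 2, 2))
scores[0, 0, 0] = 1.0  # pixel (0, 0) predicted as TN
scores[1, 0, 1] = 1.0  # pixel (0, 1) predicted as FP
scores[2, 1, 0] = 1.0  # pixel (1, 0) predicted as FN
scores[3, 1, 1] = 1.0  # pixel (1, 1) predicted as TP
binary_mask = group_to_binary(scores)
```

Pixels that the retrained network recognises as its own former false positives end up as background, and former false negatives end up as foreground.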

A Dice-coefficient-based loss (e.g. [4]) injects a stronger learning signal for underrepresented classes, which leads to higher recovery of false-negative/positive pixels. Here, the primary interest lies in explaining away obfuscating features while retaining crucial ones, so the true positive error mask class is replaced with the ground truth segmentation mask. This allows the network to transfer properties generating false positives to the respective output channel and to recover generators for false-negative predictions. The final segmentation is taken directly from the true positive output channel.
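One way to picture the Dice-based variant is as a four-channel target in which the true positive channel holds the full ground truth. The sketch below is an illustration under assumptions: the channel order (TN, FP, FN, ground truth), the per-channel soft Dice formulation and the smoothing constant `eps` are not specified in the text, and both function names are hypothetical.

```python
import numpy as np


def build_dice_targets(pred, gt):
    """Stack TN, FP and FN masks with the ground truth in place of the TP mask."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    tn = ~pred & ~gt
    fp = pred & ~gt
    fn = ~pred & gt
    # The TP channel is replaced by the complete ground-truth mask.
    return np.stack([tn, fp, fn, gt]).astype(np.float64)


def soft_dice_loss(probs, targets, eps=1e-5):
    """One minus the mean per-channel soft Dice; channels are on axis 0."""
    axes = tuple(range(1, probs.ndim))
    intersection = (probs * targets).sum(axis=axes)
    denom = probs.sum(axis=axes) + targets.sum(axis=axes)
    dice_per_channel = (2.0 * intersection + eps) / (denom + eps)
    return float(1.0 - dice_per_channel.mean())


pred = np.array([[0, 1], [0, 1]])
gt = np.array([[0, 0], [1, 1]])
targets = build_dice_targets(pred, gt)
loss_perfect = soft_dice_loss(targets, targets)            # outputs match targets
loss_empty = soft_dice_loss(np.zeros_like(targets), targets)  # all-zero outputs
```

Note that in this construction the false negative channel overlaps with the ground-truth channel, so the channels are scored independently rather than as mutually exclusive classes.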

Both loss functions offer a potential boost in performance and are mentioned for completeness. However, for all subsequent results, we utilize a dice-based loss as it provides marginally higher improvements.

3 Application to Liver and Lesion Segmentation

3.1 Network Architectures

We investigate the performance of our method on liver and lesion segmentation by evaluating Dice score performance on distinct architectures: Cascaded 2D [11], which trains a 2D segmentation network for liver and lesion segmentation separately; Cascaded 3D, which does the same for a 3D setup; and Combined Cascaded 2D [10], which trains separate segmentation networks for liver and lesion in a simultaneous setup. All networks are equipped with common extensions such as multislice inputs [12], batch normalization [13], residual blocks [14] and squeeze-and-excitation modules [15]. Each pipeline is trained to convergence before applying our extension to ensure that we do not just prolong the training process. For liver segmentation, initial training is done with pwce loss $L^{pwce}$ and distance-transformation weightmaps (see [2]). The lesion segmentation loss is based on dividing $L^{pwce}$ by a smooth Dice score $L^{dice}$ (see e.g. [4]), $L_{combined}=L^{pwce}\cdot\left(L^{dice}+\epsilon\right)^{-1}$, with $\epsilon=10^{-5}$.
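The combined lesion loss $L_{combined}=L^{pwce}\cdot\left(L^{dice}+\epsilon\right)^{-1}$ can be illustrated with a small NumPy sketch for the binary case. The exact form of the pixel-weighted cross-entropy, the probability clipping and the `smooth` term in the Dice score are assumptions made for this example; only $\epsilon=10^{-5}$ comes from the text.

```python
import numpy as np


def pixel_weighted_cross_entropy(probs, target, weights):
    """Binary cross-entropy averaged over pixels, scaled by a per-pixel weight map."""
    probs = np.clip(probs, 1e-7, 1.0 - 1e-7)  # avoid log(0); assumed safeguard
    ce = -(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs))
    return float((weights * ce).mean())


def smooth_dice_score(probs, target, smooth=1.0):
    """Soft Dice score (higher is better); the smoothing constant is an assumption."""
    intersection = (probs * target).sum()
    return float((2.0 * intersection + smooth) / (probs.sum() + target.sum() + smooth))


def combined_loss(probs, target, weights, eps=1e-5):
    """L_combined = L_pwce * (L_dice + eps)^-1, with L_dice the smooth Dice score."""
    return pixel_weighted_cross_entropy(probs, target, weights) / (
        smooth_dice_score(probs, target) + eps
    )


target = np.array([[0.0, 1.0], [1.0, 0.0]])
weights = np.ones_like(target)  # uniform weights for the toy example
good = combined_loss(np.array([[0.1, 0.9], [0.9, 0.1]]), target, weights)
bad = combined_loss(np.array([[0.9, 0.1], [0.1, 0.9]]), target, weights)
```

Dividing by the Dice score amplifies the cross-entropy term when overlap is poor, so `bad` is penalised far more strongly than `good`.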

3.2 LiTS dataset

The Liver Tumor Segmentation (LiTS) dataset [1] contains 131 3D lower abdominal CT scans with liver/lesion ground truth masks, as well as 70 test volumes evaluated by online submission. The dataset is publicly available (Creative Commons License) and was collected for ISBI/MICCAI 2017. All volumes have horizontal dimensions of 512 with near constant resolution. In the axial direction, dimensionality and resolution vary strongly, which is a relevant factor for any approach using higher-than-two-dimensional data input. Before training, the data is clipped to $[-100,600]$ $HU$ and then normalized. For evaluation, only the largest connected component is used to generate the final liver segmentation.
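The two data-handling steps mentioned here, intensity bounding and largest-connected-component filtering, can be sketched with NumPy and SciPy. The min-max scaling to $[0, 1]$ is an assumption (the text only says the data is normalized), as is the default face connectivity of `scipy.ndimage.label`; the function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def preprocess_ct(volume_hu, hu_min=-100.0, hu_max=600.0):
    """Clip intensities to [hu_min, hu_max] HU, then min-max scale to [0, 1] (assumed)."""
    clipped = np.clip(np.asarray(volume_hu, dtype=np.float64), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)


def largest_connected_component(mask):
    """Keep only the largest connected foreground component of a binary mask."""
    mask = np.asarray(mask).astype(bool)
    labeled, num_components = ndimage.label(mask)
    if num_components == 0:
        return mask  # nothing to filter
    sizes = np.bincount(labeled.ravel())
    sizes[0] = 0  # ignore the background label
    return labeled == sizes.argmax()


normalized = preprocess_ct(np.array([-1000.0, -100.0, 250.0, 600.0, 3000.0]))
liver = largest_connected_component(np.array([[1, 1, 0, 0],
                                              [1, 0, 0, 1]]))
```

The same functions work unchanged on 3D volumes, since both `np.clip` and `ndimage.label` operate on arrays of any dimensionality.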

3.3 Implementation Details

The full pipeline is implemented with PyTorch [16]. We use an $85\%|15\%$ train/val split and run everything on an NVIDIA GeForce 1080Ti. $256\times 256$ crops with batch size $12$ are used for 2D training and $128\times 128\times 64$ crops with batch size $2$ for 3D training. For liver segmentation, crops are taken randomly, while for lesion segmentation crops in and around the liver are used. Standard data augmentation using random horizontal and vertical flips, random rotation and random zooming is performed, all in axial direction. For optimization, Adam [17] with an initial learning rate of $10^{-5}$ and $L2$-regularisation $\lambda=10^{-5}$ is used. Standard step-based learning rate scheduling is included as well. Training is performed for 70 epochs to ensure convergence, saving the best validation weights.

3.4 Results

We compute the averaged Dice score per test volume before and after application of our method for all architectures. Here, relative improvement is the key metric to examine. Results are summarized in tab. 1, showing a consistent gain over the initially trained model, especially for the combined training setup. This is arguably due to the simultaneous boost in liver and lesion segmentation performance. The inclusion of mined training masks into the training process specifically benefits validation performance. This is rooted in the splitting procedure, as training and validation sets are drawn from the same sample set. Because different sources contribute to the dataset [1], the test set samples differ much more strongly from the training set. Newly mined features are hence more expressive on the validation set.
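A volume-averaged Dice score computes one Dice value per volume and then averages over volumes. A minimal sketch follows; returning 1.0 when both prediction and ground truth are empty is an assumption of this example, and the toy volumes are illustrative, not data from the paper.

```python
import numpy as np


def dice_score(pred, gt):
    """Dice = 2 |P and G| / (|P| + |G|) for binary masks."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks count as a perfect match
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def volume_averaged_dice(preds, gts):
    """Mean of the per-volume Dice scores."""
    return float(np.mean([dice_score(p, g) for p, g in zip(preds, gts)]))


# Two toy "volumes": a perfect prediction and one with a single false positive.
vol_a_gt = np.ones((2, 2, 2))
vol_a_pred = np.ones((2, 2, 2))
vol_b_gt = np.array([1, 0, 0, 0]).reshape(1, 2, 2)
vol_b_pred = np.array([1, 1, 0, 0]).reshape(1, 2, 2)
score = volume_averaged_dice([vol_a_pred, vol_b_pred], [vol_a_gt, vol_b_gt])
```

Averaging per volume weights every scan equally regardless of its size, which differs from pooling all voxels into a single global Dice.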

4 Control of Segmentation Errors

We also qualitatively study the usage of our method to control the produced segmentation error types, with examples in fig. 1. To do so, the distribution of segmentation error types before and after running a mask mining step with either a multiclass dice loss or a multiclass pwce loss is compared over all proposed architectures, see fig. 4. A clear shift in false-positive and false-negative pixels depending on the choice of utilized loss can be seen compared to the initial pre-mask-mining setup. The network segmentation behaviour changes for different loss functions, while the segmentation performance in both cases is improved.
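Comparing error-type distributions before and after a mask mining step comes down to counting the four cases per pixel. The sketch below uses made-up toy arrays purely for illustration (they are not results from the paper), and `error_type_fractions` is a hypothetical helper name.

```python
import numpy as np


def error_type_fractions(pred, gt):
    """Return the fraction of pixels that are TN, FP, FN and TP."""
    pred = np.asarray(pred).astype(np.int64).ravel()
    gt = np.asarray(gt).astype(np.int64).ravel()
    # Encode each pixel as 2 * gt + pred: TN=0, FP=1, FN=2, TP=3.
    counts = np.bincount(2 * gt + pred, minlength=4)
    names = ("TN", "FP", "FN", "TP")
    return {name: float(count) / pred.size for name, count in zip(names, counts)}


# Toy data only: a ground truth and two hypothetical predictions.
gt = np.array([0, 0, 0, 1, 1, 1, 1, 0])
before = np.array([0, 1, 1, 1, 0, 0, 1, 0])
after = np.array([0, 0, 1, 1, 1, 0, 1, 0])
dist_before = error_type_fractions(before, gt)
dist_after = error_type_fractions(after, gt)
```

Plotting such fractions for each architecture and loss would give a comparison of the kind described for fig. 4.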

5 Conclusion

We introduced a novel extension to standard liver and lesion segmentation pipelines on the basis of the Liver Tumor Segmentation (LiTS) dataset. By helping the network learn and thereby explain away previously made errors using automatically generated training labels, we boost segmentation performance on different and distinct architectures and training styles. Although we present our work on the task of liver and liver lesion segmentation from CT scans via deep learning U-Net-like architectures, the architecture-independent applicability means our method can be extended to other medical image segmentation problems, to a variety of other applications, and to other machine-learning-based semantic segmentation techniques. We see our Mask Mining idea as an addition to boosting and ensemble methods. Decreasing the number of false negatives is of great significance, especially in medical image analysis. Most significantly, the use of our Mask Mining method allows potential detection of previously omitted objects of interest. However, we still observe limitations in terms of the Dice score. A straightforward idea for improvement would be an iterative approach in which Mask Mining is applied repeatedly to the learning model. This should help the model further increase its sensitivity to the errors it makes. We leave this issue for future research and investigation.

Bibliography

[1] Patrick Bilic et al., “The liver tumor segmentation benchmark (LiTS),” CoRR, vol. abs/1901.04056, 2019.
[2] Olaf Ronneberger et al., “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 234–241.
[3] Fabian Isensee et al., “nnU-Net: Self-adapting framework for U-Net-based medical image segmentation,” CoRR, vol. abs/1809.10486, 2018.
[4] Michal Drozdzal et al., “The importance of skip connections in biomedical image segmentation,” CoRR, vol. abs/1608.04117, 2016.
[5] Seyed Sadegh Mohseni Salehi et al., “Tversky loss function for image segmentation using 3D fully convolutional deep networks,” in Machine Learning in Medical Imaging, 2017, pp. 379–387.
[6] Karsten Roth, Tomasz K. Konopczyński, and Jürgen Hesser, “Liver lesion segmentation with slice-wise 2D Tiramisu and Tversky loss function,” CoRR, vol. abs/1905.03639, 2019.
[7] Mina Rezaei et al., “Conditional generative refinement adversarial networks for unbalanced medical image semantic segmentation,” CoRR, vol. abs/1810.03871, 2018.
[8] Karsten Roth, Biagio Brattoli, and Björn Ommer, “MIC: Mining interclass characteristics for improved metric learning,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 8000–8009.