Prostate segmentation using Z-net

Yue Zhang; Jiong Wu; Wanli Chen; Yifan Chen; Xiaoying Tang

arXiv:1901.06115·eess.IV·January 21, 2019·ISBI

TL;DR

This paper introduces Z-net, a novel CNN architecture inspired by U-net, designed for prostate segmentation in MRI images, demonstrating superior performance over classical CNNs through extensive evaluation.

Contribution

The paper presents Z-net, a new CNN architecture with multi-level feature capturing, and compares three sample size normalization methods, establishing 2D resize as most effective.

Findings

1. Z-net outperforms classical CNNs in prostate segmentation.
2. 2D resize is the most suitable sample normalization method.
3. Z-net effectively captures multi-level features for improved segmentation.

Tables

Table 1: Characteristics of the 5 validation data.

Image index	Voxel size [mm³]	Image size
05	$2.20 \times 0.27 \times 0.27$	$42 \times 512 \times 512$
15	$3.60 \times 0.63 \times 0.63$	$20 \times 320 \times 320$
25	$4.00 \times 0.75 \times 0.75$	$18 \times 256 \times 256$
35	$3.30 \times 0.70 \times 0.70$	$23 \times 256 \times 256$
45	$3.60 \times 0.63 \times 0.63$	$24 \times 320 \times 320$

Table 2: Quantitative comparisons of the different uniform-size methods.

Uniform method	Simulation mean vDSC [%]	Z-net segmentation mean vDSC [%]
Pad and cut	100.00	85.14
2D resize	98.43	87.21
3D resize	91.90	83.79

Table 3: Quantitative comparisons between the proposed method and two other methods. ↓ means that a lower value is better.

Methods	vDSC [%]	HD [mm] ↓	RAVD [%] ↓
U-net	85.26 $\pm$ 11	8.79 $\pm$ 10	12.65 $\pm$ 21
Ensemble DCNN	87.84 $\pm$ 4	7.24 $\pm$ 5	9.06 $\pm$ 10
Z-net	90.49 $\pm$ 3	4.41 $\pm$ 2	6.88 $\pm$ 8

Equations

$$W^{*}=\arg\min_{W}\frac{1}{N}\sum_{n=1}^{N}L(\mathscr{Z}(W,X_{n}),Y_{n}),$$

$$L(\mathscr{Z}(W,X_{n}),Y_{n})=1-\frac{2\cdot\sum_{j=1}^{M}(z_{n}^{j}\cdot y_{n}^{j})+s}{\sum_{j=1}^{M}z_{n}^{j}+\sum_{j=1}^{M}y_{n}^{j}+s}.$$

$$Y_{pred}=\mathscr{Z}(W^{*},X_{pred}),$$

Taxonomy

Topics: Advanced Neural Network Applications · Medical Imaging and Analysis · Medical Image Segmentation Techniques

Full text

PROSTATE SEGMENTATION USING Z-NET

Abstract

In this paper, we proposed a novel architecture of convolutional neural network (CNN), namely Z-net, for segmenting the prostate from magnetic resonance images (MRIs). In the proposed Z-net, 5 pairs of Z-block and decoder Z-block with different sizes and numbers of feature maps were assembled in a way similar to that of U-net. The proposed architecture can capture more multi-level features by using concatenation and dense connection. A total of 45 training images were used to train the proposed Z-net, and the evaluations were conducted qualitatively on 5 validation images and quantitatively on 30 testing images. In addition, three approaches for making the samples of uniform size (pad and cut, 2D resize, and 3D resize) were evaluated and compared. The experimental results demonstrated that 2D resize is the most suitable approach for the proposed Z-net. Compared to two other classical CNN architectures, the proposed method showed superior performance in segmenting the prostate.

**Index Terms:** Prostate segmentation, PROMISE 12 Challenge, convolutional neural networks, MRI, Z-net.

1 Introduction

Prostate cancer is one of the most common types of cancer and its mortality rate is the second highest [1]. Fortunately, its mortality rate can be decreased with early and timely diagnosis. Prostate volume aids in the diagnosis of benign prostatic hyperplasia and plays a key role in clinical decision making [2]. With the advent of magnetic resonance imaging (MRI) techniques, the high spatial resolution and soft-tissue contrast of MR images make them suitable for prostate segmentation and volume calculation [3]. Manual delineation of the prostate is tedious and time-consuming, and is prone to inter- and intra-observer variability. As such, techniques that can automatically and accurately segment the prostate from MR images are urgently needed for research and clinical purposes.

Previous automated prostate segmentation methods mainly include contour-based segmentation and region-based segmentation. Contour-based methods use prostate boundary information to segment the prostate [4]. Region-based methods, mainly including graph-based methods [5] and multi-atlas-based methods [6], use local intensity or statistics such as the mean and standard deviation in an energy minimization framework to achieve segmentation. However, these kinds of methods are prone to registration errors and are slow to produce a segmentation.

In the last few years, machine learning based methods, especially convolutional neural networks (CNNs), have been proposed. For example, Yu et al. proposed a volumetric CNN with mixed residual connections for prostate segmentation from 3D MR images [7]. Zhu et al. proposed a deeply supervised CNN by passing features extracted from layers at an early stage [8]. Jia et al. proposed a coarse-to-fine segmentation scheme that successfully combined atlas-based coarse segmentation and an ensemble deep CNN based fine segmentation [9]. These prostate segmentation methods are mainly based on U-net [10], which expands features through convolution. However, a potential limitation of U-net is that information loss may exist during the convolution process.

In this paper, we proposed a novel CNN architecture, named Z-net, consisting of 5 pairs of Z-block and decoder Z-block with different sizes and numbers of feature maps, which are assembled in a way similar to that of U-net. The Z-block is capable of capturing more features in a multi-level fashion by using concatenation and dense connection. The decoder Z-block can recover more accurate location information in a way similar to that of U-net. In this work, we also investigated and compared different methods for making images of uniform size. The proposed Z-net was compared against several other state-of-the-art CNN architectures. All of our experiments were conducted on the MICCAI PROMISE 12 Challenge dataset [11].

2 Method

2.1 Network Architecture

The CNN architecture typically used for medical image segmentation is U-net [10], which consists of a contracting path and an expanding path. Despite its popularity, U-net has one potential limitation. As shown in Fig. 1, U-net doubles the feature maps directly via convolution from U3 to U4. However, information may be lost during convolution, which limits its ability to generate more feature maps. To solve this problem, we designed a Z-block that consists of three $3\times 3$ convolutional layers (each followed by a batch normalization (BN) layer and a rectified linear unit (ReLU) layer) and a $2\times 2$ max pooling layer with stride 2 for downsampling. As shown in Fig. 1, the feature maps (Z2) input to the max pooling layer are cropped and concatenated with the feature maps (Z4) output from a max pooling and convolution operation. As such, the number of feature maps is doubled by fusion in a Z-block. In a symmetric way, we designed a decoder Z-block. The resulting network architecture is named Z-net, as shown in Fig. 2.
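The channel doubling by fusion can be followed with a small shape-bookkeeping sketch. It is not the authors' implementation: it assumes "same" padding (so the $3\times 3$ convolutions keep the spatial size), assumes the convolutions inside a block keep the channel count, and assumes the cropped Z2 branch ends up at the pooled resolution. The function names and the starting channel count in the usage note are hypothetical.

```python
def z_block_shapes(height, width, channels):
    """Trace (H, W, C) feature-map shapes through one encoder Z-block.

    Assumptions (not stated in the text): 'same' padding, convolutions keep
    the channel count, and the Z2 branch is cropped to the pooled resolution.
    """
    z2 = (height, width, channels)            # conv + BN + ReLU output fed to pooling
    z3 = (height // 2, width // 2, channels)  # 2x2 max pooling with stride 2
    z4 = z3                                   # 3x3 conv after pooling, C unchanged (assumed)
    skip = (z3[0], z3[1], z2[2])              # Z2 cropped to match Z4 spatially (assumed)
    fused = (z4[0], z4[1], z4[2] + skip[2])   # channel-wise concatenation: C -> 2C
    return {"Z2": z2, "Z3": z3, "Z4": z4, "fused": fused}


def encoder_shapes(height, width, channels, num_blocks=5):
    """Stack num_blocks Z-blocks and return the fused shape after each one."""
    shapes = []
    for _ in range(num_blocks):
        height, width, channels = z_block_shapes(height, width, channels)["fused"]
        shapes.append((height, width, channels))
    return shapes
```

For a $256\times 256$ input with a hypothetical 32 starting feature maps, `encoder_shapes(256, 256, 32)` halves the spatial size and doubles the channel count at each of the 5 blocks.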

2.2 Dataset and Pre-processing

All data used in this study came from the MICCAI PROMISE 12 Challenge [11]. The training dataset consisted of 50 transversal T2-weighted images (T2-WIs) of the prostate and the associated segmentation ground truth. The testing dataset consisted of 30 images; the corresponding segmentation ground truth was held exclusively by the organizers for independent evaluation. The images were acquired at different hospitals using different scanners and showed marked variations in dynamic range, voxel size, position, field of view, and anatomical appearance. For each image, all 2D axial slices were resized to $256\times 256$ and then processed with the contrast limited adaptive histogram equalization (CLAHE) algorithm [12]. Gaussian normalization was employed to normalize each 2D image to zero mean and unit variance. Data augmentation was conducted to enlarge the training dataset. The augmentation operations included rotation, flip, and zoom in the axial plane.
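The Gaussian normalization step can be sketched as below. This is a minimal illustration on a 2D slice stored as a list of rows. The function name and the small `eps` guard against a constant slice are assumptions, and the resizing and CLAHE steps are not shown.

```python
import math


def gaussian_normalize(image, eps=1e-8):
    """Return a copy of a 2D slice (list of rows) with zero mean and unit variance.

    eps is an assumed guard that avoids division by zero on a constant slice.
    """
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = math.sqrt(variance)
    return [[(p - mean) / (std + eps) for p in row] for row in image]
```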

2.3 Formulation

We use $S=\{(X_{n},Y_{n}),$ $n=1,\cdots,N\}$ to represent the training dataset, where $X_{n}=\{x_{n}^{j},$ $j=1,\cdots,M\}$ denotes the preprocessed axial slices and $Y_{n}=\{y_{n}^{j},$ $j=1,\cdots,M\}$ denotes the corresponding segmentation ground truth (binary masks) of the $n^{th}$ training image. In our setting, $N=15000$ and $M=256^{2}$ . For simplicity, we denote all parameters in the designed CNN as $W$ and the predicted labels as $\mathscr{Z}(W,X_{n})$ . The objective function is,

$$W^{*}=\arg\min_{W}\frac{1}{N}\sum_{n=1}^{N}L(\mathscr{Z}(W,X_{n}),Y_{n}),$$

where $W^{*}$ denotes the optimal weights obtained from the training procedure and $L(\mathscr{Z}(W,X_{n}),Y_{n})$ is the Dice loss [13], chosen because the sample sizes of the two classes (the prostate and the background) are highly unbalanced. Let $\{z_{n}^{j},$ $j=1,\cdots,M\}$ be the pixel values (0 or 1) of $\mathscr{Z}(W,X_{n})$ ; the Dice loss function can then be expressed as

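$$L(\mathscr{Z}(W,X_{n}),Y_{n})=1-\frac{2\cdot\sum_{j=1}^{M}(z_{n}^{j}\cdot y_{n}^{j})+s}{\sum_{j=1}^{M}z_{n}^{j}+\sum_{j=1}^{M}y_{n}^{j}+s}.$$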

where $s$ is used to avoid a situation wherein the denominator is 0, i.e., the pixel values of $\mathscr{Z}(W,X_{n})$ and $Y_{n}$ are all zeros.
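A minimal sketch of this Dice loss for a single sample, with the prediction and ground truth given as flat lists of $M$ pixel values. The function name and the default `s=1.0` are assumptions, since the text does not give the value of $s$.

```python
def dice_loss(z, y, s=1.0):
    """Dice loss 1 - (2 * sum(z*y) + s) / (sum(z) + sum(y) + s) for one sample.

    z: predicted pixel values, y: ground-truth mask values, both flat lists.
    s: smoothing term that keeps the denominator non-zero (value assumed here).
    """
    intersection = sum(zj * yj for zj, yj in zip(z, y))
    return 1.0 - (2.0 * intersection + s) / (sum(z) + sum(y) + s)
```

When both masks are empty, the smoothing term makes the ratio $s/s=1$, so the loss is 0 rather than undefined.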

In the testing stage, the predicted mask for image $X_{pred}$ is obtained as

$$Y_{pred}=\mathscr{Z}(W^{*},X_{pred}),$$

Finally, $Y_{pred}$ is binarized at a threshold of 0.5.
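The binarization step can be sketched as follows. Whether a value exactly equal to 0.5 maps to foreground is not stated in the text; the strict comparison here is an assumption, as is the function name.

```python
def binarize(prediction, threshold=0.5):
    """Turn a 2D map of network outputs in [0, 1] into a binary mask.

    Values strictly above the threshold become 1 (tie handling is assumed).
    """
    return [[1 if p > threshold else 0 for p in row] for row in prediction]
```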

2.4 Implementation

The proposed network was implemented in Keras using the TensorFlow backend. All training and testing experiments were conducted on a workstation equipped with an NVIDIA GTX 1080 Ti GPU. The network was trained with a batch size of 8 due to the limited capacity of GPU memory. The Adam optimizer was used with a learning rate of 0.001. The standard image size was set to $256\times 256$ .

3 Results and Discussion

Of the 50 images with ground truth, we used 45 for training and 5 for validation. The validation data were the images with indices $\{05,15,25,35,45\}$ .

3.1 Approaches to make images of uniform size

As shown in Table 1, the voxel size and image size vary from image to image. Given that the input to a CNN should be of uniform size, we tested three approaches to make the images of uniform size: 1) pad and cut, 2) 2D resize, and 3) 3D resize. To compare these approaches, we ran simulation and Z-net segmentation experiments; the results are summarized in Table 2. In the simulation experiment, we applied each of the three methods to the validation data to make the images of uniform size and then reconstructed them to the original size. In the Z-net segmentation experiment, we resized the training data using the three methods and then used the resized data to train three Z-nets. We then predicted the masks for the validation data using the trained Z-nets. Finally, the predicted masks were reconstructed to the original size. In summary, the simulation results reflect the interpolation accuracy and the Z-net segmentation results reflect the overall segmentation accuracy.

The first method crops the image boundaries in the preprocessing step and pads them back in the reconstruction stage. As shown in Table 2, for this “pad and cut” method, the mean volumetric Dice Similarity Coefficient (vDSC) is 100% for the simulation and 85.14% for Z-net segmentation. For the 2D resize method, we used sampling and nearest neighbor interpolation in both the preprocessing and reconstruction steps. For the 3D resize method, we resampled the data to an isotropic voxel size of $0.5\times 0.5\times 0.5$ mm$^3$ and then applied 2D resize. The simulation results of the two resize methods (98.43% and 91.90%, respectively) are not as good as that of the first method because interpolation is used. However, as shown in Table 2, the Z-net segmentation result is the best for the 2D resize method (87.21%, versus 85.14% for pad and cut and 83.79% for 3D resize). As such, we adopted 2D resize as our method for making all data of uniform size.
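The idea behind the simulation experiment for 2D resize (resize a mask to the working size, restore it, and measure the overlap with the original) can be sketched on a toy mask. This is an illustration under assumptions, not the authors' code: the index mapping used for nearest-neighbor sampling, the helper names, and the 8×8 toy mask are all hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2D image (list of rows) with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def dice(a, b):
    """Dice similarity coefficient between two binary 2D masks."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    overlap = sum(x * y for x, y in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    return 1.0 if total == 0 else 2.0 * overlap / total


def round_trip_dice(mask, work_h, work_w):
    """Resize a mask to the working size, restore it, and score the result."""
    h, w = len(mask), len(mask[0])
    restored = resize_nearest(resize_nearest(mask, work_h, work_w), h, w)
    return dice(mask, restored)


# Toy 8x8 mask with a 4x4 foreground block at rows 1..4, cols 1..4.
mask = [[1 if 1 <= r <= 4 and 1 <= c <= 4 else 0 for c in range(8)] for r in range(8)]
```

On this toy mask, upsampling to 16×16 and back is lossless, whereas downsampling to 4×4 and back shifts the block and lowers the Dice score. This is the kind of interpolation error the simulation column of Table 2 measures.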

3.2 Automatic prostate segmentation

The segmentation results of the proposed approach on representative central slices from the 5 validation data are shown in Fig. 3, where the automatically obtained prostate boundary is highlighted in green and the ground truth boundary is marked in red. The automatically obtained boundaries are very close to those of the ground truth in most cases, despite mismatches at some locations. The inaccuracies are likely due to the similar intensity profiles of soft tissues adjacent to the prostate, which may result in both false positives and false negatives.

Table 3 collates the means and standard deviations of the vDSC, Hausdorff distance (HD), and relative absolute volume difference (RAVD) obtained from U-net [10], Ensemble DCNN [9], and the proposed approach. Note that these results were provided by the PROMISE 12 challenge organizers. The proposed approach has the highest vDSC (90.49%), the lowest HD (4.41 mm) and RAVD (6.88%), and the lowest standard deviation on all three metrics among the three methods under comparison.

4 Conclusion

In this paper, we proposed and validated a novel architecture, Z-net, for automatic prostate segmentation. Unlike U-net, the proposed network has more layers through concatenation and dense connection. The proposed Z-net is capable of preserving more location information, which is quite useful for identifying the boundary between the prostate and the surrounding soft tissues. It is worth noting that the proposed strategy expands the feature maps at different levels, whereas the number of feature maps stays the same in a residual block; the two are quite different. In addition, we showed that 2D resize is a reliable way to make images of uniform size. The proposed Z-net was evaluated qualitatively on the 5 validation data and quantitatively on the 30 testing data, which validated its effectiveness. Because the proposed Z-net is densely connected, it occupies more GPU memory than U-net. This largely limits the batch size, which may impair the network performance. In the future, we will try to reduce the redundant connections in Z-net and make it more efficient without sacrificing segmentation accuracy.
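Two of the three metrics can be sketched on flattened binary volumes, assuming the usual definitions: vDSC as twice the overlap divided by the sum of the two volumes, and RAVD as the absolute volume difference relative to the ground-truth volume, both in percent. The function names are hypothetical, and HD is omitted because it depends on boundary extraction and voxel spacing that the text does not specify.

```python
def vdsc_percent(pred, truth):
    """Volumetric Dice similarity coefficient in percent for flat binary volumes."""
    overlap = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 100.0 if total == 0 else 100.0 * 2.0 * overlap / total


def ravd_percent(pred, truth):
    """Relative absolute volume difference in percent (assumed definition).

    Requires a non-empty ground-truth volume.
    """
    v_pred, v_true = sum(pred), sum(truth)
    return 100.0 * abs(v_pred - v_true) / v_true
```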

Bibliography

[1] Kimberly D Miller, Rebecca L Siegel, Chun Chieh Lin, et al., “Cancer treatment and survivorship statistics, 2016,” CA: a cancer journal for clinicians, vol. 66, no. 4, pp. 271–289, 2016.
[2] CG Roehrborn, “Pathology of benign prostatic hyperplasia,” International journal of impotence research, vol. 20, no. S3, pp. S11, 2008.
[3] Soumya Ghose, Arnau Oliver, Robert Martí, Xavier Lladó, et al., “A survey of prostate segmentation methodologies in ultrasound, magnetic resonance and computed tomography images,” Computer methods and programs in biomedicine, vol. 108, no. 1, pp. 262–287, 2012.
[4] Tony F Chan and Luminita A Vese, “Active contours without edges,” IEEE Transactions on image processing, vol. 10, no. 2, pp. 266–277, 2001.
[5] Zhiqiang Tian, Lizhi Liu, Zhenfeng Zhang, and Baowei Fei, “Superpixel-based segmentation for 3D prostate MR images,” IEEE transactions on medical imaging, vol. 35, no. 3, pp. 791–801, 2016.
[6] Thomas Robin Langerak, Uulke A van der Heide, Alexis NTJ Kotte, et al., “Label fusion in atlas-based segmentation using a selective and iterative method for performance level estimation,” IEEE transactions on medical imaging, vol. 29, no. 12, pp. 2000–2008, 2010.
[7] Lequan Yu, Xin Yang, Hao Chen, et al., “Volumetric convnets with mixed residual connections for automated prostate segmentation from 3D MR images,” in AAAI, 2017, pp. 66–72.
[8] Qikui Zhu, Bo Du, Baris Turkbey, et al., “Deeply-supervised CNN for prostate segmentation,” in Neural Networks (IJCNN), 2017 International Joint Conference on. IEEE, 2017, pp. 178–184.