Learning More with Less: Conditional PGGAN-based Data Augmentation for Brain Metastases Detection Using Highly-Rough Annotation on MR Images

Changhee Han; Kohei Murao; Tomoyuki Noguchi; Yusuke Kawata; Fumiya Uchiyama; Leonardo Rundo; Hideki Nakayama; Shin'ichi Satoh

arXiv:1902.09856·cs.CV·August 23, 2019


TL;DR

This paper introduces a conditional GAN-based data augmentation method that uses highly-rough bounding box annotations to improve brain metastases detection in MR images, raising detection sensitivity by 10% in the reported experiments.

Contribution

It presents the first GAN-based medical data augmentation technique that incorporates rough bounding box annotations to improve tumor detection robustness.

Findings

01

Boosted detection sensitivity by 0.10 (from 0.67 to 0.77 at IoU ≥ 0.5) with 4,000 CPGGAN-generated images

02

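Generated realistic tumor images: when CPGGANs were trained with additional normal brain images, physicians labeled a considerable number of synthetic tumor bounding boxes as real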

03

Additional normal images did not improve detection performance

Abstract

Accurate Computer-Assisted Diagnosis, associated with proper data wrangling, can alleviate the risk of overlooking the diagnosis in a clinical environment. Towards this, as a Data Augmentation (DA) technique, Generative Adversarial Networks (GANs) can synthesize additional training data to handle the small/fragmented medical imaging datasets collected from various scanners; those images are realistic but completely different from the original ones, filling the data lack in the real image distribution. However, we cannot easily use them to locate disease areas, considering expert physicians' expensive annotation cost. Therefore, this paper proposes Conditional Progressive Growing of GANs (CPGGANs), incorporating highly-rough bounding box conditions incrementally into PGGANs to place brain metastases at desired positions/sizes on 256 X 256 Magnetic Resonance (MR) images, for Convolutional…

Tables

Table 1. YOLOv3 brain metastases detection results with/without DA, using bounding boxes with detection threshold 0.1%.

| Training data | Sensitivity (IoU $\geq$ 0.5) | FPs per slice (IoU $\geq$ 0.5) | Sensitivity (IoU $\geq$ 0.25) | FPs per slice (IoU $\geq$ 0.25) |
|---|---|---|---|---|
| 2,813 real images | 0.67 | 4.11 | 0.83 | 3.59 |
| + 4,000 CPGGAN-based DA | 0.77 | 7.64 | 0.91 | 7.18 |
| + 8,000 CPGGAN-based DA | 0.71 | 6.36 | 0.87 | 5.85 |
| + 12,000 CPGGAN-based DA | 0.76 | 11.77 | 0.91 | 11.29 |
| + 4,000 CPGGAN-based DA (+ normal) | 0.69 | 7.16 | 0.86 | 6.60 |
| + 8,000 CPGGAN-based DA (+ normal) | 0.73 | 8.10 | 0.89 | 7.59 |
| + 12,000 CPGGAN-based DA (+ normal) | 0.74 | 9.42 | 0.89 | 8.95 |
| + 4,000 Image-to-Image GAN-based DA | 0.72 | 6.21 | 0.87 | 5.70 |
| + 8,000 Image-to-Image GAN-based DA | 0.68 | 3.50 | 0.84 | 2.99 |
| + 12,000 Image-to-Image GAN-based DA | 0.74 | 7.20 | 0.89 | 6.72 |

Table 2. Visual Turing Test results by three physicians for classifying real vs CPGGAN-generated images: (a), (b) Test 1, 2: resized $32\times 32$ tumor bounding boxes, trained without/with additional normal brain images; (c), (d) Test 3, 4: $256\times 256$ MR images, trained without/with normal brain images. Accuracy denotes the physicians’ successful classification ratio between the real/synthetic images.

| Test | Physician | Accuracy | Real Selected as Real | Real as Synt | Synt as Real | Synt as Synt |
|---|---|---|---|---|---|---|
| Test 1 | Physician 1 | 88% | 40 | 10 | 2 | 48 |
| Test 1 | Physician 2 | 95% | 45 | 5 | 0 | 50 |
| Test 1 | Physician 3 | 97% | 49 | 1 | 2 | 48 |
| Test 2 | Physician 1 | 81% | 39 | 11 | 8 | 42 |
| Test 2 | Physician 2 | 83% | 43 | 7 | 10 | 40 |
| Test 2 | Physician 3 | 91% | 45 | 5 | 4 | 46 |
| Test 3 | Physician 1 | 97% | 47 | 3 | 0 | 50 |
| Test 3 | Physician 2 | 96% | 46 | 4 | 0 | 50 |
| Test 3 | Physician 3 | 100% | 50 | 0 | 0 | 50 |
| Test 4 | Physician 1 | 91% | 41 | 9 | 0 | 50 |
| Test 4 | Physician 2 | 96% | 48 | 2 | 2 | 48 |
| Test 4 | Physician 3 | 100% | 50 | 0 | 0 | 50 |

Equations

Wasserstein loss using gradient penalty (CPGGAN training):

$$\mathbb{E}_{\tilde{y}\sim\mathbb{P}_{g}}\left[D(\tilde{y})\right]-\mathbb{E}_{y\sim\mathbb{P}_{r}}\left[D(y)\right]+\lambda\,\mathbb{E}_{\hat{y}\sim\mathbb{P}_{\hat{y}}}\left[\left(\lVert\nabla_{\hat{y}}D(\hat{y})\rVert_{2}-1\right)^{2}\right]$$

YOLOv3 sum squared error loss:

$$\begin{aligned}
&\lambda_{\text{coord}}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbbm{1}_{ij}^{\text{obj}}\left[(x_{i}-\hat{x}_{i})^{2}+(y_{i}-\hat{y}_{i})^{2}\right]\\
&+\lambda_{\text{coord}}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbbm{1}_{ij}^{\text{obj}}\left[(w_{i}-\hat{w}_{i})^{2}+(h_{i}-\hat{h}_{i})^{2}\right]\\
&+\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbbm{1}_{ij}^{\text{obj}}(C_{i}-\hat{C}_{i})^{2}+\lambda_{\text{noobj}}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbbm{1}_{ij}^{\text{noobj}}(C_{i}-\hat{C}_{i})^{2}\\
&+\sum_{i=0}^{S^{2}}\mathbbm{1}_{i}^{\text{obj}}\sum_{c\in\text{classes}}(p_{i}(c)-\hat{p}_{i}(c))^{2}
\end{aligned}$$


Full text

Learning More with Less: Conditional PGGAN-based Data Augmentation for Brain Metastases Detection Using Highly-Rough Annotation on MR Images

Changhee Han (1,2,3,4), Kohei Murao (1,2), Tomoyuki Noguchi (2), Yusuke Kawata (2), Fumiya Uchiyama (2), Leonardo Rundo (3), Hideki Nakayama (4), and Shin’ichi Satoh (1,4)

1 Research Center for Medical Big Data, National Institute of Informatics, Tokyo, Japan

2 Department of Radiology, National Center for Global Health and Medicine, Tokyo, Japan

3 Department of Radiology, University of Cambridge, Cambridge, UK

4 Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan

(2019)

Abstract.

Accurate Computer-Assisted Diagnosis, associated with proper data wrangling, can alleviate the risk of overlooking the diagnosis in a clinical environment. Towards this, as a Data Augmentation (DA) technique, Generative Adversarial Networks (GANs) can synthesize additional training data to handle the small/fragmented medical imaging datasets collected from various scanners; those images are realistic but completely different from the original ones, filling gaps in the real image distribution. However, we cannot easily use them to locate disease areas, considering expert physicians’ expensive annotation cost. Therefore, this paper proposes Conditional Progressive Growing of GANs (CPGGANs), incorporating highly-rough bounding box conditions incrementally into PGGANs to place brain metastases at desired positions/sizes on $256\times 256$ Magnetic Resonance (MR) images, for Convolutional Neural Network-based tumor detection; this first GAN-based medical DA using automatic bounding box annotation improves the training robustness. The results show that CPGGAN-based DA can boost sensitivity in diagnosis by $10\%$ with clinically acceptable additional False Positives. Surprisingly, further tumor realism, achieved with additional normal brain MR images for CPGGAN training, does not contribute to detection performance, while even three physicians cannot accurately distinguish them from the real ones in a Visual Turing Test.

Generative Adversarial Networks, Medical Image Augmentation, Conditional PGGANs, Brain Tumor Detection, MRI

Journal year: 2019. Copyright: ACM. Conference: The 28th ACM International Conference on Information and Knowledge Management (CIKM’19), November 3–7, 2019, Beijing, China. Price: 15.00. DOI: 10.1145/3357384.3357890. ISBN: 978-1-4503-6976-3/19/11. CCS concepts: Computing methodologies → Object detection; Applied computing → Health informatics.

1. Introduction

Accurate Computer-Assisted Diagnosis (CAD) with high sensitivity can alleviate the risk of overlooking the diagnosis in a clinical environment. Specifically, Convolutional Neural Networks (CNNs) have revolutionized medical imaging, such as diabetic eye disease diagnosis (Gulshan et al., 2016), mainly thanks to large-scale annotated training data. However, obtaining such annotated medical big data is demanding; thus, better diagnosis requires intensive Data Augmentation (DA) techniques, such as geometric/intensity transformations of original images (Ronneberger et al., 2015; Milletari et al., 2016). Yet, those augmented images intrinsically have a similar distribution to the original ones, leading to limited performance improvement; in this context, Generative Adversarial Network (GAN) (Goodfellow et al., 2014)-based DA can boost the performance by filling the real image distribution uncovered by the original dataset, since it generates realistic but completely new samples showing good generalization ability; GANs achieved outstanding performance in computer vision, including $21\%$ performance improvement in eye-gaze estimation (Shrivastava et al., 2017).

Also in medical imaging, where the primary problem lies in small and fragmented imaging datasets from various scanners (Rundo et al., 2019), GAN-based DA performs effectively: researchers improved classification by augmentation with noise-to-image GANs (e.g., random noise samples to diverse pathological images) (Frid-Adar et al., 2018) and segmentation with image-to-image GANs (e.g., a benign image with a pathology-conditioning image to a malignant one) (Shin et al., 2018; Jin et al., 2018). Such applications include $256\times 256$ brain Magnetic Resonance (MR) image generation for tumor/non-tumor classification (Han et al., 2019). Nevertheless, unlike bounding box-based object detection, simple classification cannot locate disease areas and rigorous segmentation requires physicians’ expensive annotation.

So, how can we achieve high sensitivity in diagnosis using GANs with minimum annotation cost, based on highly-rough and inconsistent bounding boxes? As an advanced data wrangling approach, we aim to generate GAN-based realistic and diverse $256\times 256$ brain MR images with brain metastases at desired positions/sizes for accurate CNN-based tumor detection; this is clinically valuable for better diagnosis, prognosis, and treatment, since brain metastases are the most common intra-cranial tumors, becoming more prevalent as oncological therapies improve cancer patients’ survival (Arvold et al., 2016). Conventional GANs cannot generate realistic $256\times 256$ whole brain MR images conditioned on tumor positions/sizes under limited training data/highly-rough annotation (Han et al., 2019); since noise-to-image GANs cannot directly be conditioned on an image describing desired objects, we have to use image-to-image GANs (e.g., input both the conditioning image/random noise samples or the conditioning image alone with dropout noises (Srivastava et al., 2014) on a generator (Isola et al., 2017)). This results in unrealistic high-resolution MR images with odd artifacts due to the limited training data/rough annotation, tumor variations, and strong consistency in brain anatomy, unless we also input a benign image, sacrificing image diversity.

Such a high-resolution whole image generation approach, not involving Regions of Interest (ROIs) alone, however, could facilitate detection because it provides more image details and most CNN architectures adopt around $256\times 256$ input pixels. Therefore, as a conditional noise-to-image GAN not relying on an input benign image, we propose Conditional Progressive Growing of GANs (CPGGANs), incorporating highly-rough bounding box conditions incrementally into PGGANs (Karras et al., 2018a) to naturally place tumors of random shape at desired positions/sizes on MR images. Moreover, we evaluate the generated images’ realism via a Visual Turing Test (Salimans et al., 2016) by three expert physicians, and visualize the data distribution via the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm (van der Maaten and Hinton, 2008). Using the synthetic images, our novel CPGGAN-based DA boosts sensitivity in diagnosis by $10\%$ with clinically acceptable additional False Positives (FPs). Surprisingly, we confirm that further realistic tumor appearance, judged by the physicians, does not contribute to detection performance.

Research Questions. We mainly address two questions:

• PGGAN Conditioning: How can we modify PGGANs to naturally place objects of random shape, unlike rigorous segmentation, at desired positions/sizes based on highly-rough bounding box masks?

• Medical Data Augmentation: How can we balance the number of real and additional synthetic training data to achieve the best detection performance?

Contributions. Our main contributions are as follows:

• Conditional Image Generation: As the first bounding box-based $256\times 256$ whole pathological image generation approach, CPGGANs can generate realistic/diverse images with objects naturally at desired positions/sizes; the generated images can play a vital role in clinical oncology applications, such as DA, data anonymization, and physician training.

• Misdiagnosis Prevention: This study allows us to achieve high sensitivity in automatic CAD using small/fragmented medical imaging datasets with minimum annotation efforts based on highly-rough/inconsistent bounding boxes.

• Brain Metastases Detection: This first bounding box-based brain metastases detection method successfully detects tumors exploiting CPGGAN-based DA.

2. Generative Adversarial Networks

In terms of realism and diversity, GANs (Goodfellow et al., 2014) have shown great promise in image generation (Ledig et al., 2017; Karras et al., 2018b) through a two-player minimax game. However, the two-player objective function triggers difficult training, accompanying artifacts and mode collapse (Gulrajani et al., 2017) when generating high-resolution images, such as $256\times 256$ ones (Radford et al., 2016); to tackle this, multi-stage generative training methods have been proposed: AttnGAN uses attention-driven multi-stage refinement for fine-grained text-to-image generation (Xu et al., 2018); PGGANs adopts incremental training procedures from low to high resolution for generating a realistic image (Karras et al., 2018a). Moreover, GAN-based $128\times 128$ conditional image synthesis using a bounding box can control generated images’ local properties (Reed et al., 2016). GANs can typically generate more realistic images than other common deep generative models, such as variational autoencoders (Kingma and Welling, 2013) suffering from the injected noise and imperfect reconstruction due to a single objective function (Mescheder et al., 2017); thus, as a DA technique, most researchers chose GANs for facilitating classification (Antoniou et al., 2017; Mariani et al., 2018), object detection (Ouyang et al., 2018; Huang et al., 2018), and segmentation (Zhu et al., 2018) to tackle the lack of training data.

This GAN-based DA trend especially applies to medical imaging for handling various types of small/fragmented datasets from multiple scanners: researchers used noise-to-image GANs for improving classification on brain tumor/non-tumor MR (Han et al., 2019) and liver lesion Computed Tomography (CT) images (Frid-Adar et al., 2018); others used image-to-image GANs focusing on ROI (i.e., small pathological areas) for improving segmentation on 3D brain tumor MR (Shin et al., 2018) and 3D lung nodule CT images (Jin et al., 2018).

However, to the best of our knowledge, our work is the first GAN-based medical DA method using automatic bounding box annotation, despite 2D bounding boxes’ cheap annotation cost compared with rigorous 3D segmentation. Moreover, unlike the ROI DA work generating only pedestrians without the background for pedestrian detection (Ouyang et al., 2018), this is the first GAN-based whole image augmentation approach including the background, relying on bounding boxes, in computer vision. Along with classic transformations of real images, a completely different approach—generating novel whole $256\times 256$ brain MR images with tumors at desired positions/sizes using CPGGANs—may become a clinical breakthrough in terms of annotation cost.

3. Materials and Methods

3.1. Brain Metastases Dataset

As a new dataset for the first bounding box-based brain metastases detection, this paper uses a dataset of contrast-enhanced T1-weighted (T1c) brain axial MR images, collected by the authors (National Center for Global Health and Medicine, Tokyo, Japan) and currently not publicly available due to ethical restrictions; for robust clinical applications, it contains $180$ brain metastatic cancer cases from multiple MRI scanners. Those images differ in contrast, magnetic field strength (i.e., $1.5$ T, $3.0$ T), and matrix size (i.e., $190\times 224$, $216\times 256$, $256\times 256$, $460\times 460$ pixels). In clinical practice, T1c MRI is well-established in brain metastases detection thanks to its high contrast in the enhancing region. We also use additional brain MR images from $193$ normal subjects only for CPGGAN training, not in tumor detection, to confirm the effect of combining the normal and pathological images for training.

3.2. CPGGAN-based Image Generation

Data Preparation. For tumor detection, our whole brain metastases dataset ($180$ patients) is divided into: (i) a training set ($126$ patients); (ii) a validation set ($18$ patients); (iii) a test set ($36$ patients); for a fair evaluation, only the training set is used for GAN training. Our experimental dataset consists of:

• Training set ($2,813$ images/$5,963$ bounding boxes);

• Validation set ($337$ images/$616$ bounding boxes);

• Test set ($947$ images/$3,094$ bounding boxes).
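The patient-level split above (126/18/36 patients) can be sketched as follows. This is only an illustration: the paper does not say how patients were assigned to each subset, so the seeded random shuffle, the function name `split_patients`, and the integer patient IDs are assumptions. The point it shows is that all slices of one patient stay in a single subset.

```python
import random


def split_patients(patient_ids, n_train=126, n_val=18, n_test=36, seed=0):
    """Split patient IDs (not individual slices) into train/val/test subsets.

    Assumption: patients are assigned by a seeded random shuffle; the paper
    only reports the subset sizes, not the assignment procedure.
    """
    ids = sorted(patient_ids)
    if len(ids) != n_train + n_val + n_test:
        raise ValueError("subset sizes must add up to the number of patients")
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# Hypothetical IDs 0..179 standing in for the 180 patients.
train_ids, val_ids, test_ids = split_patients(range(180))
```

Splitting by patient rather than by slice avoids placing near-identical neighbouring slices of the same tumor in both the training and test sets.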

Our training set is relatively small/fragmented for CNN-based applications, considering that the same patient’s tumor slices could convey very similar information. To confirm the effect of realism and diversity (provided by combining PGGANs and bounding box conditioning) on tumor detection, we compare the following GANs: (i) CPGGANs trained only with the brain metastases images; (ii) CPGGANs trained also with additional $16,962$ brain images from $193$ normal subjects; (iii) Image-to-image GAN trained only with the brain metastases images. After skull-stripping on all images with various resolutions, the remaining brain parts are cropped and resized to $256\times 256$ pixels (i.e., a power of $2$ for better GAN training). As Fig. 2 shows, we lazily annotate tumors with highly-rough and inconsistent bounding boxes to minimize expert physicians’ labor.

CPGGANs is a novel conditional noise-to-image training method for GANs, incorporating highly-rough bounding box conditions incrementally into PGGANs (Karras et al., 2018a), unlike conditional image-to-image GANs requiring rigorous segmentation masks (Bailo et al., 2019). The original PGGANs exploits a progressively growing generator and discriminator: starting from low-resolution, newly-added layers model fine-grained details as training progresses. As Fig. 3 shows, we further condition the generator and discriminator to generate realistic and diverse $256\times 256$ brain MR images with tumors of random shape at desired positions/sizes using only bounding boxes without an input benign image under limited training data/highly-rough annotation. Our modifications to the original PGGANs are as follows:

• Conditioning image: prepare a $256\times 256$ black image (i.e., pixel value: $0$) with white bounding boxes (i.e., pixel value: $255$) describing tumor positions/sizes for attention;

• Generator input: resize the conditioning image to the previous generator’s output resolution/channel size and concatenate them (noise samples generate the first $4\times 4$ images);

• Discriminator input: concatenate the conditioning image with a real or synthetic image.
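As a rough illustration of the three modifications listed above, the sketch below builds a conditioning image, resizes it to a lower resolution, and appends it as an extra channel. Several details are assumptions not stated in the text: the boxes are drawn as filled rectangles, resizing uses nearest-neighbour sampling, the conditioning image is appended as a single channel, and all function names are hypothetical. It uses plain Python lists so it runs without any dependency.

```python
def make_conditioning_image(boxes, size=256):
    """Black (0) image with white (255) rectangles, one per bounding box.

    boxes: iterable of (x_min, y_min, x_max, y_max) in pixels, max exclusive.
    Assumption: rectangles are filled rather than outlined.
    """
    image = [[0] * size for _ in range(size)]
    for x_min, y_min, x_max, y_max in boxes:
        for y in range(max(0, y_min), min(size, y_max)):
            for x in range(max(0, x_min), min(size, x_max)):
                image[y][x] = 255
    return image


def resize_nearest(image, out_size):
    """Nearest-neighbour resize of a square image (assumed resize method)."""
    in_size = len(image)
    return [
        [image[(y * in_size) // out_size][(x * in_size) // out_size]
         for x in range(out_size)]
        for y in range(out_size)
    ]


def concat_condition(channels, conditioning):
    """Append the conditioning image as one extra channel."""
    return channels + [conditioning]


cond = make_conditioning_image([(64, 64, 128, 96)])
cond_4x4 = resize_nearest(cond, 4)  # matches the first 4x4 stage
# Three hypothetical 4x4 feature maps from the previous generator stage.
features = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
generator_input = concat_condition(features, cond_4x4)
```

The same `concat_condition` step applies to the discriminator, where the channels would be those of a real or synthetic image at the current resolution.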

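CPGGAN Implementation Details. We use the CPGGAN architecture with the Wasserstein loss using gradient penalty (Gulrajani et al., 2017):

$$\mathbb{E}_{\tilde{y}\sim\mathbb{P}_{g}}\left[D(\tilde{y})\right]-\mathbb{E}_{y\sim\mathbb{P}_{r}}\left[D(y)\right]+\lambda\,\mathbb{E}_{\hat{y}\sim\mathbb{P}_{\hat{y}}}\left[\left(\lVert\nabla_{\hat{y}}D(\hat{y})\rVert_{2}-1\right)^{2}\right]$$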

where the discriminator $D$ belongs to the set of $1$-Lipschitz functions, $\mathbb{P}_{r}$ is the data distribution by the true data sample ${y}$, and $\mathbb{P}_{g}$ is the model distribution by the synthetic sample ${\tilde{y}}$ generated from the conditioning image and noise samples drawn from a uniform distribution in $[-1,1]$. The last term is the gradient penalty for the random sample ${\hat{y}}\sim{\mathbb{P}_{\hat{y}}}$.
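To make the three terms of the Wasserstein loss with gradient penalty concrete, here is a toy sketch. It is not the CPGGAN discriminator: the critic is a hypothetical one-layer function $D(y)=\tanh(w\cdot y)$ chosen because its input gradient is analytic, and $\lambda=10$ is the default from Gulrajani et al. (2017), which this paper does not confirm. Following Gulrajani et al., $\hat{y}$ is sampled uniformly on the line between a real and a synthetic sample.

```python
import math
import random


def critic(y, w):
    """Toy critic D(y) = tanh(w . y); a stand-in, not the CPGGAN discriminator."""
    return math.tanh(sum(wk * yk for wk, yk in zip(w, y)))


def critic_input_grad(y, w):
    """Analytic gradient of the toy critic with respect to its input y."""
    scale = 1.0 - critic(y, w) ** 2
    return [scale * wk for wk in w]


def wgan_gp_loss(real, fake, w, lam=10.0, rng=random):
    """E[D(fake)] - E[D(real)] + lam * E[(||grad D(y_hat)||_2 - 1)^2]."""
    fake_term = sum(critic(y, w) for y in fake) / len(fake)
    real_term = sum(critic(y, w) for y in real) / len(real)
    penalty = 0.0
    pairs = list(zip(real, fake))
    for y_real, y_fake in pairs:
        eps = rng.random()  # interpolation coefficient in [0, 1)
        y_hat = [eps * a + (1.0 - eps) * b for a, b in zip(y_real, y_fake)]
        grad_norm = math.sqrt(sum(g * g for g in critic_input_grad(y_hat, w)))
        penalty += (grad_norm - 1.0) ** 2
    return fake_term - real_term + lam * penalty / len(pairs)


real_batch = [[0.5, -0.2], [0.1, 0.9]]
fake_batch = [[-0.3, 0.4], [0.8, -0.6]]
loss_zero_critic = wgan_gp_loss(real_batch, fake_batch, w=[0.0, 0.0])
loss_no_penalty = wgan_gp_loss([[1.0]], [[-1.0]], w=[1.0], lam=0.0)
```

With an all-zero critic the first two terms vanish and every gradient norm is 0, so the loss reduces to $\lambda\cdot(0-1)^{2}$, which shows how the penalty pushes the gradient norm towards 1.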

Training lasts for $3,000,000$ steps with a batch size of $4$ and $2.0\times 10^{-4}$ learning rate for the Adam optimizer (Kingma and Ba, 2014). We flip the discriminator’s real/synthetic labels once in three times for robustness. During testing, as tumor attention images, we use the annotation of training images with a random combination of horizontal/vertical flipping, width/height shift up to $10\%$ , and zooming up to $10\%$ ; these CPGGAN-generated images are used as additional training images for tumor detection.
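The random perturbation of training annotations used to build tumor attention images can be sketched on box coordinates as below. The text only names the operations (horizontal/vertical flipping, shift up to 10%, zoom up to 10%); the 50% flip probability, zooming about the image centre, the order of operations, and clipping to the image bounds are assumptions, and `augment_boxes` is a hypothetical helper.

```python
import random


def augment_boxes(boxes, size=256, max_shift=0.10, max_zoom=0.10, rng=random):
    """Randomly flip, zoom, and shift (x_min, y_min, x_max, y_max) boxes.

    One set of random parameters is drawn per call, so all boxes of an
    annotation are transformed consistently.
    """
    flip_h = rng.random() < 0.5
    flip_v = rng.random() < 0.5
    dx = rng.uniform(-max_shift, max_shift) * size
    dy = rng.uniform(-max_shift, max_shift) * size
    zoom = 1.0 + rng.uniform(-max_zoom, max_zoom)
    centre = size / 2.0

    augmented = []
    for x0, y0, x1, y1 in boxes:
        if flip_h:
            x0, x1 = size - x1, size - x0
        if flip_v:
            y0, y1 = size - y1, size - y0
        # Zoom about the image centre, then shift.
        x0, x1 = [centre + (x - centre) * zoom + dx for x in (x0, x1)]
        y0, y1 = [centre + (y - centre) * zoom + dy for y in (y0, y1)]
        # Clip to the image and drop boxes that fall outside it.
        x0, x1 = max(0.0, x0), min(float(size), x1)
        y0, y1 = max(0.0, y0), min(float(size), y1)
        if x1 > x0 and y1 > y0:
            augmented.append((x0, y0, x1, y1))
    return augmented


new_boxes = augment_boxes([(10, 20, 60, 80), (200, 200, 250, 250)],
                          rng=random.Random(0))
```

The resulting boxes would then be rendered into a conditioning image and passed to the trained generator to obtain a new synthetic image with matching annotation.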

Image-to-image GAN is a conventional conditional GAN that generates brain MR images with tumors, concatenating a $256\times 256$ conditioning image with noise samples for a generator input and concatenating the conditioning image with a real/synthetic image for a discriminator input, respectively. It uses a U-Net-like (Ronneberger et al., 2015) generator with $4$ convolutional/deconvolutional layers in encoders/decoders respectively with skip connections, along with a discriminator with $3$ decoders. We apply batch normalization (Ioffe and Szegedy, 2015) to both convolution with LeakyReLU and deconvolution with ReLU. It follows the same implementation details as for the CPGGANs.

3.3. YOLOv3-based Brain Metastases Detection

You Only Look Once v3 (YOLOv3) (Redmon and Farhadi, 2018) is a fast and accurate CNN-based object detector: unlike conventional classifier-based detectors, it divides the image into regions and predicts bounding boxes/probabilities for each region. We adopt YOLOv3 to detect brain metastases on MR images since its high efficiency can play a clinical role in real-time tumor alert; moreover, it shows very comparable results with $608\times 608$ network resolution against other state-of-the-art detectors, such as Faster RCNN (Ren et al., 2015).

To confirm the effect of GAN-based DA, the following detection results are compared: (i) $2,813$ real images without DA, (ii), (iii), (iv) with $4,000$/$8,000$/$12,000$ CPGGAN-based DA, (v), (vi), (vii) with $4,000$/$8,000$/$12,000$ CPGGAN-based DA, trained with additional normal brain images, (viii), (ix), (x) with $4,000$/$8,000$/$12,000$ image-to-image GAN-based DA. Due to the risk of overlooking the diagnosis via medical imaging, higher sensitivity matters more than fewer FPs; thus, we aim to achieve higher sensitivity with a clinically acceptable number of FPs by adding the synthetic training images. Since our annotation is highly-rough, we calculate sensitivity/FPs per slice with both Intersection over Union (IoU) threshold 0.5 and 0.25. For better DA, GAN-generated images with unclear tumor appearance are manually discarded.
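A minimal sketch of how sensitivity and FPs per slice could be computed at a given IoU threshold is shown below. The matching rule is an assumption, since the text does not specify one: a ground-truth box counts as detected if any prediction reaches the threshold, and a prediction counts as a false positive if it reaches the threshold with no ground-truth box. The helper names are hypothetical.

```python
def box_area(box):
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def sensitivity_and_fps(slices, iou_threshold=0.5):
    """slices: list of (ground_truth_boxes, predicted_boxes), one per slice."""
    detected = total_gt = false_positives = 0
    for gt_boxes, pred_boxes in slices:
        total_gt += len(gt_boxes)
        detected += sum(
            any(iou(gt, pred) >= iou_threshold for pred in pred_boxes)
            for gt in gt_boxes
        )
        false_positives += sum(
            all(iou(gt, pred) < iou_threshold for gt in gt_boxes)
            for pred in pred_boxes
        )
    sensitivity = detected / total_gt if total_gt else 0.0
    return sensitivity, false_positives / len(slices)


# Two toy slices: the second prediction overlaps its tumor with IoU 0.4 only.
toy_slices = [
    ([(0, 0, 10, 10)], [(0, 0, 10, 5), (20, 20, 30, 30)]),
    ([(0, 0, 10, 10)], [(0, 0, 10, 4)]),
]
strict = sensitivity_and_fps(toy_slices, iou_threshold=0.5)
lenient = sensitivity_and_fps(toy_slices, iou_threshold=0.25)
```

In this toy case, relaxing the threshold from 0.5 to 0.25 turns a miss plus a false positive into a detection, which is the effect the two threshold columns are meant to expose under rough annotation.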

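YOLOv3 Implementation Details. We use the YOLOv3 architecture with Darknet-53 as a backbone classifier and sum squared error between the predictions/ground truth as a loss:

$$\begin{aligned}
&\lambda_{\text{coord}}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbbm{1}_{ij}^{\text{obj}}\left[(x_{i}-\hat{x}_{i})^{2}+(y_{i}-\hat{y}_{i})^{2}\right]\\
&+\lambda_{\text{coord}}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbbm{1}_{ij}^{\text{obj}}\left[(w_{i}-\hat{w}_{i})^{2}+(h_{i}-\hat{h}_{i})^{2}\right]\\
&+\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbbm{1}_{ij}^{\text{obj}}(C_{i}-\hat{C}_{i})^{2}+\lambda_{\text{noobj}}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbbm{1}_{ij}^{\text{noobj}}(C_{i}-\hat{C}_{i})^{2}\\
&+\sum_{i=0}^{S^{2}}\mathbbm{1}_{i}^{\text{obj}}\sum_{c\in\text{classes}}(p_{i}(c)-\hat{p}_{i}(c))^{2}
\end{aligned}$$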

where $x_{i},y_{i}$ are the centroid location of an anchor box, $w_{i},h_{i}$ are the width/height of the anchor, $C_{i}$ is the Objectness (i.e., confidence score of whether an object exists), and $p_{i}(c)$ is the class probability entering the classification loss. Let $S^{2}$ and $B$ be the size of a feature map and the number of anchor boxes, respectively. $\mathbbm{1}_{i}^{\text{obj}}$ is $1$ when an object exists in cell $i$ and $0$ otherwise.
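The structure of this sum squared error loss (coordinate, size, objectness, no-object, and class terms) can be sketched in plain Python as follows. The data layout and the name `yolo_sse_loss` are hypothetical, and the weights $\lambda_{\text{coord}}=5$ and $\lambda_{\text{noobj}}=0.5$ are the values from the original YOLO paper, assumed here because this text does not state them. Coordinates are stored per anchor, which is how the indicator $\mathbbm{1}_{ij}^{\text{obj}}$ is used in practice.

```python
def yolo_sse_loss(cells, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum squared error over grid cells and anchors.

    cells: list of dicts with keys
      'anchors': list of {'obj': bool, 'pred': {...}, 'true': {...}} where
                 pred/true hold 'C' and, for object anchors, 'x', 'y', 'w', 'h';
      'p_pred', 'p_true': per-class probabilities of the cell.
    """
    loss = 0.0
    for cell in cells:
        cell_has_object = False
        for anchor in cell["anchors"]:
            pred, true = anchor["pred"], anchor["true"]
            conf_err = (true["C"] - pred["C"]) ** 2
            if anchor["obj"]:
                cell_has_object = True
                loss += lambda_coord * ((true["x"] - pred["x"]) ** 2
                                        + (true["y"] - pred["y"]) ** 2)
                loss += lambda_coord * ((true["w"] - pred["w"]) ** 2
                                        + (true["h"] - pred["h"]) ** 2)
                loss += conf_err
            else:
                loss += lambda_noobj * conf_err
        if cell_has_object:
            # Class term is counted once per cell containing an object.
            loss += sum((pt - pp) ** 2
                        for pt, pp in zip(cell["p_true"], cell["p_pred"]))
    return loss


toy_cell = {
    "anchors": [
        {"obj": True,
         "pred": {"x": 0.4, "y": 0.5, "w": 0.2, "h": 0.2, "C": 0.5},
         "true": {"x": 0.5, "y": 0.5, "w": 0.2, "h": 0.2, "C": 1.0}},
        {"obj": False, "pred": {"C": 0.2}, "true": {"C": 0.0}},
    ],
    "p_pred": [0.5],
    "p_true": [1.0],
}
toy_loss = yolo_sse_loss([toy_cell])
```

For the toy cell the terms are 0.05 (coordinates), 0 (size), 0.25 (objectness), 0.02 (no-object), and 0.25 (class), adding up to 0.57.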

During training, we use a batch size of $64$ and $1.0\times 10^{-3}$ learning rate for the Adam optimizer. The network resolution is set to $416\times 416$ pixels during training and $608\times 608$ pixels during validation/testing respectively to detect small tumors better. We recalculate the anchors at each DA setup. As classic DA, geometric/intensity transformations are also applied to both real/synthetic images during training to achieve the best performance. For testing, we pick the model with the best sensitivity on validation with detection threshold 0.1%/IoU threshold 0.5 between $96,000$ - $240,000$ steps to avoid severe FPs while achieving high sensitivity.

3.4. Clinical Validation via Visual Turing Test

To quantitatively evaluate how realistic the CPGGAN-based synthetic images are, we supply, in random order, to three expert physicians a random selection of $50$ real and $50$ synthetic brain metastases images. They take four tests in ascending order: (i), (ii) test 1, 2: real vs CPGGAN-generated resized $32\times 32$ tumor bounding boxes, trained without/with additional normal brain images; (iii), (iv) test 3, 4: real vs CPGGAN-generated $256\times 256$ MR images, trained without/with additional normal brain images.

Then, the physicians are asked to classify each image as real or synthetic, zooming/rotating it if needed, without previous training stages revealing which is real/synthetic. Such a Visual Turing Test (Salimans et al., 2016) can probe the human ability to identify attributes/relationships in images, also in evaluating GAN-generated images’ appearance (Shrivastava et al., 2017). This similarly applies to medical images in a clinical environment, wherein physicians’ specialty is critical (Han et al., 2018; Frid-Adar et al., 2018).
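The accuracy reported for each physician is the fraction of the 100 images (50 real, 50 synthetic) classified correctly. As a small worked example, the sketch below recomputes it from confusion counts; the two sets of counts are taken from Table 2, and the function name is only illustrative.

```python
def turing_test_accuracy(real_as_real, real_as_synt, synt_as_real, synt_as_synt):
    """Share of images whose real/synthetic label was chosen correctly."""
    total = real_as_real + real_as_synt + synt_as_real + synt_as_synt
    return (real_as_real + synt_as_synt) / total


# Counts from Table 2: Test 1, Physician 1 and Test 2, Physician 2.
acc_test1_p1 = turing_test_accuracy(40, 10, 2, 48)
acc_test2_p2 = turing_test_accuracy(43, 7, 10, 40)
```

An accuracy near 50% would mean the physician cannot tell real from synthetic images, so lower values indicate more realistic synthesis.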

3.5. Visualization via t-SNE

To visually analyze the distribution of real/synthetic images, we use t-SNE (van der Maaten and Hinton, 2008) on a random selection of:

• $500$ real tumor images;

• $500$ CPGGAN-generated tumor images;

• $500$ CPGGAN-generated tumor images, trained with additional normal brain images.

We normalize the input images to $[0,1]$ .

t-SNE is a machine learning algorithm for dimensionality reduction that represents high-dimensional data in a lower-dimensional (2D/3D) space. It non-linearly adapts to input data, using perplexity to balance between the data’s local and global aspects.

t-SNE Implementation Details. We use t-SNE with a perplexity of $100$ for $1,000$ iterations to get a 2D representation.

4. Results

This section shows how CPGGANs and image-to-image GAN generate brain MR images. The results include instances of synthetic images and their influence on tumor detection, along with CPGGAN-generated images’ evaluation via Visual Turing Test and t-SNE.

4.1. MR Images Generated by CPGGANs

Fig. 4 illustrates example GAN-generated images. CPGGANs successfully captures the T1c-specific texture and tumor appearance at desired positions/sizes. Since we use highly-rough bounding boxes, the synthetic tumor shape largely varies within the boxes. When trained with additional normal brain images, it clearly maintains the realism of the original images with fewer odd artifacts, including tumor bounding boxes, which the additional images do not include. However, as expected, image-to-image GAN, without progressive growing, generates clearly unrealistic images without an input benign image due to the limited training data/rough annotation.

4.2. Brain Metastases Detection Results

Table 1 shows the tumor detection results with/without GAN-based DA. As expected, the sensitivity remarkably increases with the additional synthetic training data while FPs per slice also increase. Adding more synthetic images generally leads to a higher amount of FPs, also detecting blood vessels that are small/hyper-intense on T1c MR images, very similarly to the enhanced tumor regions (i.e., the contrast agent perfuses throughout the blood vessels). However, surprisingly, adding only $4,000$ CPGGAN-generated images achieves the best sensitivity improvement by $0.10$ with IoU threshold $0.5$ and by $0.08$ with IoU threshold $0.25$ , probably due to the real/synthetic training image balance—the improved training robustness achieves sensitivity $0.91$ with moderate IoU threshold $0.25$ despite our highly-rough bounding box annotation.
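The sensitivity gains quoted above follow directly from Table 1. The short sketch below recomputes them for the CPGGAN-based DA rows (trained without normal images); the values are copied from Table 1 and the variable names are illustrative.

```python
# Sensitivity from Table 1 as (IoU >= 0.5, IoU >= 0.25).
BASELINE = (0.67, 0.83)  # 2,813 real images without DA
CPGGAN_DA = {
    4000: (0.77, 0.91),
    8000: (0.71, 0.87),
    12000: (0.76, 0.91),
}


def sensitivity_gains(results, baseline):
    """Gain over the baseline per number of added synthetic images."""
    return {
        n_added: tuple(round(s - b, 2) for s, b in zip(sens, baseline))
        for n_added, sens in results.items()
    }


gains = sensitivity_gains(CPGGAN_DA, BASELINE)
best_at_iou_05 = max(gains, key=lambda n_added: gains[n_added][0])
```

At IoU threshold 0.5 the largest gain comes from the smallest synthetic set, while at threshold 0.25 the 4,000 and 12,000 settings reach the same sensitivity of 0.91, the latter with clearly more FPs per slice.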

Fig. 5 also visually indicates that it can alleviate the risk of overlooking the tumor diagnosis with clinically acceptable FPs; in the clinical routine, the bounding boxes, highly-overlapping around tumors, only require a physician’s single check by switching on/off transparent alpha-blended annotation on MR images. It should be noted that we cannot increase FPs to achieve such high sensitivity without CPGGAN-based DA. Moreover, our results reveal that further realism (associated with the additional normal brain images during training) does not contribute to detection performance, possibly as the training focuses less on tumor generation. Image-to-image GAN-based DA only moderately facilitates detection with fewer additional FPs, probably because the synthetic images have a distribution far from the real ones and thus their influence on detection is limited during testing.

4.3. Visual Turing Test Results

Table 2 shows the confusion matrix for the Visual Turing Test. The expert physicians easily recognize $256\times 256$ synthetic images due to the lack of training data. However, when CPGGANs is trained with additional normal brain images, the experts classify a considerable number of synthetic tumor bounding boxes as real; it implies that the additional normal images remarkably facilitate the realism of both healthy and pathological brain parts while they do not include abnormality; thus, CPGGANs might perform as a tool to train medical students and radiology trainees when enough medical images are unavailable, such as abnormalities at rare positions/sizes. Such GAN applications are clinically prospective (Finlayson et al., 2018), considering the expert physicians’ positive comments about the tumor realism.

4.4. T-SNE Results

As presented in Fig. 6, synthetic tumor bounding boxes have a moderately similar distribution to real ones, but they also fill the real image distribution uncovered by the original dataset, implying their effective DA performance; in particular, the CPGGAN-generated images trained without normal brain images are distributed more widely than the center-concentrating images trained with the normal brain images. Meanwhile, real/synthetic whole brain images clearly distribute differently, due to the real MR images’ strong anatomical consistency (Fig. 7). Considering the achieved high DA performance, the tumor (i.e., ROI) realism/diversity matter more than the whole image realism/diversity, since YOLOv3 looks at an image patch instead of a whole image, similarly to most other CNN-based object detectors.

5. Conclusion

Without relying on an input benign image, our CPGGANs can generate realistic and diverse $256\times 256$ MR images with brain metastases of random shape, unlike rigorous segmentation, naturally at desired positions/sizes, and achieve high sensitivity in tumor detection, even with small/fragmented training data from multiple MRI scanners and lazy annotation using highly-rough bounding boxes; in the context of intelligent data wrangling, this is attributable to the CPGGANs’ good generalization ability to incrementally synthesize conditional whole images that fill the real image distribution uncovered by the original dataset, improving the training robustness.

We confirm that the realism and diversity of the generated images, judged by three expert physicians via the Visual Turing Test, do not imply better detection performance; as the t-SNE results show, the CPGGAN-generated images, trained with additional non-tumor normal images, lack diversity, probably because the training focuses less on tumors. Moreover, we notice that adding over-sufficient synthetic images leads to more FPs, but not always higher sensitivity, possibly due to the training data imbalance between real and synthetic images; as the t-SNE results reveal, the CPGGAN-generated tumor bounding boxes have a moderately similar (mutually complementary) distribution to the real ones; thus, GAN-overwhelming training images may decrease the necessary influence of the real samples and harm training, rather than providing robustness. Lastly, image-to-image GAN-based DA only moderately facilitates detection with fewer additional FPs, probably due to the lack of realism. However, further investigations are needed to maximize the effect of the CPGGAN-based medical image augmentation.

For example, we could verify the effect of further realism in return for less diversity by combining $\ell_{1}$ loss with the Wasserstein loss using gradient penalty for GAN training. We can also combine the images generated by CPGGANs trained without/with additional normal brain images, similarly to ensemble learning (Dietterich, 2002). Lastly, we plan to define a new GAN loss function that directly optimizes the detection results, instead of realism, similarly to the three-player GAN for optimizing classification results (Vandenhende et al., 2019).
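
As a hedged sketch of the first direction, one possible form of such a combined objective is written out below. The critic loss is the standard Wasserstein loss with gradient penalty; the $\ell_{1}$ term, its weight $\lambda_{1}$, the conditional notation $D(\cdot, c)$, and the pairing of each generated image $G(z, c)$ with a real image $x$ annotated with the same bounding box condition $c$ are assumptions made for illustration, not a formulation taken from this paper.

$$
\begin{aligned}
\mathcal{L}_{D} &= \mathbb{E}_{z, c}\big[D(G(z, c), c)\big] - \mathbb{E}_{x, c}\big[D(x, c)\big] + \lambda_{\mathrm{gp}}\, \mathbb{E}_{\hat{x}, c}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}, c) \rVert_{2} - 1\big)^{2}\Big], \\
\mathcal{L}_{G} &= -\mathbb{E}_{z, c}\big[D(G(z, c), c)\big] + \lambda_{1}\, \mathbb{E}_{x, z, c}\big[\lVert x - G(z, c) \rVert_{1}\big].
\end{aligned}
$$

Here $\hat{x}$ is sampled uniformly along straight lines between real and generated images. Under these assumptions, a larger $\lambda_{1}$ would pull the generated images toward the real training images, which is the realism/diversity trade-off to be verified.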

Overall, by minimizing expert physicians’ annotation efforts, our novel CPGGAN-based DA approach sheds light on diagnostic and prognostic medical applications, not limited to brain metastases detection; future studies, especially on 3D bounding box detection with highly-rough annotation, are required to extend our promising results. Along with DA, CPGGANs have other potential clinical applications in oncology: (i) A data anonymization tool to share patients’ data outside their institution for training while preserving detection performance; such a GAN-based application is reported by Shin et al. (2018). (ii) A physician training tool to display random synthetic medical images with abnormalities at both common and rare positions/sizes, by training CPGGANs on highly unbalanced medical datasets (i.e., limited pathological and abundant normal samples). It can help train medical students and radiology trainees despite infrastructural and legal constraints (Finlayson et al., 2018).

Acknowledgments

This research was supported by AMED Grant Number JP18lk1010028.

Bibliography

Antoniou et al. (2017) Antreas Antoniou, Amos Storkey, and Harrison Edwards. 2017. Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340 (2017).
Arvold et al. (2016) Nils D Arvold, Eudocia Q Lee, Minesh P Mehta, Kim Margolin, Brian M Alexander, et al. 2016. Updates in the management of brain metastases. Neuro Oncol. 18, 8 (2016), 1043–1065.
Bailo et al. (2019) Oleksandr Bailo, Dong Shik Ham, and Young Min Shin. 2019. Red blood cell image generation for data augmentation using conditional generative adversarial networks. arXiv preprint arXiv:1901.06219 (2019).
Dietterich (2002) Thomas G Dietterich. 2002. Ensemble learning. The Handbook of Brain Theory and Neural Networks 2 (2002), 110–125.
Finlayson et al. (2018) Samuel G Finlayson, Hyunkwang Lee, Isaac S Kohane, and Luke Oakden-Rayner. 2018. Towards generative adversarial networks as a new paradigm for radiology education. In Proc. Machine Learning for Health (ML4H) Workshop, arXiv:1812.01547.
Frid-Adar et al. (2018) Maayan Frid-Adar, Idit Diamant, Eyal Klang, Michal Amitai, Jacob Goldberger, and Hayit Greenspan. 2018. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321 (2018), 321–331.
Goodfellow et al. (2014) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, et al. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS). 2672–2680.