Synthesizing Diverse Lung Nodules Wherever Massively: 3D Multi-Conditional GAN-based CT Image Augmentation for Object Detection
Changhee Han, Yoshiro Kitamura, Akira Kudo, Akimichi Ichinose, Leonardo Rundo, Yujiro Furukawa, Kazuki Umemoto, Yuanzhong Li, Hideki Nakayama

TL;DR
This paper introduces a 3D multi-conditional GAN to generate realistic lung nodules in CT images, significantly improving 3D object detection sensitivity and addressing data scarcity in medical imaging.
Contribution
It presents the first 3D multi-conditional GAN for lung nodule augmentation, enhancing detection performance across various nodule sizes and attenuations.
Findings
Improved detection sensitivity across nodule sizes and attenuations.
Generated nodules are indistinguishable from real ones in Visual Turing Tests.
Addresses data scarcity in medical imaging with realistic synthetic nodules.
Abstract
Accurate Computer-Assisted Diagnosis, relying on large-scale annotated pathological images, can alleviate the risk of overlooking the diagnosis. Unfortunately, in medical imaging, most available datasets are small/fragmented. To tackle this, as a Data Augmentation (DA) method, 3D conditional Generative Adversarial Networks (GANs) can synthesize desired realistic/diverse 3D images as additional training data. However, no 3D conditional GAN-based DA approach exists for general bounding box-based 3D object detection, while it can locate disease areas with physicians' minimum annotation cost, unlike rigorous 3D segmentation. Moreover, since lesions vary in position/size/attenuation, further GAN-based DA performance requires multiple conditions. Therefore, we propose 3D Multi-Conditional GAN (MCGAN) to generate realistic/diverse 32 X 32 X 32 nodules placed naturally on lung Computed…
Table 1: CPM by nodule size and attenuation.
| | | CPM by Size | | | CPM by Attenuation | | |
|---|---|---|---|---|---|---|---|
| | CPM | Small | Medium | Large | Solid | Part-solid | GGN |
| 632 real images | 0.518 | 0.447 | 0.618 | 0.624 | 0.655 | 0.464 | 0.242 |
| + 3D MCGAN-based DA | 0.550 | 0.452 | 0.683 | 0.662 | 0.699 | 0.521 | 0.244 |
| + 3D MCGAN-based DA | 0.527 | 0.447 | 0.674 | 0.429 | 0.655 | 0.407 | 0.289 |
| + 3D MCGAN-based DA | 0.512 | 0.411 | 0.644 | 0.662 | 0.616 | 0.579 | 0.277 |
| + 3D MCGAN-based DA w/ reconstruction loss | 0.508 | 0.430 | 0.633 | 0.556 | 0.626 | 0.471 | 0.271 |
| + 3D MCGAN-based DA w/ reconstruction loss | 0.509 | 0.406 | 0.644 | 0.654 | 0.649 | 0.436 | 0.233 |
| + 3D MCGAN-based DA w/ reconstruction loss | 0.479 | 0.389 | 0.594 | 0.617 | 0.596 | 0.507 | 0.226 |
Table 2: Visual Turing Test results.
| | | Accuracy | Real Selected as Real | Real as Synt | Synt as Real | Synt as Synt |
|---|---|---|---|---|---|---|
| Test1 | Physician1 | 43% | 19 | 31 | 26 | 24 |
| | Physician2 | 43% | 13 | 37 | 20 | 30 |
| Test2 | Physician1 | 57% | 22 | 28 | 15 | 35 |
| | Physician2 | 53% | 11 | 39 | 8 | 42 |
| Test3 | Physician1 | 62% | 25 | 25 | 13 | 37 |
| | Physician2 | 79% | 32 | 18 | 3 | 47 |
| Test4 | Physician1 | 58% | 21 | 29 | 13 | 37 |
| | Physician2 | 66% | 36 | 14 | 20 | 30 |
Changhee Han¹,³, Yoshiro Kitamura², Akira Kudo², Akimichi Ichinose², Leonardo Rundo³, Yujiro Furukawa⁴, Kazuki Umemoto⁵, Yuanzhong Li², Hideki Nakayama¹
¹ The University of Tokyo, Tokyo, Japan
² Fujifilm Corporation, Tokyo, Japan
³ University of Cambridge, Cambridge, UK
⁴ Jikei University School of Medicine, Tokyo, Japan
⁵ Juntendo University School of Medicine, Tokyo, Japan
Abstract
Accurate Computer-Assisted Diagnosis, relying on large-scale annotated pathological images, can alleviate the risk of overlooking the diagnosis. Unfortunately, in medical imaging, most available datasets are small/fragmented. To tackle this, as a Data Augmentation (DA) method, 3D conditional Generative Adversarial Networks (GANs) can synthesize desired realistic/diverse 3D images as additional training data. However, no 3D conditional GAN-based DA approach exists for general bounding box-based 3D object detection, even though such detection can locate disease areas with minimal annotation cost for physicians, unlike rigorous 3D segmentation. Moreover, since lesions vary in position/size/attenuation, further GAN-based DA performance requires multiple conditions. Therefore, we propose 3D Multi-Conditional GAN (MCGAN) to generate realistic/diverse 32 × 32 × 32 nodules placed naturally on lung Computed Tomography images to boost sensitivity in 3D object detection. Our MCGAN adopts two discriminators for conditioning: the context discriminator learns to classify real vs synthetic nodule/surrounding pairs with noise box-centered surroundings; the nodule discriminator attempts to classify real vs synthetic nodules with size/attenuation conditions. The results show that 3D Convolutional Neural Network-based detection can achieve higher sensitivity under any nodule size/attenuation at fixed False Positive rates and overcome the medical data paucity with the MCGAN-generated realistic nodules; even expert physicians fail to distinguish them from the real ones in a Visual Turing Test.
1 Introduction
Accurate Computer-Assisted Diagnosis (CAD), thanks to recent Convolutional Neural Networks (CNNs), can alleviate the risk of overlooking the diagnosis in a clinical environment. The success of CNNs, including in diabetic eye disease diagnosis [12], primarily derives from large-scale annotated training data that sufficiently cover the real data distribution. However, obtaining and annotating such diverse pathological images is laborious; thus, the massive generation of proper synthetic training images matters for reliable diagnosis. Researchers usually use classical Data Augmentation (DA) techniques, such as geometric/intensity transformations [29, 23]. However, those one-to-one translated images have intrinsically similar appearance and cannot sufficiently cover the real image distribution, which limits the performance improvement. In this regard, thanks to their good generalization ability, Generative Adversarial Networks (GANs) [10] can generate realistic but completely new samples using many-to-many mappings for further performance improvement; GANs have shown excellent DA performance in computer vision, including in eye-gaze estimation [33].
This GAN-based DA trend especially applies to medical imaging, where the biggest problem lies in small and fragmented datasets from various scanners. To boost performance in various 2D medical imaging tasks, some researchers used noise-to-image GANs (e.g., random noise samples to diverse pathological images) for classification [8, 16, 15]; others used image-to-image GANs (e.g., a benign image with a pathology-conditioning image to a malignant one) for object detection [14] and segmentation [3]. However, although 3D imaging is spreading in radiology (e.g., Computed Tomography (CT) and Magnetic Resonance Imaging), such 3D medical GAN-based DA approaches are limited and mostly focus on segmentation [32, 18]; 3D medical image generation is more challenging than 2D generation due to its expensive computational cost and the need for strong anatomical consistency. Accordingly, no 3D conditional GAN-based DA approach exists for general bounding box-based 3D object detection, even though such detection can locate disease areas with minimal annotation cost for physicians, unlike rigorous 3D segmentation. Moreover, since lesions vary in position/size/attenuation, further GAN-based DA performance requires multiple conditions.
So, how can GANs generate realistic/diverse 3D nodules placed naturally on lung CT with multiple conditions to boost sensitivity in any 3D object detector? For accurate 3D CNN-based nodule detection (Fig. 1), we propose 3D Multi-Conditional GAN (MCGAN) to generate nodules; such nodule detection is clinically valuable for the early diagnosis/treatment of lung cancer, the deadliest cancer [34]. Since nodules vary in position/size/attenuation, to improve the CNN’s robustness, we adopt two discriminators with different loss functions for conditioning: the context discriminator learns to classify real vs synthetic nodule/surrounding pairs with noise box-centered surroundings; the nodule discriminator attempts to classify real vs synthetic nodules with size/attenuation conditions. We also evaluate the synthetic images’ realism via a Visual Turing Test [30] by two expert physicians, and visualize the data distribution via t-Distributed Stochastic Neighbor Embedding (t-SNE) [35]. The 3D MCGAN-generated additional training images can achieve higher sensitivity under any nodule size/attenuation at fixed False Positive (FP) rates. Lastly, this study suggests training GANs without reconstruction loss and using a proper augmentation ratio (i.e., ) for better medical GAN-based DA performance.
Research Questions. We mainly address two questions:
- 3D Multiple GAN Conditioning: How can we condition 3D GANs to naturally place objects of random shape, unlike rigorous segmentation, at desired position/size/attenuation based on bounding box masks?
- Synthetic Images for DA: How can we set the number of real/synthetic training data and GAN loss functions to achieve the best detection performance?
Contributions. Our main contributions are as follows:
- 3D Multi-conditional Image Generation: This first multi-conditional pathological image generation approach shows that 3D MCGAN can generate realistic and diverse nodules placed naturally on lung CT at desired position/size/attenuation, which even expert physicians cannot distinguish from real ones.
- Misdiagnosis Prevention: This first GAN-based DA method available for any 3D object detector can boost sensitivity at fixed FP rates in CAD with limited medical images/annotation.
- Medical GAN-based DA: This study implies that training GANs without reconstruction loss and using a proper augmentation ratio (i.e., ) may boost CNN-based detection performance with higher sensitivity and fewer FPs in medical imaging.
2 Generative Adversarial Networks
GANs [10] have revolutionized image generation [20] via a two-player minimax game. However, GAN training is difficult due to the two-player objective function, and it is accompanied by artifacts and mode collapse [11] when generating high-resolution images [27], especially in 3D or conditional image generation. To tackle this, Wu et al. proposed 3D GAN [37] to generate realistic/diverse 3D objects via a mapping from a low-dimensional probabilistic space; Isola et al. proposed Pix2Pix GAN [17] to produce robust images using paired training samples; Park et al. proposed multi-conditional GAN [26] to generate images from a base image and texts describing desired position. In this way, GANs can usually synthesize more realistic/diverse images than other common deep generative models, including variational autoencoders [19], which suffer from the injected noise and imperfect reconstruction because of a single objective function [22]. Accordingly, as a DA method, most computer vision researchers chose GANs for improving classification [1], object detection [25], and segmentation [39] to overcome the training data paucity.
Also in medical imaging, to facilitate object detection and segmentation, researchers usually used conditional GANs to generate medical images at desired positions for DA. Han et al. generated brain MR images with tumors at desired positions/sizes for tumor detection [14]. As 3D GANs for DA, Jin et al. [18] generated CT images of both nodules and surrounding tissues for 2D nodule segmentation, whereas we only generate nodules located smoothly on surroundings. Gao et al. [9] generated 3D subvolumes of nodules for subvolume-based 3D nodule detection via binary classification; however, the subvolume-based detection is accompanied by numerous FPs, and unlike our work, most other 3D object detectors cannot use the generated nodules as additional training data since they do not condition nodule positions.
To the best of our knowledge, our work is the first 3D medical GAN-based DA approach using automatic bounding box annotation, while 3D bounding boxes require much cheaper annotation cost than rigorous 3D segmentation. Moreover, we, for the first time, generate 3D multi-conditional images using GANs. In terms of annotation cost, generating realistic and diverse lung nodules at desired position/size/attenuation using 3D MCGANs may become a clinical breakthrough.
3 Methods
3.1 3D MCGAN-based Image Generation
Data Preparation This study exploits the Lung Image Database Consortium image collection (LIDC) dataset [2] containing chest CT scans with lung nodules. Since the American College of Radiology recommends lung nodule evaluation using thin-slice CT scans [31], we only use scans with the slice thickness mm and mm in-plane pixel spacing mm. Then, we interpolate the slice thickness to mm and exclude scans with slice number .
To explicitly provide MCGAN with meaningful nodule appearance information and thus boost DA performance, the authors further annotate those nodules by size and attenuation for GAN training with multiple conditions: small (slice thickness mm); medium ( mm slice thickness mm); large (slice thickness mm); solid; part-solid; Ground-Glass Nodule (GGN). Afterwards, the remaining dataset ( scans) is divided into: (i) a training set ( scans/ nodules); (ii) a validation set ( scans/ nodules); (iii) a test set ( scans/ nodules); only the training set is used for MCGAN training to be methodologically sound. The training set contains more average nodules since we exclude patients with too many nodules for the validation/test sets; we arrange a clinical environment-like situation, where we could find more healthy patients than highly diseased ones to conduct anomaly detection.
3D MCGAN is a novel GAN training method for DA, generating realistic but new nodules at desired position/size/attenuation, naturally blending with surrounding tissues (Fig. 2). We crop/resize various nodules to voxels and replace them with noise boxes from a uniform distribution between , while maintaining their surroundings as Volumes of Interest (VOIs)—using those noise boxes, instead of boxes filled with the same voxel values, improves the training robustness; then, we concatenate the VOIs with size/attenuation conditions tiled to voxels (e.g., if the size is small, each voxel of the small condition is filled with , while the medium/large condition voxels are filled with [math] to consider the effect of scaling factor). So, our generator uses the inputs to generate desired nodules in the noise box regions. The 3D U-Net [5]-like generator adopts convolutional layers in encoders and deconvolutional layers in decoders respectively with skip connections to effectively capture both nodule/context information.
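The input construction described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `build_generator_input`, the default noise range, and the 1/0 fill values for the condition channels are hypothetical choices made for illustration, since the exact values are not given here.

```python
import numpy as np

SIZES = ("small", "medium", "large")
ATTENUATIONS = ("solid", "part-solid", "GGN")

def build_generator_input(voi, box, size, attenuation, rng,
                          noise_low=-0.5, noise_high=0.5):
    """Replace the nodule box with uniform noise and append tiled condition channels.

    voi: 3D float array (depth, height, width) holding the nodule and its surroundings.
    box: tuple of three slices locating the nodule bounding box inside the VOI.
    noise_low / noise_high: assumed noise range (hypothetical defaults).
    Returns an array of shape (1 + 3 + 3, depth, height, width).
    """
    masked = voi.astype(np.float32, copy=True)
    # Fill the box with noise rather than a constant value, as the text describes.
    masked[box] = rng.uniform(noise_low, noise_high, size=masked[box].shape)

    # One channel per size/attenuation class, tiled over the whole VOI.
    # Assumed encoding: 1 for the selected class, 0 for the others.
    n_channels = len(SIZES) + len(ATTENUATIONS)
    conditions = np.zeros((n_channels,) + voi.shape, dtype=np.float32)
    conditions[SIZES.index(size)] = 1.0
    conditions[len(SIZES) + ATTENUATIONS.index(attenuation)] = 1.0

    # Channel 0 is the masked VOI; channels 1-6 are the conditions.
    return np.concatenate([masked[np.newaxis], conditions], axis=0)
```

For example, with a hypothetical 64 × 64 × 64 VOI and a centered 32 × 32 × 32 box, calling `build_generator_input(voi, (slice(16, 48),) * 3, "small", "GGN", np.random.default_rng(0))` returns a 7-channel volume whose first channel keeps the surroundings and carries noise only inside the box.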
We adopt two Pix2Pix GAN [17]-like discriminators with different loss functions: the context discriminator learns to classify real vs synthetic nodule/surrounding pairs with noise box-centered surroundings using Least Squares loss (LSGAN) [21]; the nodule discriminator attempts to classify real vs synthetic nodules with size/attenuation conditions using Wasserstein loss with Gradient Penalty (WGAN-GP) [11]. The LSGAN in the context discriminator forces the model to learn surrounding tissue background by reacting more sensitively to every pixel in images than regular GANs. The WGAN-GP in the nodule discriminator allows the model to generate realistic/diverse nodules without focusing too much on details. Empirically, we confirm that such multiple discriminators with mutually complementary loss functions, along with size/attenuation conditioning, help generate realistic/diverse nodules naturally placed at desired positions on CT scans; similar results are also reported in [25] for 2D pedestrian detection without label conditioning. We apply dropout to inject randomness and balance the generator/discriminators. Batch normalization is applied to both convolution (using LeakyReLU) and deconvolution (using ReLU).
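For reference, the two adversarial losses named above have the following standard forms in their original papers [21, 11]. The notation is assumed for illustration ($x$ a real sample, $\tilde{x}$ a generated one, $\hat{x}$ a random interpolate between them, $\lambda$ the penalty weight); this is the generic statement, not necessarily the exact conditional formulation used in this work.

```latex
% LSGAN discriminator and generator losses (standard form)
\mathcal{L}_{D}^{\mathrm{LS}} = \tfrac{1}{2}\,\mathbb{E}_{x}\big[(D(x) - 1)^2\big]
  + \tfrac{1}{2}\,\mathbb{E}_{\tilde{x}}\big[D(\tilde{x})^2\big],
\qquad
\mathcal{L}_{G}^{\mathrm{LS}} = \tfrac{1}{2}\,\mathbb{E}_{\tilde{x}}\big[(D(\tilde{x}) - 1)^2\big]

% WGAN-GP critic loss (standard form)
\mathcal{L}_{D}^{\mathrm{WGAN\text{-}GP}} = \mathbb{E}_{\tilde{x}}\big[D(\tilde{x})\big] - \mathbb{E}_{x}\big[D(x)\big]
  + \lambda\,\mathbb{E}_{\hat{x}}\big[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\big]
```

The squared error in LSGAN penalizes every output by its distance from the target label, which fits the stated goal of reacting sensitively to the whole context, while the Wasserstein critic scores samples without saturating.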
Most GAN-based DA approaches use reconstruction loss [9] to generate realistic images, even modifying it for further realism [18]. However, no one has validated whether it really helps DA: it makes synthetic images resemble the original ones at the cost of diversity. Thus, to confirm its influence during classifier training, we compare our MCGAN objective without/with it, respectively:
[TABLE]
We set 100 as a weight for the reconstruction loss, since empirically it works well for reducing visual artifacts introduced by the GAN loss and most GAN works adopt this weight [17, 25].
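The two objectives being compared can be written schematically as follows. This is only a sketch inferred from the surrounding description, with assumed notation: $G$ is the generator, $D_1$ the context discriminator (LSGAN), $D_2$ the nodule discriminator (WGAN-GP), and $\mathcal{L}_{\mathrm{rec}}$ the reconstruction loss with the weight of 100 stated above. The exact form and symbols in the original may differ.

```latex
% Objective without reconstruction loss
G^{*} = \arg\min_{G}\max_{D_1, D_2}\;
  \mathcal{L}_{\mathrm{LSGAN}}(G, D_1) + \mathcal{L}_{\mathrm{WGAN\text{-}GP}}(G, D_2)

% Objective with reconstruction loss (weight 100)
G^{*} = \arg\min_{G}\max_{D_1, D_2}\;
  \mathcal{L}_{\mathrm{LSGAN}}(G, D_1) + \mathcal{L}_{\mathrm{WGAN\text{-}GP}}(G, D_2)
  + 100\,\mathcal{L}_{\mathrm{rec}}(G)
```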
3D MCGAN Implementation Details Training lasts for steps with a batch size of and learning rate for the Adam optimizer. We use horizontal/vertical flipping as DA and flip real/synthetic labels once in three times for robustness. During testing, we augment nodules with the same size/attenuation conditions by applying a random combination to real nodules of width/height/depth shift up to and zooming up to for better DA. As post-processing, we blend bounding boxes’ nearest surfaces from all the boundaries by averaging the values of nearest voxels/itself for iterations. We resample the resulting nodules to their original resolution and map back onto the original CT scans to prepare additional training data.
3.2 Lung Nodule Detection Using 3D Faster RCNN
3D Faster RCNN is a 3D version of Faster RCNN [28] using multi-task loss with a -layer Region Proposal Network of 3D convolutional layers, batch normalization layers, and ReLU layers. To confirm the effect of MCGAN-based DA, we compare the following detection results trained on (i) real images without GAN-based DA, (ii), (iii), (iv) with // MCGAN-based DA (i.e., // additional synthetic training images), (v), (vi), (vii) with // MCGAN-based DA trained with reconstruction loss. During training, we shuffle the real/synthetic image order. We evaluate the detection performance as follows: (i) Free-response Receiver Operating Characteristic (FROC) analysis, i.e., sensitivity as a function of FPs per scan; (ii) Competition Performance Metric (CPM) score [24], i.e., average sensitivity at seven pre-defined FP rates: 1/8, 1/4, 1/2, 1, 2, 4, and 8 FPs per scan. The CPM quantifies whether a CAD system can identify a significant percentage of nodules with both very few FPs and moderate FPs.
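The CPM definition above can be illustrated with a short sketch. It assumes the FROC curve is available as a list of (FPs per scan, sensitivity) points and that sensitivity between points is obtained by linear interpolation, clamped at the ends; both the function names and the interpolation choice are assumptions for illustration, not the evaluation code used in this study.

```python
# The seven pre-defined FP rates (FPs per scan) named in the text.
CPM_FP_RATES = (1 / 8, 1 / 4, 1 / 2, 1, 2, 4, 8)

def interpolate_sensitivity(froc_points, fp_rate):
    """Linearly interpolate sensitivity at fp_rate from (fps_per_scan, sensitivity) points."""
    points = sorted(froc_points)
    # Clamp to the first/last measured point outside the measured range.
    if fp_rate <= points[0][0]:
        return points[0][1]
    if fp_rate >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fp_rate <= x1:
            if x1 == x0:
                return max(y0, y1)
            t = (fp_rate - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

def cpm_score(froc_points):
    """Average sensitivity over the seven pre-defined FP rates."""
    values = [interpolate_sensitivity(froc_points, rate) for rate in CPM_FP_RATES]
    return sum(values) / len(values)
```

Because the seven rates are spaced by factors of two, the score weights the low-FP end of the curve as heavily as the high-FP end, which is why it rewards detectors that stay sensitive with very few FPs.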
3D Faster RCNN Implementation Details During training, we use a batch size of and learning rate ( after steps) for the SGD optimizer with momentum. The input volume size to the network is set to voxels. As classical DA, a random combination of width/height/depth shift up to and zooming up to are also applied to both real/synthetic images to achieve the best performance. For testing, we pick the model with the highest sensitivity on validation between - steps under Intersection over Union (IoU) threshold /detection threshold to avoid severe FPs.
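The IoU criterion mentioned above extends directly from 2D to 3D bounding boxes. The sketch below assumes axis-aligned boxes given as `(z1, y1, x1, z2, y2, x2)` corner coordinates; this box format and the function name `iou_3d` are hypothetical conventions chosen for illustration.

```python
def iou_3d(box_a, box_b):
    """Intersection over Union of two axis-aligned 3D boxes (z1, y1, x1, z2, y2, x2)."""
    intersection = 1.0
    for axis in range(3):
        low = max(box_a[axis], box_b[axis])
        high = min(box_a[axis + 3], box_b[axis + 3])
        if high <= low:
            return 0.0  # No overlap along this axis, so the boxes are disjoint.
        intersection *= high - low

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - intersection
    return intersection / union
```

A predicted box then counts as a hit when its IoU with a ground-truth nodule box reaches the chosen threshold.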
3.3 Clinical Validation Using Visual Turing Test
To quantitatively evaluate the realism of MCGAN-generated images, we supply, in a random order, to two expert physicians a random selection of 50 real and 50 synthetic lung nodule images with all of 2D axial/coronal/sagittal views at the center. They take four classification tests in ascending order: Test1, 2: real vs MCGAN-generated nodules, trained without/with reconstruction loss; Test3, 4: real vs MCGAN-generated nodules with surroundings, trained without/with reconstruction loss. Such a Visual Turing Test [30] can evaluate the visual quality of GAN-generated medical images in a clinical environment, where physicians’ specialty is critical [13, 4].
3.4 Visualization Using t-SNE
To visually analyze the distribution of real/synthetic images, we use t-SNE [35] on a random selection of real, synthetic, and loss-added synthetic nodule images, with a perplexity of for iterations to get a 2D representation. We normalize the input images to . The t-SNE method represents high-dimensional data into a lower-dimensional space by reducing the dimensionality; it uses perplexity to non-linearly balance between the input data’s local and global aspects.
4 Results
4.1 Lung Nodules Generated by 3D MCGAN
We generate realistic nodules in noise box regions at various position/size/attenuation, naturally blending with surrounding tissues including vessels, soft tissues, and thoracic walls (Fig. 3). In particular, when trained without reconstruction loss, those synthetic nodules look considerably more different from the original real ones, including a slight shading difference.
4.2 Lung Nodule Detection Results
Table 1 and Fig. 4 show that it is easier to detect nodules with larger size and lower attenuation due to their clear appearance. 3D MCGAN-based DA with a lower augmentation ratio consistently increases sensitivity at fixed FP rates. In particular, training with MCGAN-based DA without reconstruction loss outperforms training only with real images under any size/attenuation in terms of CPM, improving the average CPM by 0.032. It especially boosts nodule detection performance with larger size and lower attenuation. Fig. 5 visually reveals its ability to alleviate the risk of overlooking the nodule diagnosis with clinically acceptable FPs (i.e., the highly-overlapping bounding boxes around nodules only require a physician’s single check by switching on/off transparent alpha-blended annotation on CT scans). Surprisingly, adding more synthetic images tends to decrease sensitivity, probably due to the real/synthetic training image balance. Moreover, the further nodule realism introduced by reconstruction loss rather decreases sensitivity, as the reconstruction loss sacrifices diversity in return for realism.
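The reported improvement can be checked directly from Table 1. The sketch below takes the real-only row and the first MCGAN-based DA row (the best one without reconstruction loss) as printed in the table; the variable and function names are illustrative.

```python
# Column order and values copied from Table 1.
COLUMNS = ("CPM", "Small", "Medium", "Large", "Solid", "Part-solid", "GGN")
REAL_ONLY = (0.518, 0.447, 0.618, 0.624, 0.655, 0.464, 0.242)
FIRST_DA_ROW = (0.550, 0.452, 0.683, 0.662, 0.699, 0.521, 0.244)

def cpm_gains(baseline, augmented):
    """Per-column CPM difference (augmented minus baseline), rounded to 3 decimals."""
    return {
        name: round(aug - base, 3)
        for name, base, aug in zip(COLUMNS, baseline, augmented)
    }

gains = cpm_gains(REAL_ONLY, FIRST_DA_ROW)
for name, gain in gains.items():
    print(f"{name}: {gain:+.3f}")
```

The overall CPM gain is 0.550 - 0.518 = 0.032, and every size and attenuation column is positive, with the largest gains for medium nodules and part-solid nodules.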
4.3 Visual Turing Test Results
As Table 2 shows, expert physicians fail to classify real vs MCGAN-generated nodules without surrounding tissues, even regarding the synthetic nodules trained without reconstruction loss as more realistic than the real ones. In contrast, they recognize the synthetic nodules with surroundings relatively well due to the slight shading difference between the nodules/surroundings, especially when trained without the reconstruction loss. Considering the synthetic images’ realism, MCGANs might serve as a tool to train medical students and radiology trainees when enough medical images are unavailable, such as abnormalities at rare position/size/attenuation. Such GAN applications are clinically promising [7].
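The accuracy column of Table 2 follows from the four counts in each row: correct answers are real images selected as real plus synthetic images selected as synthetic. A minimal sketch using the counts from the table (the dictionary layout and function name are illustrative):

```python
# (real as real, real as synt, synt as real, synt as synt) counts from Table 2.
TABLE2_COUNTS = {
    ("Test1", "Physician1"): (19, 31, 26, 24),
    ("Test1", "Physician2"): (13, 37, 20, 30),
    ("Test2", "Physician1"): (22, 28, 15, 35),
    ("Test2", "Physician2"): (11, 39, 8, 42),
    ("Test3", "Physician1"): (25, 25, 13, 37),
    ("Test3", "Physician2"): (32, 18, 3, 47),
    ("Test4", "Physician1"): (21, 29, 13, 37),
    ("Test4", "Physician2"): (36, 14, 20, 30),
}

def accuracy(real_as_real, real_as_synt, synt_as_real, synt_as_synt):
    """Fraction of images classified correctly in one Visual Turing Test."""
    total = real_as_real + real_as_synt + synt_as_real + synt_as_synt
    return (real_as_real + synt_as_synt) / total

accuracies = {
    key: round(100 * accuracy(*counts)) for key, counts in TABLE2_COUNTS.items()
}
```

An accuracy near 50% corresponds to chance on a balanced set, so the 43% values in Test1 mean both physicians did slightly worse than guessing on nodules without surroundings.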
4.4 T-SNE Results
Implying their effective DA performance, synthetic nodules have a similar distribution to real ones, but are concentrated in left inner areas with fewer real ones, especially when trained without reconstruction loss (Fig. 6). Using only GAN loss during training can avoid overwhelming influence from the real image samples, resulting in a moderately similar distribution; thus, those synthetic images can partially fill the real image distribution uncovered by the original dataset.
5 Conclusion
Our bounding box-based 3D MCGAN can generate diverse CT-realistic nodules at desired position/size/attenuation, naturally blending with surrounding tissues; those synthetic training data boost sensitivity under any size/attenuation at fixed FP rates in 3D CNN-based nodule detection. This is attributable to the MCGAN’s good generalization ability coming from multiple discriminators with mutually complementary loss functions, along with informative size/attenuation conditioning; they help cover the real image distribution unfilled by the original dataset, improving the training robustness.
Surprisingly, we find that adding too many synthetic images produces worse results due to the real/synthetic image balance; as the t-SNE results show, the synthetic images only partially cover the real image distribution, and thus training images dominated by GAN-generated samples rather harm training. Moreover, we notice that GAN training without reconstruction loss obtains better DA performance thanks to increased diversity providing robustness; expert physicians also confirm the sufficient realism of nodules generated without reconstruction loss.
Overall, our 3D MCGAN could help minimize expert physicians’ time-consuming annotation tasks and overcome the general medical data paucity, not limited to lung CT nodules. As future work, we will investigate the MCGAN-based DA results without size/attenuation conditioning to confirm their influence on DA performance. Moreover, we will compare our DA results with other non-GAN-based recent DA approaches, such as mixup [38] and cutout [6]. For further performance boost, we plan to directly optimize the detection results for MCGANs, instead of realism, similarly to the three-player GAN for classification [36]. Lastly, we will investigate how our MCGAN can perform as a physician training tool to display random realistic medical images with desired abnormalities (i.e., position/size/attenuation conditions) to help train medical students and radiology trainees despite infrastructural and legal constraints [7].
References
- [1] A. Antoniou, A. Storkey, and H. Edwards. Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340, 2017.
- [2] S. G. Armato III, G. McLennan, L. Bidaut, M. F. McNitt-Gray, C. R. Meyer, et al. The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Medical Physics, 38(2):915–931, 2011.
- [3] O. Bailo, D. Ham, and Y. M. Shin. Red blood cell image generation for data augmentation using conditional generative adversarial networks. arXiv preprint arXiv:1901.06219, 2019.
- [4] M. J. Chuquicusma, S. Hussein, J. Burt, and U. Bagci. How to fool radiologists with generative adversarial networks? A visual Turing test for lung cancer diagnosis. In Proc. IEEE International Symposium on Biomedical Imaging (ISBI 2018), pages 240–244, 2018.
- [5] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger. 3D U-Net: learning dense volumetric segmentation from sparse annotation. In Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 424–432. Springer, 2016.
- [6] T. DeVries and G. W. Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017.
- [7] S. G. Finlayson, H. Lee, I. S. Kohane, and L. Oakden-Rayner. Towards generative adversarial networks as a new paradigm for radiology education. In Proc. Machine Learning for Health (ML4H) Workshop, arXiv:1812.01547, 2018.
- [8] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing, 321:321–331, 2018.
