Deep Feature Learning from a Hospital-Scale Chest X-ray Dataset with Application to TB Detection on a Small-Scale Dataset

Ophir Gozes; Hayit Greenspan

arXiv:1906.00768·eess.IV·June 4, 2019

TL;DR

This paper demonstrates that training a DenseNet-121 on a large-scale Chest X-ray dataset with metadata improves feature learning, leading to better TB detection on small datasets and state-of-the-art age and gender estimation.

Contribution

It introduces MetaChexNet, a CNN trained on 112K images with metadata, enhancing transfer learning for TB detection and other tasks in medical imaging.

Findings

01 Improved TB classification accuracy on small datasets.

02 State-of-the-art age and gender estimation performance.

03 Enhanced transfer learning capabilities for medical imaging tasks.

Tables

Table I: DenseNet121-based architectures for ChestXray14.

Layer Name	Size	Connected to	MetaChexNet	ChexNet [1]
Features	1024	Last Dense Block output	7 $\times$ 7 global average pool	7 $\times$ 7 global average pool
14 Pathologies Classification	14	Features	14D fully-connected, sigmoid	14D fully-connected, sigmoid
Position & Gender Classification	2	Features	2D fully-connected, sigmoid	-
Intermediate (internal)	10	Features	10D fully-connected, ReLU	-
Age Regression	1	Intermediate	1D fully-connected, sigmoid	-

Table II: Phase I results on ChestXray14 (Validation, Test).

Metric	MetaChexNet	ChexNet [1]
AUC 14 pathologies	0.83, 0.80	0.83, 0.80
AUC Gender	0.997, 0.996	-
AUC Position (PA/AP)	0.998, 0.998	-
Age Error (years)	$-0.05 \pm 5.6$, $-0.16 \pm 5.68$	-
Age Absolute Error (years)	4.28, 4.24	-

Table III: Logistic regression coefficients for TB detection.

Pathology	Coefficient	Pathology	Coefficient
Atelectasis	-0.76	Pneumothorax	-0.01
Cardiomegaly	-0.49	Consolidation	-0.011
Effusion	0.86	Edema	0.21
Infiltration	1.12	Emphysema	0.40
Mass	0.056	Fibrosis	1.24
Nodule	0.83	Pleural Thickening	-0.67
Pneumonia	-0.72	Hernia	0.00

Table IV: TB datasets (number of images).

Dataset	TB Negative	TB Positive
Shenzhen-Training	226	236
Shenzhen-Validation	50	50
Shenzhen-Test	50	50
Montgomery-External Test Set	80	58

Table V: AUC results for TB classification.

Test set	Size	DenseNet121 (ImageNet pre-trained)	ChexNet	MetaChexNet (Proposed)
Shenzhen	100	0.933	0.928	0.965
Montgomery	138	0.846	0.952	0.928
Combined	238	0.803	0.944	0.937

Full text

Ophir Gozes (1) and Hayit Greenspan (2)

(1) Ophir Gozes is with the Faculty of Electrical Engineering, Tel Aviv University, Israel.
(2) Hayit Greenspan is with the Department of Biomedical Engineering, Tel Aviv University.

Abstract

The use of ImageNet pre-trained networks is becoming widespread in the medical imaging community. It enables training on small datasets, commonly available in medical imaging tasks. The recent emergence of a large Chest X-ray dataset opened the possibility for learning features that are specific to the X-ray analysis task. In this work, we demonstrate that the features learned allow for better classification results for the problem of Tuberculosis detection and enable generalization to an unseen dataset.

To accomplish the task of feature learning, we train a DenseNet-121 CNN on 112K images from the ChestXray14 dataset, which includes labels of 14 common thoracic pathologies. In addition to the pathology labels, we incorporate metadata which is available in the dataset: Patient Positioning, Gender and Patient Age. We term this architecture MetaChexNet. As a by-product of the feature learning, we demonstrate state-of-the-art performance on the task of patient Age & Gender estimation using CNNs. Finally, we show that the features learned using ChestXray14 allow for better transfer learning on small-scale datasets for Tuberculosis.

I Introduction

The recent emergence of large X-ray datasets has opened the way for the development of Computer-Aided Detection (CAD) tools for a set of the most common chest pathologies (ChexNet [1], Wang [2]). For other pathologies, such as Tuberculosis (TB), only small datasets are available, and training on them remains a challenge. According to the World Health Organization Global Tuberculosis Report 2018 [3], TB is one of the top 10 causes of death worldwide. In 2017, TB caused an estimated 1.6 million deaths worldwide. TB can be treated if it is detected in its early stages, so there is a need for CAD tools that automatically screen for TB [3].

Previous work by Lakhani et al. [4] demonstrated the advantage of using ImageNet [5] pre-trained architectures for TB detection on small-scale datasets. The strategy of using an ImageNet pre-trained network is effective, since lower-level natural-image features can be relevant to medical images. This was further verified by Sivaramakrishnan et al. [6]. In the current work, we show that this transfer, although clearly important, is sub-optimal. In addition to ImageNet-based features, we propose to further fine-tune an ImageNet pre-trained network with the hospital-scale ChestXray14 dataset. This further adapts the learned features to medical chest X-ray images.

ImageNet pre-trained networks are trained for classification of 1.2 million natural color images into 1000 classes as part of the ILSVRC challenge [7]. To train our feature extraction network, which we term MetaChexNet, we start with an ImageNet pre-trained architecture and further train it on the ChestXray14 dataset, which contains 112K images labeled with 14 common thoracic pathologies. In addition to the 14 given labels, we include the metadata, which comprises patient age, position, and gender.

The contribution of this work includes the following:

• We present a feature learning scheme which uses the pathology labels and metadata of a hospital-scale chest X-ray dataset.

• We demonstrate a method for gender, age and position estimation on chest X-rays using deep learning.

• We demonstrate the applicability of features learned from a hospital-scale dataset to small-scale chest X-ray datasets for the case of Tuberculosis. We compare our results to an ImageNet pre-trained architecture.

The proposed method is presented in Section II. Experimental results are shown in Section III. In Section IV, a discussion of the results is presented.

II Methods

Our method comprises two phases, as shown in Figure 1. In Phase I, we learn Chest X-ray specific features. In Phase II, we fine-tune the network trained in Phase I (MetaChexNet) for TB detection.

II-A Phase I: Chest X-ray Feature learning

In order to learn discriminative CXR-specific features, we start with an ImageNet pre-trained DenseNet121 [8] architecture and train it on the NIH ChestXray14 dataset. We train the network for the multi-task problem of predicting the pathology labels and the metadata associated with each image, as follows:

Task 1: Learning Chest X-ray pathology-related features. The network is trained to perform binary multi-label classification of the 14 most common chest pathologies: {Atelectasis; Cardiomegaly; Effusion; Infiltration; Mass; Nodule; Pneumonia; Pneumothorax; Consolidation; Edema; Emphysema; Fibrosis; Pleural-Thickening; Hernia}.

Task 2: Learning metadata-related features. The network is trained for metadata prediction, which includes binary classification of the patient's position {AP: anteroposterior, PA: posteroanterior} and gender {Female, Male}, in addition to age regression over the range {0-100}.

**Architecture & Training:** Rajpurkar et al. first used the DenseNet121 architecture on ChestXray14 to perform multi-label pathology classification and termed it ChexNet [1]. In our work we focus on the DenseNet121 architecture with compression factor $\theta=0.5$. As the feature vector, we regard the output of the last average pooling layer of the DenseNet121, which is of size 1024. In comparison to ChexNet, we increase the size of the binary output vector to 16 to accommodate the two extra binary metadata labels (position, gender). In order to allow for age regression, we add an intermediate dense layer (10 units, ReLU) followed by a sigmoid-activated neuron. We term the trained network MetaChexNet since it is an extension of the ChexNet architecture that provides metadata prediction. The network architecture is specified in Table I. For the loss function, we use binary cross entropy loss for the binary variables and mean absolute error loss for the continuous age variable. Training is performed with a Nesterov Adam optimizer (Nadam) with batch_size=32 and an initial learning rate of 1E-3. The learning rate is reduced by a factor of 10 whenever the validation loss fails to improve for one epoch. As pre-processing, the images are resized to $224\times 224$ and converted to a three-channel RGB image by channel duplication. The images are normalized by subtracting the ImageNet mean and dividing by the ImageNet standard deviation. The age metadata is scaled to the range [0,1]. For data augmentation, we randomly flip the training images horizontally 50% of the time.
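The multi-task loss described above can be sketched in plain Python for a single image. This is a minimal illustration, not the authors' implementation: the text does not say how the two loss terms are weighted or how age is scaled, so the unweighted sum (`age_weight=1.0`), the division by `MAX_AGE = 100`, and all function names are assumptions.

```python
import math

MAX_AGE = 100.0  # assumption: ages are divided by 100 to land in [0, 1]


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross entropy over independent binary labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def scale_age(age_years):
    """Map an age in years to [0, 1]; out-of-range labels are clipped."""
    return min(max(age_years / MAX_AGE, 0.0), 1.0)


def multitask_loss(binary_true, binary_prob, age_years, age_pred_scaled,
                   age_weight=1.0):
    """BCE over the 16 binary outputs plus absolute error on scaled age.

    binary_true / binary_prob: 14 pathology labels followed by position
    and gender. age_weight is a hypothetical knob, not taken from the paper.
    """
    bce = binary_cross_entropy(binary_true, binary_prob)
    mae = abs(scale_age(age_years) - age_pred_scaled)
    return bce + age_weight * mae
```

For example, `multitask_loss([1, 0], [0.5, 0.5], 50, 0.4)` evaluates to roughly `log(2) + 0.1`: the cross entropy of two uninformative predictions plus an age error of 10 years on the scaled axis.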

II-B Phase II: Fine Tuning for TB detection

We address the task of TB detection using the features learned in Phase I. The dataset available for TB training is two orders of magnitude smaller than the NIH ChestXray14 dataset, so it is more difficult to train deep architectures on it without pre-training. To tackle the smaller size of the dataset, we employ data augmentation which includes horizontal flipping, random scaling (scale factor in $[0.9, 1.1]$), and random multiplication by a constant (in the range $[0.8, 1.3]$). Each augmentation method is applied with a probability of 0.5.
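The augmentation policy above can be sketched as follows, treating an image as a list of rows of floats. The ranges and the per-operation probability come from the text; the nearest-neighbour zoom about the image centre, the zero fill outside the original field of view, and the function names are assumptions made only for illustration.

```python
import random


def zoom_about_center(image, scale, fill=0.0):
    """Nearest-neighbour zoom that keeps the canvas size (assumed variant)."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Map each output pixel back to its source location.
            si = int(round(cy + (i - cy) / scale))
            sj = int(round(cx + (j - cx) / scale))
            inside = 0 <= si < h and 0 <= sj < w
            row.append(image[si][sj] if inside else fill)
        out.append(row)
    return out


def augment(image, rng, p=0.5):
    """Apply each of the three augmentations independently with probability p."""
    out = [row[:] for row in image]
    if rng.random() < p:  # horizontal flip
        out = [row[::-1] for row in out]
    if rng.random() < p:  # random scaling
        out = zoom_about_center(out, rng.uniform(0.9, 1.1))
    if rng.random() < p:  # multiplication by a random constant
        gain = rng.uniform(0.8, 1.3)
        out = [[v * gain for v in row] for row in out]
    return out
```

A call such as `augment(img, random.Random(0))` returns a new image of the same size. Whether the intensity multiplication happens before or after normalization is not stated in the text, so the sketch leaves that choice to the caller.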

Taking advantage of the features learned during Phase I, our TB detection network is constructed on top of the MetaChexNet or ChexNet feature layer. We use a single sigmoid-activated neuron for the TB class. During the training for TB detection, we fine-tune the entire network with the same hyper-parameters as used in Phase I. The loss function is binary cross entropy loss. We select the model that achieves the best AUC on the validation set and use it to compute the metrics on the test set.
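The model-selection rule in the last two sentences can be illustrated with a small rank-based AUC and a checkpoint picker. This is a sketch under assumptions: the paper does not describe how checkpoints are stored, so `val_scores_by_epoch` (a mapping from epoch to validation predictions) and both function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC from the Mann-Whitney rank statistic, with average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def select_best_epoch(val_labels, val_scores_by_epoch):
    """Return (epoch, AUC) of the checkpoint with the highest validation AUC."""
    aucs = {epoch: roc_auc(val_labels, scores)
            for epoch, scores in val_scores_by_epoch.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]
```

Only the selected checkpoint is then evaluated on the test set, which keeps the test metrics independent of the selection step.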

III Experiments and results

III-A Chest X-ray Feature Learning

For this experiment, the ChestXray14 dataset, which contains 112K Chest X-ray images, was split into three sets: Train (104,266), Validation (6,336) and Test (1,518), with no patient overlap between the sets. The number of patients per set was 28,744, 1,672 and 389, respectively. The 14 pathology labels present in the dataset were text-mined from radiology reports. The information relating to age, gender and position was extracted from the image metadata.
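A split with no patient overlap can be obtained by assigning whole patients, rather than individual images, to each set. The sketch below shows one way to do this; the random assignment, the fractions, and the `(image_id, patient_id)` record layout are assumptions, since the text does not describe how the split was generated.

```python
import random


def split_by_patient(records, val_frac=0.05, test_frac=0.01, seed=0):
    """Split (image_id, patient_id) records so no patient spans two sets."""
    patients = sorted({pid for _, pid in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_val = max(1, round(len(patients) * val_frac))
    n_test = max(1, round(len(patients) * test_frac))
    val_patients = set(patients[:n_val])
    test_patients = set(patients[n_val:n_val + n_test])

    splits = {"train": [], "val": [], "test": []}
    for image_id, pid in records:
        if pid in val_patients:
            splits["val"].append((image_id, pid))
        elif pid in test_patients:
            splits["test"].append((image_id, pid))
        else:
            splits["train"].append((image_id, pid))
    return splits
```

Because patients contribute different numbers of images, the image counts per set only approximately follow the patient fractions.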

A summary of the results of ChexNet and MetaChexNet is given in Table II. We report the results on both the validation and test sets. The mean AUC for the detection of the 14 pathologies is similar in both architectures. The AUC for gender and position was remarkably high ($> 0.99$). MetaChexNet was able to predict patient age with a mean absolute error of about 4.3 years.

In Fig. 3 we display the Bland-Altman plot corresponding to age prediction on the ChestXray14 test subset. The age distribution in our test set was 42 $\pm$ 18 years. It can be seen that the bias is small ($-0.16$ years on the test set, Table II).
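The quantities behind a Bland-Altman analysis are simple to compute. The sketch below assumes the difference is taken as predicted minus labeled age and uses the conventional 1.96 standard deviation limits of agreement; neither convention is stated in the text, and the function name is hypothetical.

```python
import statistics


def bland_altman(labeled_ages, predicted_ages):
    """Bias, SD, limits of agreement and mean absolute error, in years."""
    diffs = [p - t for t, p in zip(labeled_ages, predicted_ages)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return {
        "bias": bias,
        "sd": sd,
        "loa_low": bias - 1.96 * sd,   # lower limit of agreement
        "loa_high": bias + 1.96 * sd,  # upper limit of agreement
        "mae": statistics.mean(abs(d) for d in diffs),
    }
```

Under these assumptions, the `bias` and `sd` entries correspond to the "Age Error" row of Table II and `mae` to the "Age Absolute Error" row.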

Feature Visualization: To visualize the features learned by MetaChexNet, we use t-SNE [9] visualization of the feature layer. In Fig. 2 we display the t-SNE embedding of the feature representation of the test samples. The features are arranged in four clusters corresponding to gender and position. In the gender t-SNE plot, it is interesting to note that most of the gender classification mistakes occur at young ages (center of the 4 clusters). An example of this type of misclassification is shown in Fig. 5-A.

A few example predictions are displayed in Fig. 5. Using our algorithm, we detected an erroneous age label for patient ID 27989. The MetaChexNet age estimate was 40.5Y while the labeled age was 155Y (Fig. 5-C).

*Relationship between ChestXray14 pathologies and TB:* Several of the pathology labels in the NIH ChestXray14 can appear as radiographic manifestations of TB [10]. To study the connection between ChestXray14 pathologies and TB, we used the ChexNet trained in Phase I. We predicted the 14 pathologies on the Shenzhen dataset and performed logistic regression on the log odds of the output probabilities. In Table III we show the logistic regression coefficients for each pathology. Fibrosis, Infiltration, Effusion, and Nodule had the largest positive coefficients, i.e. they were positively correlated with the presence of TB. The accuracy of the fit on the entire Shenzhen dataset was 0.858. This motivates the use of ChestXray14 for discriminative feature learning.
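The analysis above amounts to a logit transform of each predicted probability followed by an ordinary logistic regression. A minimal sketch is given below using batch gradient descent; the optimizer, the clipping constant, the absence of regularization, and the toy inputs are all assumptions, since the text does not say how the regression was fitted.

```python
import math


def logit(p, eps=1e-6):
    """Log odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy example with two hypothetical pathology outputs per image.
probs = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1],
         [0.2, 0.6], [0.1, 0.7], [0.3, 0.8]]
tb_labels = [1, 1, 1, 0, 0, 0]
features = [[logit(p) for p in row] for row in probs]
weights, bias = fit_logistic_regression(features, tb_labels)
```

In this toy data the first output rises with TB and the second falls, so the fitted `weights` come out positive and negative respectively, which is how the signs in Table III are read.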

III-B Fine Tuning for TB detection

In order to train for TB detection, the Montgomery and Shenzhen datasets containing postero-anterior (PA) chest radiographs were used [11]. The two datasets contain normal and abnormal chest X-rays with manifestations of TB and include radiologist readings. The Montgomery dataset was kept exclusively for testing, allowing examination of the generalization ability of the different networks. The images in the Shenzhen dataset, used for training our algorithm, were captured using digital radiography (DR) machines, while the images in the Montgomery dataset were acquired using computed radiography (CR) machines. The composition of the datasets is given in Table IV.

Using the ChexNet and MetaChexNet networks which were trained in Phase I, we performed fine-tuning on the Shenzhen TB dataset. As a baseline, we consider a DenseNet121 ImageNet pre-trained network and train it on the Shenzhen dataset. The Montgomery test set was used exclusively to study the generalization ability of the networks and was not included in the training scheme. In Table V we display our results on the Shenzhen test subset and on the Montgomery external set. We note that the highest AUC on the Shenzhen dataset (0.965) was attained by MetaChexNet.

*Generalization Ability:* Previous work [4] combined the available datasets into a single dataset from which the test subset was extracted. In our experiments, we examine the generalization ability on a dataset that contains a different population and was acquired using a different radiography technology [11]. On the Montgomery dataset, the high AUC is maintained by ChexNet and MetaChexNet (0.952 and 0.928), while a decline is evident for the ImageNet pre-trained DenseNet121 network (0.846). This demonstrates the robustness of the features learned from ChestXray14. In addition, we examine the AUC results on a combined set comprising the Montgomery set and the Shenzhen test subset (Table V). On the combined test set, ChexNet and MetaChexNet maintained a high AUC while a decrease was noted for the ImageNet pre-trained DenseNet121.

We attribute this decrease to the different range of output probabilities between the test sets. This lack of uniformity in the output probability range inhibits the selection of an optimal threshold that is suitable for the combined set. In Fig. 4 the ROC curves of MetaChexNet and the baseline approach are displayed. It can be observed that MetaChexNet demonstrated better performance over the entire range of thresholds.
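The effect of mismatched probability ranges on a pooled AUC can be reproduced with a few synthetic scores. The numbers below are invented for illustration and are not the paper's outputs: each set is perfectly separable on its own, yet pooling them lowers the AUC because the positives of one set score below the negatives of the other.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic scores: set B is shifted upward relative to set A.
labels_a, scores_a = [0, 0, 1, 1], [0.1, 0.2, 0.3, 0.4]
labels_b, scores_b = [0, 0, 1, 1], [0.5, 0.6, 0.7, 0.8]

auc_a = pairwise_auc(labels_a, scores_a)
auc_b = pairwise_auc(labels_b, scores_b)
auc_pooled = pairwise_auc(labels_a + labels_b, scores_a + scores_b)
print(auc_a, auc_b, auc_pooled)  # prints 1.0 1.0 0.75
```

No single threshold separates both sets here, which is the situation the paragraph describes for the combined Shenzhen and Montgomery test set.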

IV Discussion

We have presented a scheme for learning image features specific to chest X-rays by using a hospital-scale dataset with pathology labels and metadata. While pathology labels in ChestXray14 can be noisy, metadata is an objective and accurate label, so it can be advantageous to the feature learning process. In addition, it contributes information for both normal and pathological cases. In our experiments on small-scale datasets, we have demonstrated the advantage of our proposed architecture (MetaChexNet) over ImageNet pre-trained architectures in both detection results and generalization ability on an external dataset. As an exciting byproduct, we have also demonstrated how a CNN can predict, in addition to pathologies, the patient's metadata from a chest X-ray. One application of this is the detection of mistakes in electronic medical records, which can be critical to patient care.

Bibliography

[1] Rajpurkar, P. et al. CheXNet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv preprint arXiv:1711.05225 (2017).
[2] Wang, X. et al. ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3462-3471. IEEE (2017).
[3] World Health Organization. Global tuberculosis report 2018. World Health Organization (2018). http://www.who.int/iris/handle/10665/274453.
[4] Lakhani, P., Sundaram, B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 284.2 (2017): 574-582.
[5] Deng, J. et al. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2009).
[6] Sivaramakrishnan, R. et al. Comparing deep learning models for population screening using chest radiography. In Medical Imaging 2018: Computer-Aided Diagnosis, Vol. 10575. International Society for Optics and Photonics (2018).
[7] Russakovsky, O. et al. ImageNet Large Scale Visual Recognition Challenge. IJCV (2015).
[8] Huang, G. et al. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993 (2016).
