Learning to diagnose common thorax diseases on chest radiographs from radiology reports in Vietnamese
Thao T.B. Nguyen, Tam M. Vo, Thang V. Nguyen, Hieu H. Pham, Ha Q. Nguyen

TL;DR
This study builds a Vietnamese-specific chest X-ray dataset with labels derived from radiology reports and trains deep learning models to diagnose thorax diseases. The labeling pipeline reaches an F1-score of at least 0.9923 across classes, and the best diagnostic model (EfficientNet-B2) reaches an F1-score of 0.6989.
Contribution
The paper introduces a novel Vietnamese radiology report-based labeling pipeline and a tailored CXR dataset, enhancing disease diagnosis accuracy for Vietnamese clinical settings.
Findings
Labeling pipeline achieves F1-score of at least 0.9923 across classes.
Best model (EfficientNet-B2) attains an F1-score of 0.6989 for abnormal diagnosis.
Coarse classification performs comparably to fine classification on the CheXpert dataset.
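The findings above are reported as per-class F1-scores. As a reminder of how that metric is computed for multi-label outputs, here is a minimal sketch in plain Python. The function names and the class names in the usage note are hypothetical and are not taken from the paper; the paper's own evaluation code may differ.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for one class; inputs are equal-length lists of 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when the class never appears
    # in either the ground truth or the predictions.
    return 2 * tp / denom if denom else 0.0


def per_class_f1(true_rows, pred_rows, class_names):
    """F1 per class for multi-label data (one row per study, one column per class)."""
    return {
        name: f1_score([row[j] for row in true_rows],
                       [row[j] for row in pred_rows])
        for j, name in enumerate(class_names)
    }
```

For example, `per_class_f1(true_rows, pred_rows, ["abnormal", "other"])` returns a dictionary mapping each (illustrative) class name to its F1-score, which makes a claim such as "at least 0.9923 across classes" a statement about the minimum value in that dictionary.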
Abstract
We propose a data collection and annotation pipeline that extracts information from Vietnamese radiology reports to provide accurate labels for chest X-ray (CXR) images. This can benefit Vietnamese radiologists and clinicians by annotating data that closely match their endemic diagnosis categories, which may vary from country to country. To assess the efficacy of the proposed labeling technique, we built a CXR dataset containing 9,752 studies and evaluated our pipeline using a subset of this dataset. With an F1-score of at least 0.9923, the evaluation demonstrates that our labeling tool performs precisely and consistently across all classes. After building the dataset, we train deep learning models that leverage knowledge transferred from large public CXR datasets. We employ a variety of loss functions to overcome the curse of imbalanced multi-label datasets and conduct experiments with…
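The abstract does not say which loss functions were used. As an illustration only, one common way to handle imbalanced multi-label data is class-weighted binary cross-entropy, where positive examples of rare classes are up-weighted by the negative-to-positive ratio. The sketch below is an assumption about what such a loss can look like, not the authors' implementation; all names are hypothetical.

```python
import math


def positive_weights(label_rows):
    """Per-class weight = (#negatives / #positives); 1.0 if a class has no positives."""
    n = len(label_rows)
    weights = []
    for j in range(len(label_rows[0])):
        pos = sum(row[j] for row in label_rows)
        weights.append((n - pos) / pos if pos else 1.0)
    return weights


def weighted_bce(probs, labels, pos_weights, eps=1e-7):
    """Mean class-weighted binary cross-entropy for one multi-label sample.

    probs: predicted probabilities per class, labels: 0/1 targets per class.
    """
    total = 0.0
    for p, y, w in zip(probs, labels, pos_weights):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(w * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)
```

With weights computed once from the training labels, a missed positive for a rare finding costs more than a missed positive for a common one, which counteracts the tendency of an unweighted loss to favor the majority (negative) label.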
Taxonomy
Topics: COVID-19 diagnosis using AI · Tuberculosis Research and Epidemiology · Data-Driven Disease Surveillance
