Weakly supervised one-stage vision and language disease detection using large scale pneumonia and pneumothorax studies
Leo K. Tam, Xiaosong Wang, Evrim Turkbey, Kevin Lu, Yuhong Wen, and Daguang Xu

TL;DR
This paper introduces LITERATI, a weakly supervised vision-language detection architecture for medical images, leveraging large-scale pneumonia and pneumothorax datasets with natural language annotations to improve detection without detailed labels.
Contribution
The paper presents a novel weakly supervised detection method combining vision and language, with a new dataset and architecture tailored for medical image analysis.
Findings
LITERATI outperforms baseline methods such as class activation mapping (CAM) and gradient CAM.
The approach localizes clinically relevant objects from natural language input in a weakly supervised setting.
The dataset provides radiologist-paired bounding box and natural language annotations for pneumonia and pneumothorax on MIMIC-CXR.
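For readers unfamiliar with the CAM baseline named in the findings, the following is a minimal sketch of generic class activation mapping, not the paper's implementation. It assumes a classifier whose last convolutional feature maps feed a global-average-pooling layer and a single linear layer. The function names, array shapes, and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Generic CAM: weight each feature map by the classifier weight for one class.

    features:   array of shape (K, H, W), last conv-layer activations (assumed layout)
    fc_weights: array of shape (C, K), linear-layer weights after global average pooling
    """
    # Weighted sum over the K channels gives one (H, W) heat map.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Shift and scale to [0, 1] so a fixed threshold is meaningful.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def cam_to_box(cam, threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) around all pixels at or above threshold."""
    ys, xs = np.where(cam >= threshold)
    if ys.size == 0:
        return None  # nothing activated strongly enough
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

The heat map has the spatial resolution of the feature maps, so in practice it is upsampled to the image size before the box is drawn. Only scene-level labels are needed to train the classifier, which is why CAM serves as a weakly supervised localization baseline.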
Abstract
Detecting clinically relevant objects in medical images is a challenge despite large datasets due to the lack of detailed labels. To address the label issue, we utilize the scene-level labels with a detection architecture that incorporates natural language information. We present a challenging new set of radiologist paired bounding box and natural language annotations on the publicly available MIMIC-CXR dataset especially focussed on pneumonia and pneumothorax. Along with the dataset, we present a joint vision language weakly supervised transformer layer-selected one-stage dual head detection architecture (LITERATI) alongside strong baseline comparisons with class activation mapping (CAM), gradient CAM, and relevant implementations on the NIH ChestXray-14 and MIMIC-CXR datasets. Borrowing from advances in vision language architectures, the LITERATI method demonstrates joint image and…
Taxonomy
Topics: COVID-19 diagnosis using AI · Multimodal Machine Learning Applications · Topic Modeling
Methods: Class-activation map
