G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training
Che Liu, Cheng Ouyang, Sibo Cheng, Anand Shah, Wenjia Bai, Rossella Arcucci

TL;DR
G2D introduces a novel vision-language pre-training framework that enhances fine-grained, dense visual feature learning in medical images, significantly improving performance on segmentation and localization tasks with minimal training data.
Contribution
The paper proposes G2D, a new VLP method that learns dense, semantically-grounded image representations via pseudo segmentation, without extra parameters, improving fine-grained medical image understanding.
Findings
G2D outperforms existing models on 6 medical imaging tasks.
G2D achieves superior segmentation performance with only 1% training data.
G2D enhances fine-grained feature learning for dense prediction tasks.
Abstract
Recently, medical vision-language pre-training (VLP) has made substantial progress in learning global visual representations from medical images and their paired radiology reports. However, medical imaging tasks in the real world usually require finer granularity in visual features. These tasks include visual localization tasks (e.g., semantic segmentation, object detection) and visual grounding tasks. Yet, current medical VLP methods face challenges in learning these fine-grained features, as they primarily focus on brute-force alignment between image patches and individual text tokens for local visual feature learning, which is suboptimal for downstream dense prediction tasks. In this work, we propose a new VLP framework, named Global to Dense level representation learning (G2D), that achieves significantly improved granularity and more accurate grounding for the learned…
Taxonomy
Topics: COVID-19 diagnosis using AI · Multimodal Machine Learning Applications · Radiomics and Machine Learning in Medical Imaging
Methods: Focus
