G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training

Che Liu; Cheng Ouyang; Sibo Cheng; Anand Shah; Wenjia Bai; Rossella Arcucci

arXiv:2312.01522 · cs.CV · October 28, 2024 · 1 citation

TL;DR

G2D introduces a novel vision-language pre-training framework that enhances fine-grained, dense visual feature learning in medical images, significantly improving performance on segmentation and localization tasks with minimal training data.

Contribution

The paper proposes G2D, a new VLP method that learns dense, semantically-grounded image representations via pseudo segmentation, without extra parameters, improving fine-grained medical image understanding.

Findings

01. G2D outperforms existing models on 6 medical imaging tasks.

02. G2D achieves superior segmentation performance with only 1% training data.

03. G2D enhances fine-grained feature learning for dense prediction tasks.

Abstract

Recently, medical vision-language pre-training (VLP) has made substantial progress in learning global visual representations from medical images and their paired radiology reports. However, real-world medical imaging tasks usually require finer granularity in visual features. These tasks include visual localization tasks (e.g., semantic segmentation, object detection) and visual grounding tasks. Yet current medical VLP methods struggle to learn these fine-grained features, as they primarily rely on brute-force alignment between image patches and individual text tokens for local visual feature learning, which is suboptimal for downstream dense prediction tasks. In this work, we propose a new VLP framework, named Global to Dense level representation learning (G2D), that achieves significantly improved granularity and more accurate grounding for the learned…
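
The abstract's starting point, learning a global visual representation from paired images and reports, is commonly implemented in CLIP-style pre-training with a symmetric contrastive (InfoNCE) objective. The block below is a minimal sketch of that generic objective only, not G2D's own method or its pseudo-segmentation task. The function names, the temperature of 0.07, and the toy embeddings are illustrative assumptions that do not come from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def global_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/report embeddings.

    Assumption: image_embs[i] and text_embs[i] come from the same study,
    and every other pairing in the batch is treated as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # Cosine similarity matrix, scaled by the temperature.
    sim = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    sim_t = [list(col) for col in zip(*sim)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))


# Toy batch: two studies with 2-d embeddings (illustrative values only).
matched = global_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = global_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(matched < shuffled)
```

A loss of this kind scores one pooled vector per image against one pooled vector per report, so it carries no per-pixel signal. That is the granularity gap the abstract describes for dense prediction tasks.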

Code & Models

Repositories

cheliu-computation/g2d-neurips24 (PyTorch, official)

Taxonomy

Topics: COVID-19 diagnosis using AI · Multimodal Machine Learning Applications · Radiomics and Machine Learning in Medical Imaging

Methods: Focus