Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding

Zhongyi Shui; Jianpeng Zhang; Weiwei Cao; Sinuo Wang; Ruizhe Guo; Le Lu; Lin Yang; Xianghua Ye; Tingbo Liang; Qi Zhang; Ling Zhang

arXiv:2501.14548·cs.CV·January 27, 2025·2 cites

TL;DR

This paper introduces a fine-grained vision-language pre-training model for CT images that aligns anatomical regions with report descriptions, improving zero-shot diagnosis performance across multiple tasks.

Contribution

It proposes an anatomy-level contrastive learning approach with false-negative calibration and curates what the authors describe as the largest CT dataset for comprehensive evaluation.
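
To make the anatomy-level idea concrete, here is a minimal sketch that computes an InfoNCE-style loss separately for each anatomy. Under this sketch, a scan's liver-region embedding is contrasted only against the liver sentences from the reports in the batch. The false-negative rule is a stand-in assumption and is not the paper's calibration procedure: samples whose sentences for an anatomy are identical (for example, two normal findings) count as positives rather than negatives. The function name `anatomy_contrastive_loss`, the temperature, and the toy embeddings are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so the dot product is cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def anatomy_contrastive_loss(
    region_emb: Dict[str, List[Vector]],  # anatomy -> region embedding per scan
    sent_emb: Dict[str, List[Vector]],    # anatomy -> sentence embedding per scan
    sentences: Dict[str, List[str]],      # anatomy -> raw report sentence per scan
    temperature: float = 0.07,            # hypothetical temperature
) -> float:
    """Image-to-text InfoNCE computed separately for each anatomy.

    Hypothetical false-negative rule (an assumption, not fVLM's calibration):
    scans whose sentences for this anatomy are identical, e.g. two
    "unremarkable" descriptions, are counted as positives, not negatives.
    """
    total, count = 0.0, 0
    for anatomy, regions in region_emb.items():
        imgs = [_normalize(v) for v in regions]
        txts = [_normalize(v) for v in sent_emb[anatomy]]
        sents = sentences[anatomy]
        for i, img in enumerate(imgs):
            # Similarities to every sentence of the SAME anatomy in the batch.
            logits = [_dot(img, t) / temperature for t in txts]
            m = max(logits)
            log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
            # Positives: the paired sentence plus identical sentences elsewhere.
            positives = [j for j, s in enumerate(sents) if s == sents[i]]
            # Multi-positive InfoNCE: mean negative log-probability of positives.
            total -= sum(logits[j] - log_denom for j in positives) / len(positives)
            count += 1
    return total / count


# Toy batch: two scans, one anatomy, 2-D embeddings (all hypothetical).
regions = {"liver": [[1.0, 0.0], [0.0, 1.0]]}
sents = {"liver": ["hypodense lesion in the liver", "liver unremarkable"]}
aligned = anatomy_contrastive_loss(regions, {"liver": [[1.0, 0.1], [0.1, 1.0]]}, sents)
swapped = anatomy_contrastive_loss(regions, {"liver": [[0.1, 1.0], [1.0, 0.1]]}, sents)
```

With these inputs, `aligned` is close to zero. `swapped` is large because each region then lies closest to the other scan's sentence. Restricting the softmax to one anatomy at a time means a liver region is compared only with liver sentences, which is the region-to-sentence alignment described in the TL;DR.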

Findings

1. Achieved an 81.3% average AUC on zero-shot diagnosis tasks.
2. Outperformed CLIP and supervised methods by significant margins.
3. Demonstrated versatility across 54 diagnosis tasks and 15 anatomies.
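
The 81.3% figure is an average of per-task AUCs. This page does not say how that average is weighted. Assuming an unweighted mean over tasks, a minimal sketch of the computation follows. The task names and scores are hypothetical, and the pairwise AUC loop favors clarity over speed.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: chance a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            # Correctly ordered pairs count 1, ties count 0.5.
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_auc(tasks: Dict[str, Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Unweighted (macro) mean of per-task AUCs; every task counts equally."""
    aucs = [roc_auc(labels, scores) for labels, scores in tasks.values()]
    return sum(aucs) / len(aucs)


# Hypothetical zero-shot scores for two made-up diagnosis tasks.
tasks = {
    "task_a": ([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]),  # perfectly ranked: AUC 1.0
    "task_b": ([1, 0, 1, 0], [0.6, 0.7, 0.4, 0.2]),  # half the pairs ranked: AUC 0.5
}
print(average_auc(tasks))  # → 0.75
```

Under a macro average, a rare disease counts as much as a common one. A pooled or prevalence-weighted average could give a different number from the same per-task results.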

Abstract

Artificial intelligence (AI) shows great potential for assisting radiologists in improving the efficiency and accuracy of medical image interpretation and diagnosis. However, a versatile AI model requires large-scale data and comprehensive annotations, which are often impractical to obtain in medical settings. Recent studies leverage radiology reports as a natural, high-quality source of supervision for medical images, using contrastive language-image pre-training (CLIP) to develop language-informed models for radiological image interpretation. Nonetheless, these approaches typically contrast entire images with reports, neglecting the local associations between imaging regions and report sentences, which may undermine model performance and interpretability. In this paper, we propose a fine-grained vision-language model (fVLM) for anatomy-level CT image interpretation. Specifically, we explicitly match…
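
The whole-image objective that the abstract contrasts against can be sketched as a CLIP-style symmetric InfoNCE loss. The sketch below is a generic illustration in plain Python, not code from the fVLM repository. The embeddings and the temperature of 0.07 are illustrative assumptions. Each scan and each report is reduced to a single vector, so nothing in the loss ties a particular organ to the sentence that describes it. That missing link is the local association the abstract refers to.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so the dot product is cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _log_softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in row))
    return [z - log_z for z in row]


def clip_loss(
    scan_emb: Sequence[Sequence[float]],
    report_emb: Sequence[Sequence[float]],
    temperature: float = 0.07,  # illustrative value
) -> float:
    """Symmetric InfoNCE: scan i should match report i and no other report."""
    scans = [_normalize(v) for v in scan_emb]
    reports = [_normalize(v) for v in report_emb]
    n = len(scans)
    # sim[i][j]: scaled cosine similarity between whole scan i and whole report j.
    sim = [[sum(a * b for a, b in zip(scans[i], reports[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Row-wise softmax picks the right report for each scan ...
    scan_to_report = -sum(_log_softmax(sim[i])[i] for i in range(n)) / n
    # ... and column-wise softmax picks the right scan for each report.
    report_to_scan = -sum(
        _log_softmax([sim[j][i] for j in range(n)])[i] for i in range(n)
    ) / n
    return (scan_to_report + report_to_scan) / 2


# Toy batch of three scan/report pairs (hypothetical 3-D embeddings).
scans = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matched = clip_loss(scans, [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]])
shuffled = clip_loss(scans, [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.1, 0.0]])
```

In this toy batch, `matched` is small and `shuffled` is large. Every other report in the batch is treated as a negative, even when two reports describe the same normal anatomy. This is the kind of false negative that the Contribution section says fVLM calibrates for.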

Code & Models

Repositories

alibaba-damo-academy/fvlm
pytorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Radiomics and Machine Learning in Medical Imaging

Methods: Contrastive Language-Image Pre-training · Contrastive Learning