Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays

Kang Liu; Zhuoqi Ma; Siyu Liang; Yunan Li; Xiyue Gao; Chao Liang; Kun Xie; Qiguang Miao

arXiv:2603.26049·cs.CV·March 30, 2026

TL;DR

CoGaze introduces a novel pretraining framework for chest X-ray analysis that incorporates radiologists' gaze and clinical context to improve diagnostic reasoning and cross-modal alignment.

Contribution

It presents a context- and gaze-guided vision-language pretraining method that models radiologists' diagnostic workflow and enhances performance across multiple medical imaging tasks.

Findings

1. Outperforms state-of-the-art methods in report generation, classification, and retrieval.

2. Achieves up to +2.0% CheXbertF1 and +23.2% AUROC improvements.

3. Leverages radiologists' gaze and clinical context to improve diagnostic reasoning and cross-modal alignment.

Abstract

Despite recent advances in medical vision-language pretraining, existing models still struggle to capture the diagnostic workflow: radiographs are typically treated as context-agnostic images, while radiologists' gaze -- a crucial cue for visual reasoning -- remains largely underexplored by existing methods. These limitations hinder the modeling of disease-specific patterns and weaken cross-modal alignment. To bridge this gap, we introduce CoGaze, a Context- and Gaze-guided vision-language pretraining framework for chest X-rays. We first propose a context-infused vision encoder that models how radiologists integrate clinical context -- including patient history, symptoms, and diagnostic intent -- to guide diagnostic reasoning. We then present a multi-level supervision paradigm that (1) enforces intra- and inter-modal semantic alignment through hybrid-positive contrastive learning, (2)…
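The abstract is cut off before it defines "hybrid-positive contrastive learning", so the following is only a minimal sketch of one plausible reading: an InfoNCE-style loss in which each anchor may have several positives. The function name, the `positives` index sets, and the temperature value are all assumptions for illustration and are not taken from the paper or its repository.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def multi_positive_contrastive_loss(anchors, candidates, positives, temperature=0.07):
    """Hypothetical multi-positive InfoNCE loss (not the paper's exact objective).

    anchors, candidates: lists of embedding vectors (lists of floats).
    positives[i]: non-empty set of candidate indices treated as positives
                  for anchors[i]; every other candidate acts as a negative.
    """
    anchors = [_normalize(a) for a in anchors]
    candidates = [_normalize(c) for c in candidates]
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Temperature-scaled cosine similarity to every candidate.
        logits = [sum(x * y for x, y in zip(anchor, c)) / temperature
                  for c in candidates]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Average the negative log-probability over all positives of this anchor.
        pos = positives[i]
        total += -sum(logits[j] - log_denom for j in pos) / len(pos)
    return total / len(anchors)


# Toy embeddings: samples 0 and 2 are assumed to share a finding, so they
# count as positives for each other (an illustrative choice, not real data).
image_emb = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
report_emb = [[1.0, 0.1], [0.1, 1.0], [1.0, 0.0]]
aligned = multi_positive_contrastive_loss(image_emb, report_emb, [{0, 2}, {1}, {0, 2}])
mismatched = multi_positive_contrastive_loss(image_emb, report_emb, [{1}, {0}, {1}])
print(aligned < mismatched)
```

Under this reading, passing image embeddings as anchors and report embeddings as candidates gives an inter-modal term, while passing the same modality for both would give an intra-modal term. How CoGaze actually selects positives and combines the terms is not stated in the visible part of the abstract.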

Code & Models

Repositories

mk-runner/CoGaze (GitHub)

Models

MK-runner/CoGaze (Hugging Face model)