Thinking Like a Radiologist: A Dataset for Anatomy-Guided Interleaved Vision Language Reasoning in Chest X-ray Interpretation
Yichen Zhao, Zelin Peng, Piao Yang, Xiaokang Yang, and Wei Shen

TL;DR
This paper introduces MMRad-IVL-22K, a large-scale dataset for interleaved visual and language reasoning in chest X-ray interpretation, improving medical AI report accuracy and reasoning consistency.
Contribution
It presents the first dataset designed for natively interleaved visual language reasoning in radiology, enabling more accurate and reliable medical AI diagnostics.
Findings
Multimodal CoT improves clinical report accuracy by 6%.
Models fine-tuned on the dataset outperform general-purpose LVLMs.
Interleaved visual reasoning enhances report quality and consistency.
Abstract
Radiological diagnosis is a perceptual process in which careful visual inspection and language reasoning are repeatedly interleaved. Most medical large vision language models (LVLMs) perform visual inspection only once and then rely on text-only chain-of-thought (CoT) reasoning, which operates purely in the linguistic space and is prone to hallucination. Recent methods attempt to mitigate this issue by introducing visually related coordinates, such as bounding boxes. However, these remain a pseudo-visual solution: coordinates are still text and fail to preserve rich visual details like texture and density. Motivated by the interleaved nature of radiological diagnosis, we introduce MMRad-IVL-22K, the first large-scale dataset designed for natively interleaved visual language reasoning in chest X-ray interpretation. MMRad-IVL-22K reflects a repeated cycle of reasoning and visual…
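To make the abstract's contrast concrete, here is a minimal sketch of what an interleaved reasoning trace could look like as a data structure. In it, each visual step carries the cropped pixels themselves rather than a bounding box written out as text. Everything here is an assumption for illustration: the names `TextStep`, `VisualStep`, `crop`, and `interleave`, the box convention, and the sample region and note are hypothetical and are not taken from the MMRad-IVL-22K schema.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Image = List[List[int]]          # grayscale pixels, row-major (assumed toy format)
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) with x1 and y1 exclusive (assumed)


@dataclass
class TextStep:
    """A language reasoning step (hypothetical structure)."""
    text: str


@dataclass
class VisualStep:
    """A visual step that keeps real pixels, not just coordinates as text."""
    region: str
    pixels: Image


Step = Union[TextStep, VisualStep]


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def interleave(image: Image, observations: List[Tuple[str, Box, str]]) -> List[Step]:
    """Alternate a look at each region with the note written about it."""
    trace: List[Step] = []
    for region, box, note in observations:
        trace.append(VisualStep(region, crop(image, box)))
        trace.append(TextStep(note))
    return trace


# Toy 4x4 "image" whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
trace = interleave(
    image,
    [("lower-left region", (0, 2, 2, 4), "Placeholder note about this region.")],
)
```

A coordinate-only approach would store just the tuple `(0, 2, 2, 4)` as text, while the `VisualStep` above retains the pixel values a model would need to judge texture or density.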
Taxonomy
Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Topic Modeling
