X-Ray-CoT: Interpretable Chest X-ray Diagnosis with Vision-Language Models via Chain-of-Thought Reasoning

Chee Ng; Liliang Sun; Shaoqing Tang

arXiv:2508.12455·cs.CV·August 19, 2025

TL;DR

X-Ray-CoT introduces an interpretable, chain-of-thought reasoning framework using vision-language models for chest X-ray diagnosis, producing accurate and explainable diagnostic reports to enhance clinical trust.

Contribution

The paper presents a framework that combines multi-modal feature extraction with structured chain-of-thought prompting of an LLM-based component for interpretable chest X-ray diagnosis.

Findings

1. Achieved 80.52% balanced accuracy on the CORDA dataset.
2. Generated high-quality, explainable diagnostic reports.
3. Outperformed existing black-box models in accuracy.
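
For readers unfamiliar with the metric in the first finding: balanced accuracy is the mean of per-class recall, so a minority class counts as much as a majority class. The sketch below is a minimal standard-library illustration of that definition. The function name and the toy labels are assumptions made for this example, and the paper's exact evaluation protocol is not described on this page.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall (hypothetical helper, not from the paper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall for each class that appears in y_true, then the unweighted mean.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy labels (made up): 4 positive and 2 negative cases.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.625 = (3/4 + 1/2) / 2
```

On imbalanced data this value can differ noticeably from plain accuracy, which is one common reason to report it. scikit-learn provides the same metric as `sklearn.metrics.balanced_accuracy_score`.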

Abstract

Chest X-ray imaging is crucial for diagnosing pulmonary and cardiac diseases, yet its interpretation demands extensive clinical experience and suffers from inter-observer variability. While deep learning models offer high diagnostic accuracy, their black-box nature hinders clinical adoption in high-stakes medical settings. To address this, we propose X-Ray-CoT (Chest X-Ray Chain-of-Thought), a novel framework leveraging Vision-Language Large Models (LVLMs) for intelligent chest X-ray diagnosis and interpretable report generation. X-Ray-CoT simulates human radiologists' "chain-of-thought" by first extracting multi-modal features and visual concepts, then employing an LLM-based component with a structured Chain-of-Thought prompting strategy to reason and produce detailed natural language diagnostic reports. Evaluated on the CORDA dataset, X-Ray-CoT achieves competitive quantitative…
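
The two-stage flow described in the abstract (extract visual concepts, then reason over them with a structured chain-of-thought prompt) can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: the `VisualConcept` type, the `build_cot_prompt` function, the confidence threshold, and the wording of the reasoning steps are all hypothetical, and the sketch stops at building the prompt text rather than calling any model.

```python
from dataclasses import dataclass


@dataclass
class VisualConcept:
    """Hypothetical container for one concept produced by a vision stage."""
    name: str          # e.g. "opacity"
    location: str      # e.g. "right lower lobe"
    confidence: float  # score in [0, 1] from the (assumed) concept detector


def build_cot_prompt(concepts, min_confidence=0.5):
    """Assemble a structured chain-of-thought prompt from detected concepts.

    The step wording is an assumption for illustration; the paper's actual
    prompt template is not given on this page.
    """
    kept = [c for c in concepts if c.confidence >= min_confidence]

    lines = [
        "You are assisting with chest X-ray interpretation.",
        "",
        "Detected visual concepts:",
    ]
    if kept:
        for c in kept:
            lines.append(f"- {c.name} ({c.location}), confidence {c.confidence:.2f}")
    else:
        lines.append("- none above the confidence threshold")

    lines += [
        "",
        "Reason step by step:",
        "1. Describe each finding and where it appears.",
        "2. Relate the findings to possible conditions.",
        "3. State the most likely diagnosis and the evidence for it.",
    ]
    return "\n".join(lines)


# Made-up example input; a real system would obtain these from an image model.
concepts = [
    VisualConcept("opacity", "right lower lobe", 0.87),
    VisualConcept("pleural effusion", "left base", 0.31),
]
print(build_cot_prompt(concepts))
```

Filtering by confidence before prompting is a design choice made here to keep the prompt short; the resulting text would then be passed to whatever LLM component the system uses, which is what lets the final report cite the findings it reasoned from.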

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Machine Learning in Healthcare