X-Ray-CoT: Interpretable Chest X-ray Diagnosis with Vision-Language Models via Chain-of-Thought Reasoning
Chee Ng, Liliang Sun, Shaoqing Tang

TL;DR
X-Ray-CoT introduces an interpretable, chain-of-thought reasoning framework using vision-language models for chest X-ray diagnosis, producing accurate and explainable diagnostic reports to enhance clinical trust.
Contribution
The paper presents a novel vision-language model that combines multi-modal feature extraction with chain-of-thought prompting for interpretable chest X-ray diagnosis.
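One way to picture how extracted visual concepts might feed a chain-of-thought prompt is the minimal sketch below. It is an illustration only: the summary does not give the paper's actual prompt template or feature extractor, so `VisualConcept`, `build_cot_prompt`, the confidence threshold, and the example findings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VisualConcept:
    """Hypothetical container for one finding from an upstream image model."""
    name: str          # e.g. "opacity"
    location: str      # e.g. "right lower lobe"
    confidence: float  # score in [0, 1]


def build_cot_prompt(concepts, min_confidence=0.5):
    """Turn detected concepts into a structured, step-by-step prompt.

    The threshold and the three reasoning steps are assumptions made for
    this sketch, not the paper's template.
    """
    # Keep only concepts the upstream model is reasonably sure about,
    # strongest first.
    kept = [c for c in concepts if c.confidence >= min_confidence]
    kept.sort(key=lambda c: c.confidence, reverse=True)

    lines = ["You are assisting with a chest X-ray reading.", "",
             "Detected visual concepts:"]
    if kept:
        for c in kept:
            lines.append(f"- {c.name} ({c.location}), confidence {c.confidence:.2f}")
    else:
        lines.append("- none above threshold")

    lines += [
        "",
        "Reason step by step:",
        "1. Describe each finding and where it appears.",
        "2. Relate the findings to possible conditions.",
        "3. State the most likely diagnosis and the evidence behind it.",
    ]
    return "\n".join(lines)


# Made-up example findings, for illustration only.
concepts = [
    VisualConcept("opacity", "right lower lobe", 0.91),
    VisualConcept("pleural effusion", "left base", 0.32),
]
prompt = build_cot_prompt(concepts)
print(prompt)
```

The resulting string would then be passed to whatever language model produces the report. Listing the evidence before asking for the diagnosis is what lets a reader trace each conclusion back to a finding.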
Findings
Achieved 80.52% balanced accuracy on the CORDA dataset.
Generated high-quality, explainable diagnostic reports.
Outperformed existing black-box models in accuracy.
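For readers unfamiliar with the metric in the first finding, balanced accuracy is the mean of per-class recall, which keeps a majority class from dominating the score. A minimal sketch, assuming plain Python lists of labels (the toy labels below are made up and are not CORDA data):

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the unweighted mean of per-class recall."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels for illustration: 1 = disease, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy = {score:.3f}, plain accuracy = {plain:.3f}")
```

In this toy case the recall is 3/4 for the disease class and 1/2 for the healthy class, so the balanced accuracy is 0.625, while plain accuracy is 4/6.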
Abstract
Chest X-ray imaging is crucial for diagnosing pulmonary and cardiac diseases, yet its interpretation demands extensive clinical experience and suffers from inter-observer variability. While deep learning models offer high diagnostic accuracy, their black-box nature hinders clinical adoption in high-stakes medical settings. To address this, we propose X-Ray-CoT (Chest X-Ray Chain-of-Thought), a novel framework leveraging Vision-Language Large Models (LVLMs) for intelligent chest X-ray diagnosis and interpretable report generation. X-Ray-CoT simulates human radiologists' "chain-of-thought" by first extracting multi-modal features and visual concepts, then employing an LLM-based component with a structured Chain-of-Thought prompting strategy to reason and produce detailed natural language diagnostic reports. Evaluated on the CORDA dataset, X-Ray-CoT achieves competitive quantitative…
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Machine Learning in Healthcare
