ClinCoT: Clinical-Aware Visual Chain-of-Thought for Medical Vision Language Models
Xiwei Liu, Yulong Li, Xinlin Zhuang, Xuhui Li, Jianxu Chen, Haolin Yang, Imran Razzak, and Yutong Xie

TL;DR
ClinCoT is a visual chain-of-thought framework for medical vision-language models that improves factual grounding and reasoning by using clinical visual cues and dynamically generated preference data.
Contribution
The paper proposes a clinical-aware visual reasoning approach with an automatic data pipeline and iterative learning, improving factual grounding in medical VQA and report generation tasks.
Findings
Significantly improves factual grounding in medical VQA.
Outperforms existing preference-based alignment methods.
Enhances reasoning accuracy in medical report generation.
Abstract
Medical Vision-Language Models have shown promising potential in clinical decision support, yet they remain prone to factual hallucinations due to insufficient grounding in localized pathological evidence. Existing medical alignment methods primarily operate at the response level through preference optimization, improving output correctness but leaving intermediate reasoning weakly connected to visual regions. Although chain-of-thought (CoT) enhances multimodal reasoning, it remains largely text-centric, limiting effective integration of clinical visual cues. To address this gap, we propose ClinCoT, a clinical-aware visual chain-of-thought framework that transforms preference optimization from response-level correction to visual-driven reasoning. We introduce an automatic data generation pipeline that constructs clinically grounded preference pairs through reasoning with…
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Topic Modeling
