MEDIC-AD: Towards Medical Vision-Language Model's Clinical Intelligence
Woohyeon Park, Jaeik Kim, Sunghwan Steve Cho, Pa Hong, Wookyoung Jeong, Yoojin Nam, Namjoon Kim, Ginny Y. Wong, Ka Chun Cheung, Jaeyoung Do

TL;DR
MEDIC-AD is a medical vision-language model that uses a stage-wise framework to strengthen three capabilities needed for clinical decision-making: lesion detection, symptom tracking, and visual explainability.
Contribution
The paper introduces MEDIC-AD, a stage-wise framework that incorporates anomaly-aware tokens, temporal difference encoding, and explainability modules for improved clinical image analysis.
Findings
Achieves state-of-the-art results in anomaly detection, symptom tracking, and lesion segmentation.
Provides stable and clinically faithful explanations in real hospital workflows.
Enhances model focus on abnormal regions and temporal changes in longitudinal studies.
Abstract
Lesion detection, symptom tracking, and visual explainability are central to real-world medical image analysis, yet current medical Vision-Language Models (VLMs) still lack mechanisms that translate their broad knowledge into clinically actionable outputs. To bridge this gap, we present MEDIC-AD, a clinically oriented VLM that strengthens these three capabilities through a stage-wise framework. First, learnable anomaly-aware tokens (<Ano>) encourage the model to focus on abnormal regions and build more discriminative lesion-centered representations. Second, inter-image difference tokens (<Diff>) explicitly encode temporal changes between studies, allowing the model to distinguish worsening, improvement, and stability in disease burden. Finally, a dedicated explainability stage trains the model to generate heatmaps that highlight lesion-related regions, offering clear visual evidence…
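The abstract names two special tokens, <Ano> and <Diff>, but does not say how they are placed in the model input. The sketch below is only one plausible layout, written to make the idea concrete: each image placeholder is followed by a few <Ano> slots, and a block of <Diff> slots is added only when two studies are compared. The `<image>` placeholder, the `build_sequence` helper, and the slot counts are all hypothetical and are not taken from the paper.

```python
from typing import List

ANO_TOKEN = "<Ano>"     # token name taken from the abstract
DIFF_TOKEN = "<Diff>"   # token name taken from the abstract
IMG_TOKEN = "<image>"   # hypothetical image placeholder, not from the paper


def build_sequence(question: str, num_images: int,
                   num_ano: int = 4, num_diff: int = 4) -> List[str]:
    """Lay out a token sequence for one study or a longitudinal pair.

    Assumed layout (illustrative only): every image placeholder is followed
    by `num_ano` anomaly slots; a pair of studies additionally gets
    `num_diff` difference slots before the question words.
    """
    if num_images not in (1, 2):
        raise ValueError("this sketch handles one study or a pair of studies")

    sequence: List[str] = []
    for _ in range(num_images):
        sequence.append(IMG_TOKEN)
        sequence.extend([ANO_TOKEN] * num_ano)

    # Difference slots only make sense when there are two studies to compare.
    if num_images == 2:
        sequence.extend([DIFF_TOKEN] * num_diff)

    sequence.extend(question.split())
    return sequence
```

In a real model, these slots would correspond to learnable embeddings rather than strings; the sketch shows only where such slots might sit in the sequence, not how they are trained.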
