MEDIC-AD: Towards Medical Vision-Language Model's Clinical Intelligence

Woohyeon Park; Jaeik Kim; Sunghwan Steve Cho; Pa Hong; Wookyoung Jeong; Yoojin Nam; Namjoon Kim; Ginny Y. Wong; Ka Chun Cheung; Jaeyoung Do

arXiv:2603.27176·cs.CV·March 31, 2026

TL;DR

MEDIC-AD is a medical vision-language model designed to support clinical decision-making by strengthening lesion detection, symptom tracking, and visual explainability through a stage-wise framework.

Contribution

The paper introduces MEDIC-AD, a stage-wise framework that incorporates anomaly-aware tokens, temporal difference encoding, and explainability modules for improved clinical image analysis.

Findings

01. Achieves state-of-the-art results in anomaly detection, symptom tracking, and lesion segmentation.

02. Provides stable and clinically faithful explanations in real hospital workflows.

03. Enhances model focus on abnormal regions and temporal changes in longitudinal studies.

Abstract

Lesion detection, symptom tracking, and visual explainability are central to real-world medical image analysis, yet current medical Vision-Language Models (VLMs) still lack mechanisms that translate their broad knowledge into clinically actionable outputs. To bridge this gap, we present MEDIC-AD, a clinically oriented VLM that strengthens these three capabilities through a stage-wise framework. First, learnable anomaly-aware tokens (<Ano>) encourage the model to focus on abnormal regions and build more discriminative lesion-centered representations. Second, inter-image difference tokens (<Diff>) explicitly encode temporal changes between studies, allowing the model to distinguish worsening, improvement, and stability in disease burden. Finally, a dedicated explainability stage trains the model to generate heatmaps that highlight lesion-related regions, offering clear visual evidence…
