MedAD-R1: Eliciting Consistent Reasoning in Interpretable Medical Anomaly Detection via Consistency-Reinforced Policy Optimization

Haitao Zhang; Yingying Wang; Jiaxiang Wang; Haote Xu; Hongyang Zhang; Yirong Chen; Yue Huang; Xinghao Ding

arXiv:2602.01081 · cs.CV · February 3, 2026

TL;DR

MedAD-R1 is a new multimodal medical anomaly detection model that uses a two-stage training process with consistency reinforcement to produce transparent, coherent reasoning, significantly improving diagnostic accuracy and interpretability.

Contribution

We introduce MedAD-38K, a large-scale multimodal benchmark, and propose a novel two-stage training framework with consistency reinforcement to enhance reasoning in medical AI models.

Findings

01. Achieves an improvement of over 10% on the MedAD-38K benchmark.

02. Generates logically consistent and transparent diagnostic reasoning.

03. Outperforms existing models in medical anomaly detection accuracy.

Abstract

Medical Anomaly Detection (MedAD) presents a significant opportunity to enhance diagnostic accuracy using Large Multimodal Models (LMMs) to interpret and answer questions based on medical images. However, the reliance on Supervised Fine-Tuning (SFT) on simplistic and fragmented datasets has hindered the development of models capable of plausible reasoning and robust multimodal generalization. To overcome this, we introduce MedAD-38K, the first large-scale, multi-modal, and multi-center benchmark for MedAD featuring diagnostic Chain-of-Thought (CoT) annotations alongside structured Visual Question-Answering (VQA) pairs. On this foundation, we propose a two-stage training framework. The first stage, Cognitive Injection, uses SFT to instill foundational medical knowledge and align the model with a structured think-then-answer paradigm. Given that standard policy optimization can produce…

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)