SafeMed-R1: Adversarial Reinforcement Learning for Generalizable and Robust Medical Reasoning in Vision-Language Models

A.A. Gde Yogi Pramana, Jason Ray, Anthony Jaya, and Michael Wijaya

arXiv:2512.19317 · cs.AI · December 23, 2025

TL;DR

SafeMed-R1 is a hybrid adversarial training framework that enhances the robustness of medical vision-language models against attacks while maintaining high-quality, interpretable reasoning, validated on a large medical VQA benchmark.

Contribution

The paper introduces SafeMed-R1, combining adversarial training with randomized smoothing to improve robustness and interpretability in medical VQA models.

Findings

01. SafeMed-R1 significantly improves adversarial robustness, raising accuracy under attack from 25% to 84.45%.

02. Models with chain-of-thought reasoning are more robust than instruction-only models.

03. SafeMed-R1 maintains high performance on a large, multi-modal medical VQA dataset.

Abstract

Vision-Language Models (VLMs) show significant promise for Medical Visual Question Answering (VQA), yet their deployment in clinical settings is hindered by severe vulnerability to adversarial attacks. Standard adversarial training, while effective for simpler tasks, often degrades both generalization performance and the quality of generated clinical reasoning. We introduce SafeMed-R1, a hybrid defense framework that ensures robust performance while preserving high-quality, interpretable medical reasoning. SafeMed-R1 employs a two-stage approach: at training time, we integrate Adversarial Training with Group Relative Policy Optimization (AT-GRPO) to explicitly robustify the reasoning process against worst-case perturbations; at inference time, we augment the model with Randomized Smoothing to provide certified $L_2$-norm robustness guarantees. We evaluate SafeMed-R1 on the OmniMedVQA…
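The inference-time stage named in the abstract, Randomized Smoothing, can be illustrated with a minimal sketch. This is not the authors' implementation. It follows the general recipe of classifying many Gaussian-noised copies of an input, taking the majority vote, and converting a lower confidence bound on the top-class probability into an $L_2$ radius of the form $\sigma \, \Phi^{-1}(p_{\text{lower}})$. Everything named here is an assumption made for illustration: `toy_classifier` stands in for the VLM, the inputs are flat lists of floats, the noise level `sigma` and sample counts are arbitrary, and a simple Hoeffding bound replaces the tighter confidence intervals normally used in practice.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def toy_classifier(x):
    """Hypothetical stand-in for the VLM: class 1 if the mean input value is positive."""
    return 1 if sum(x) / len(x) > 0 else 0


def sample_votes(classifier, x, sigma, n, rng):
    """Classify n copies of x, each perturbed with i.i.d. Gaussian noise N(0, sigma^2)."""
    votes = Counter()
    for _ in range(n):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        votes[classifier(noisy)] += 1
    return votes


def smoothed_predict(classifier, x, sigma=0.25, n0=100, n=1000, alpha=0.001, seed=0):
    """Return (label, certified L2 radius), or (None, 0.0) to abstain.

    Assumption: a one-sided Hoeffding bound gives the lower confidence
    bound on the top-class probability (valid with probability >= 1 - alpha).
    """
    rng = random.Random(seed)
    # Step 1: pick the candidate class from a small selection sample.
    top_class = sample_votes(classifier, x, sigma, n0, rng).most_common(1)[0][0]
    # Step 2: estimate its probability on a fresh, independent sample.
    votes = sample_votes(classifier, x, sigma, n, rng)
    p_hat = votes[top_class] / n
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # not confident enough to certify anything
    # Step 3: radius = sigma * Phi^{-1}(p_lower), Phi = standard normal CDF.
    radius = sigma * NormalDist().inv_cdf(p_lower)
    return top_class, radius


label, radius = smoothed_predict(toy_classifier, [1.0] * 16)
print(label, round(radius, 3))
```

Using separate samples for choosing the class and for estimating its probability keeps the confidence bound valid. A larger `sigma` allows larger certified radii but makes each noisy prediction harder, which is the usual trade-off with this technique.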

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning