The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs

Jiale Zhao; Xing Mou; Jinlin Wu; Hongyuan Yu; Mingrui Sun; Yang Shi; Xuanwu Yin; Zhen Chen; Zhen Lei; Yaohua Wang

arXiv:2601.04199 · cs.LG · January 9, 2026

TL;DR

This paper evaluates safety issues in Medical Multimodal Large Language Models, revealing vulnerabilities and proposing a parameter-space intervention method to improve safety without sacrificing medical performance.

Contribution

It introduces a systematic safety benchmarking framework and a novel parameter-space safety re-alignment technique for Medical MLLMs.

Findings

1. Existing Medical MLLMs are vulnerable to cross-modality jailbreak attacks.
2. Medical fine-tuning causes safety-related catastrophic forgetting.
3. The proposed method enhances safety with minimal impact on medical capabilities.

Abstract

Medical Multimodal Large Language Models (Medical MLLMs) have achieved remarkable progress in specialized medical tasks; however, research into their safety has lagged, posing potential risks for real-world deployment. In this paper, we first establish a multidimensional evaluation framework to systematically benchmark the safety of current SOTA Medical MLLMs. Our empirical analysis reveals pervasive vulnerabilities across both general and medical-specific safety dimensions in existing models, particularly highlighting their fragility against cross-modality jailbreak attacks. Furthermore, we find that the medical fine-tuning process frequently induces catastrophic forgetting of the model's original safety alignment. To address this challenge, we propose a novel "Parameter-Space Intervention" approach for efficient safety re-alignment. This method extracts intrinsic safety knowledge…
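The abstract is cut off before it explains how the safety knowledge is extracted and re-injected, so what follows is only a minimal sketch of one generic parameter-space technique, task arithmetic. It is not the authors' method. It assumes that a "safety vector" can be taken as the per-parameter weight difference between a safety-aligned checkpoint and its unaligned counterpart, and that this vector is added, scaled by a coefficient, to the medical fine-tuned weights. The checkpoint format and the names `extract_safety_vector`, `graft_safety`, and `lam` are hypothetical.

```python
from typing import Dict, List

# Hypothetical checkpoint format (assumption): parameter name -> flattened weights.
Params = Dict[str, List[float]]


def extract_safety_vector(aligned: Params, unaligned: Params) -> Params:
    """Assumed safety direction: safety-aligned minus unaligned weights, per parameter."""
    vec: Params = {}
    for name, a_weights in aligned.items():
        u_weights = unaligned[name]
        if len(a_weights) != len(u_weights):
            raise ValueError(f"shape mismatch for {name}")
        vec[name] = [a - u for a, u in zip(a_weights, u_weights)]
    return vec


def graft_safety(medical: Params, safety_vec: Params, lam: float = 1.0) -> Params:
    """Task-arithmetic graft (assumed technique): medical weights + lam * safety vector."""
    grafted: Params = {}
    for name, weights in medical.items():
        delta = safety_vec.get(name)
        if delta is None:
            # Parameters with no safety delta (e.g. a new medical head) are copied unchanged.
            grafted[name] = list(weights)
            continue
        if len(delta) != len(weights):
            raise ValueError(f"shape mismatch for {name}")
        grafted[name] = [w + lam * d for w, d in zip(weights, delta)]
    return grafted


if __name__ == "__main__":
    # Toy checkpoints with made-up numbers, for illustration only.
    unaligned = {"layer.w": [0.0, 1.0], "layer.b": [0.5]}
    aligned = {"layer.w": [0.2, 0.9], "layer.b": [0.5]}
    medical = {"layer.w": [0.1, 1.3], "layer.b": [0.4], "med_head.w": [2.0]}

    safety_vec = extract_safety_vector(aligned, unaligned)
    print(graft_safety(medical, safety_vec, lam=0.5))
```

In this sketch, `lam` trades restored safety against drift away from the medical weights. With `lam=0.0` the medical checkpoint is returned unchanged. In practice such a coefficient would presumably be tuned on held-out safety and medical benchmarks.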

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Artificial Intelligence in Healthcare and Education