The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs
Jiale Zhao, Xing Mou, Jinlin Wu, Hongyuan Yu, Mingrui Sun, Yang Shi, Xuanwu Yin, Zhen Chen, Zhen Lei, Yaohua Wang

TL;DR
This paper benchmarks the safety of Medical Multimodal Large Language Models (Medical MLLMs), reveals widespread vulnerabilities, and proposes a parameter-space intervention that improves safety without sacrificing medical performance.
Contribution
It introduces a systematic safety benchmarking framework and a novel parameter-space safety re-alignment technique for Medical MLLMs.
Findings
Medical MLLMs are vulnerable to cross-modality jailbreak attacks.
Medical fine-tuning causes catastrophic forgetting of safety alignment.
The proposed method enhances safety with minimal impact on medical capabilities.
Abstract
Medical Multimodal Large Language Models (Medical MLLMs) have achieved remarkable progress in specialized medical tasks; however, research into their safety has lagged, posing potential risks for real-world deployment. In this paper, we first establish a multidimensional evaluation framework to systematically benchmark the safety of current SOTA Medical MLLMs. Our empirical analysis reveals pervasive vulnerabilities across both general and medical-specific safety dimensions in existing models, particularly highlighting their fragility against cross-modality jailbreak attacks. Furthermore, we find that the medical fine-tuning process frequently induces catastrophic forgetting of the model's original safety alignment. To address this challenge, we propose a novel "Parameter-Space Intervention" approach for efficient safety re-alignment. This method extracts intrinsic safety knowledge…
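The abstract is truncated before it describes how the safety knowledge is extracted and grafted, so the sketch below is not the authors' method. It illustrates one common parameter-space technique, task-vector arithmetic. The sketch assumes three checkpoints with identical parameter names and shapes: a safety-aligned general model, its unaligned base, and the medical fine-tune. The "safety vector" is the aligned-minus-unaligned delta. That delta is optionally sparsified to its largest-magnitude entries and added back to the medical weights with a scale factor. All function names (`safety_vector`, `sparsify`, `graft_safety`) and parameters (`scale`, `keep_ratio`) are hypothetical.

```python
import numpy as np

# Hypothetical representation: a checkpoint is a dict of parameter name -> weight array.
Params = dict[str, np.ndarray]


def safety_vector(aligned: Params, unaligned: Params) -> Params:
    """Hypothetical: the parameter delta that safety alignment added to a base model."""
    return {name: aligned[name] - unaligned[name] for name in aligned}


def sparsify(delta: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Zero all but the largest-magnitude fraction of entries (ties may keep a few extra)."""
    if keep_ratio >= 1.0:
        return delta.copy()
    k = max(1, int(round(keep_ratio * delta.size)))
    # k-th largest absolute value over the flattened array serves as the cutoff.
    threshold = np.sort(np.abs(delta), axis=None)[-k]
    return np.where(np.abs(delta) >= threshold, delta, 0.0)


def graft_safety(
    medical: Params,
    vector: Params,
    scale: float = 1.0,
    keep_ratio: float = 1.0,
) -> Params:
    """Add a scaled, optionally sparsified safety vector to the medical model's weights.

    Parameters missing from the safety vector (e.g. medical-only modules) are left unchanged.
    """
    grafted: Params = {}
    for name, weight in medical.items():
        delta = vector.get(name, np.zeros_like(weight))
        grafted[name] = weight + scale * sparsify(delta, keep_ratio)
    return grafted
```

As a usage example, `graft_safety(medical, safety_vector(aligned, unaligned), scale=0.5, keep_ratio=0.1)` would add half of the top 10% of the safety delta. In this family of methods, `scale` and `keep_ratio` are the knobs that trade restored safety against drift in medical capability. Suitable values would have to be tuned on held-out safety and medical benchmarks.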
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Artificial Intelligence in Healthcare and Education
