Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility

Mengxuan Wang; Yuxin Chen; Gang Xu; Tao He; Hongjie Jiang; Ming Li

arXiv:2602.03402 · cs.AI · April 14, 2026

TL;DR

This paper introduces Risk Awareness Injection (RAI), a lightweight, training-free method to enhance vision-language models' safety by amplifying unsafe signals, effectively reducing jailbreak success while maintaining utility.

Contribution

RAI is a novel, training-free framework that restores risk recognition in VLMs by targeted modulation of visual tokens, improving safety without utility loss.

Findings

01 RAI significantly lowers jailbreak attack success rates.

02 RAI maintains the original task performance of VLMs.

03 RAI is lightweight and does not require additional training.

Abstract

Vision-language models (VLMs) extend the reasoning capabilities of large language models (LLMs) to cross-modal settings, yet remain highly vulnerable to multimodal jailbreak attacks. Existing defenses predominantly rely on safety fine-tuning or aggressive token manipulations, incurring substantial training costs or significantly degrading utility. Recent research shows that LLMs inherently recognize unsafe content in text, but that incorporating visual inputs in VLMs frequently dilutes risk-related signals. Motivated by this, we propose Risk Awareness Injection (RAI), a lightweight and training-free framework for safety calibration that restores LLM-like risk recognition by amplifying unsafe signals in VLMs. Specifically, RAI constructs an Unsafe Prototype Subspace from language embeddings and performs targeted modulation on selected high-risk visual tokens, explicitly activating…
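The two steps named in the abstract (building an Unsafe Prototype Subspace from language embeddings, then modulating selected high-risk visual tokens) can be illustrated with a rough sketch. This is not the authors' implementation: the abstract is truncated and gives no formulas, so the SVD-based subspace, the projection-ratio risk score, the top-k selection rule, and the names `build_unsafe_subspace`, `risk_scores`, `inject_risk_awareness`, `rank`, `top_k`, and `alpha` are all assumptions made for illustration, and the embeddings are random placeholders.

```python
import numpy as np


def build_unsafe_subspace(unsafe_embeddings, rank):
    """Return an orthonormal basis (rank x d) for the dominant directions
    of a set of unsafe-concept text embeddings (n x d).

    Assumption: the subspace is taken from a plain SVD; the paper may
    construct its prototype subspace differently.
    """
    # Rows of vt are orthonormal right singular vectors, sorted by singular value.
    _, _, vt = np.linalg.svd(unsafe_embeddings, full_matrices=False)
    return vt[:rank]


def risk_scores(visual_tokens, basis):
    """Fraction of each token's norm that lies inside the subspace (0 to 1).

    Assumption: this ratio stands in for whatever risk measure the paper uses.
    """
    coords = visual_tokens @ basis.T                # (num_tokens, rank)
    proj_norm = np.linalg.norm(coords, axis=1)
    token_norm = np.linalg.norm(visual_tokens, axis=1) + 1e-12
    return proj_norm / token_norm


def inject_risk_awareness(visual_tokens, basis, top_k, alpha):
    """Amplify the in-subspace component of the top_k highest-scoring tokens.

    Only the selected tokens are changed; all others are returned untouched.
    """
    scores = risk_scores(visual_tokens, basis)
    selected = np.argsort(scores)[-top_k:]          # indices of highest scores
    out = visual_tokens.copy()
    projection = (visual_tokens[selected] @ basis.T) @ basis
    out[selected] += alpha * projection
    return out, selected


# Hypothetical usage with random stand-ins for real model embeddings.
rng = np.random.default_rng(0)
unsafe_text_embeddings = rng.normal(size=(8, 16))   # 8 unsafe prompts, dim 16
image_tokens = rng.normal(size=(10, 16))            # 10 visual tokens, dim 16

subspace = build_unsafe_subspace(unsafe_text_embeddings, rank=3)
calibrated, chosen = inject_risk_awareness(image_tokens, subspace, top_k=2, alpha=0.5)
```

Restricting the update to a few selected tokens is one way such a scheme could leave the rest of the visual representation, and therefore ordinary task behavior, unchanged; whether RAI works this way in detail cannot be confirmed from the truncated abstract.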
