The Safety Reminder: A Soft Prompt to Reactivate Delayed Safety Awareness in Vision-Language Models

Peiyuan Tang; Haojie Xin; Xiaodong Zhang; Jun Sun; Qin Xia; Zijiang Yang

arXiv:2506.15734 · cs.AI · June 23, 2025

Open Access

TL;DR

This paper introduces 'The Safety Reminder', a soft prompt tuning method that reactivates delayed safety awareness in vision-language models, effectively reducing harmful content generation without affecting normal performance.
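In general, soft prompt tuning keeps the model frozen and learns a small set of continuous embedding vectors that are prepended to the input. The toy sketch below illustrates only that general mechanism, not the paper's training objective. The "model" is a hypothetical fixed linear scorer over mean-pooled embeddings, and `tune_soft_prompt`, `frozen_score`, and the squared-error target are assumptions chosen so the gradient can be written by hand.

```python
# Toy soft prompt tuning (illustrative assumption, not the paper's method):
# only the prepended prompt vectors are trained; the "model" weights w and
# the input embeddings stay frozen.

def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def frozen_score(vectors, w):
    """Hypothetical frozen model: dot product of w with the mean-pooled sequence."""
    pooled = mean_pool(vectors)
    return sum(wj * pj for wj, pj in zip(w, pooled))


def tune_soft_prompt(input_embeds, w, target, k=2, lr=0.5, steps=200):
    """Learn k continuous prompt vectors so the frozen scorer outputs `target`.

    Loss is (score - target)^2. Its gradient w.r.t. prompt entry P[i][j] is
    2 * (score - target) * w[j] / n, where n = k + len(input_embeds).
    """
    d = len(w)
    prompt = [[0.0] * d for _ in range(k)]  # the only trainable parameters
    n = k + len(input_embeds)
    for _ in range(steps):
        score = frozen_score(prompt + input_embeds, w)  # prepend the soft prompt
        err = score - target
        for i in range(k):
            for j in range(d):
                prompt[i][j] -= lr * 2.0 * err * w[j] / n
    return prompt


w = [0.5, -1.0, 2.0]                                      # frozen "model" weights
x = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen input embeddings
prompt = tune_soft_prompt(x, w, target=1.0)
print(round(frozen_score(prompt + x, w), 4))  # 1.0
```

The design point carried over from real soft prompt tuning is that the gradient flows only into `prompt`; `w` and `x` are read but never modified.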

Contribution

It reveals the phenomenon of delayed safety awareness in VLMs and proposes a prompt-based technique to proactively enhance safety during model deployment.

Findings

01. Significantly reduces attack success rates on safety benchmarks.
02. Maintains model utility and normal conversation quality.
03. Effectively activates safety awareness only when needed.
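Attack success rate (ASR) is commonly computed as the fraction of adversarial inputs for which the model produces harmful content. The sketch below assumes a crude refusal-phrase judge; `REFUSAL_MARKERS`, `is_refusal`, and `attack_success_rate` are hypothetical names, not the paper's evaluation code, and the benchmarks and judge the paper actually uses are not specified in this summary.

```python
# Hypothetical ASR computation with a keyword-based refusal judge (assumption).
REFUSAL_MARKERS = ("I'm sorry", "I cannot", "I can't")


def is_refusal(response):
    """Heuristic judge: treat a response as safe if it contains a refusal phrase."""
    lowered = response.lower()
    return any(marker.lower() in lowered for marker in REFUSAL_MARKERS)


def attack_success_rate(responses):
    """ASR = fraction of adversarial prompts whose response is not judged a refusal."""
    if not responses:
        raise ValueError("need at least one response")
    successes = sum(1 for r in responses if not is_refusal(r))
    return successes / len(responses)


responses = [
    "Sure, here is a step-by-step guide...",
    "I'm sorry, but I can't help with that.",
    "I cannot assist with this request.",
    "Step 1: gather the materials...",
]
print(attack_success_rate(responses))  # 0.5
```

String matching is a crude proxy: it would, for example, count a response that begins with harmful content and only later self-corrects as a refusal, which is the delayed self-correction pattern described in the abstract.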

Abstract

As Vision-Language Models (VLMs) demonstrate increasing capabilities across real-world applications such as code generation and chatbot assistance, ensuring their safety has become paramount. Unlike traditional Large Language Models (LLMs), VLMs face unique vulnerabilities due to their multimodal nature: adversaries can modify visual or textual inputs to bypass safety guardrails and trigger the generation of harmful content. Through systematic analysis of VLM behavior under attack, we identify a novel phenomenon termed "delayed safety awareness". Specifically, we observe that safety-aligned VLMs may initially be compromised into producing harmful content but eventually recognize the associated risks and attempt to self-correct. This pattern suggests that VLMs retain their underlying safety awareness but that its activation is delayed. Building on this insight, we…
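The abstract is truncated before it describes how the reminder is applied. Purely to illustrate what reactivating a safety signal mid-generation could look like, the sketch below re-inserts a reminder into the decoding context at a fixed interval. The schedule, `generate_with_reminder`, `toy_model`, and the `<reminder>` placeholder (standing in for learned soft-prompt embeddings) are all assumptions, not the paper's confirmed procedure.

```python
# Hypothetical decoding loop that periodically re-inserts a "reminder" into
# the model's context (illustrative assumption, not the paper's algorithm).

def generate_with_reminder(next_token, context, reminder, interval, max_new_tokens):
    """Greedy generation that appends `reminder` to the context every `interval` tokens.

    The reminder items condition later steps but are never emitted as output.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    context = list(context)  # do not mutate the caller's prompt
    output = []
    for step in range(1, max_new_tokens + 1):
        tok = next_token(context)
        context.append(tok)
        output.append(tok)
        # Skip a trailing reminder that no later step would ever see.
        if step % interval == 0 and step < max_new_tokens:
            context.extend(reminder)
    return output, context


def toy_model(context):
    """Toy stand-in for a VLM: unsafe until a reminder appears in its context."""
    return "SAFE" if "<reminder>" in context else "UNSAFE"


out, ctx = generate_with_reminder(
    toy_model, ["<prompt>"], ["<reminder>"], interval=3, max_new_tokens=5
)
print(out)  # ['UNSAFE', 'UNSAFE', 'UNSAFE', 'SAFE', 'SAFE']
```

The toy model mimics the observation that safety awareness is present but dormant: once the reminder enters the context, subsequent tokens switch to the safe behavior.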

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Ethics and Social Impacts of AI