DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt

Yitong Zhang; Jia Li; Liyi Cai; Ge Li

arXiv:2506.09353 · cs.CR · November 18, 2025

TL;DR

This paper introduces DAVSP, a novel safety alignment method for large vision-language models that uses a visual safety prompt and deep alignment to resist malicious queries while maintaining utility on benign inputs.

Contribution

The paper proposes a new safety alignment technique that combines a trainable visual safety prompt with Deep Alignment, which supervises training in the model's activation space, to improve the resistance of LVLMs to malicious queries.

Findings

1. Effective resistance to malicious queries across five benchmarks
2. Preserves utility on benign inputs
3. Exhibits strong cross-model generalization ability

Abstract

Large Vision-Language Models (LVLMs) have achieved impressive progress across various applications but remain vulnerable to malicious queries that exploit the visual modality. Existing alignment approaches typically fail to resist malicious queries effectively while preserving utility on benign ones. To address these challenges, we propose Deep Aligned Visual Safety Prompt (DAVSP), which is built upon two key innovations. First, we introduce the Visual Safety Prompt, which adds a trainable padding region around the input image, preserving the original visual features while expanding the optimization space. Second, we propose Deep Alignment, a novel approach that trains the visual safety prompt through supervision in the model's activation space. It enhances the inherent ability of LVLMs to perceive malicious queries, achieving deeper alignment than prior works. Extensive experiments across five…
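The abstract describes both mechanisms without formulas. The NumPy sketch below illustrates them under stated assumptions and is not the code from the official repository. `apply_visual_safety_prompt` places the original image, unchanged, at the centre of a larger trainable canvas, so only the added border carries optimizable pixels. For Deep Alignment, the abstract states only that the prompt is trained through supervision in the model's activation space. `harmfulness_direction` and `activation_space_loss` are hypothetical helpers showing one possible form of such supervision: a mean-difference direction and a hinge loss on projections. The margins `tau_mal` and `tau_ben` and the choice of layer that supplies the activations are assumptions.

```python
import numpy as np


def apply_visual_safety_prompt(image: np.ndarray, prompt: np.ndarray, pad: int) -> np.ndarray:
    """Place the original image at the centre of a trainable canvas.

    image:  (H, W, C) input pixels, copied in unchanged.
    prompt: (H + 2*pad, W + 2*pad, C) trainable canvas. Its centre is
            overwritten by the image, so only the border region is used
            (and, in an autodiff framework, only the border gets gradients).
    """
    h, w, c = image.shape
    if prompt.shape != (h + 2 * pad, w + 2 * pad, c):
        raise ValueError("prompt must be the image size plus `pad` on every side")
    padded = prompt.copy()
    padded[pad:pad + h, pad:pad + w, :] = image  # keep the real pixels intact
    return padded


def harmfulness_direction(mal_acts: np.ndarray, ben_acts: np.ndarray) -> np.ndarray:
    """Hypothetical helper: unit vector from the mean benign activation
    to the mean malicious activation. Inputs have shape (N, D)."""
    d = mal_acts.mean(axis=0) - ben_acts.mean(axis=0)
    return d / np.linalg.norm(d)


def activation_space_loss(mal_acts: np.ndarray, ben_acts: np.ndarray,
                          direction: np.ndarray,
                          tau_mal: float = 1.0, tau_ben: float = 0.0) -> float:
    """Hypothetical hinge loss on projections onto `direction`.

    Malicious activations are pushed to project above tau_mal and benign
    activations below tau_ben; the loss is zero once both margins hold.
    """
    proj_mal = mal_acts @ direction
    proj_ben = ben_acts @ direction
    loss_mal = np.maximum(0.0, tau_mal - proj_mal).mean()
    loss_ben = np.maximum(0.0, proj_ben - tau_ben).mean()
    return float(loss_mal + loss_ben)


# Usage: a 4x4 RGB image padded by 2 pixels on each side.
img = np.ones((4, 4, 3))
out = apply_visual_safety_prompt(img, np.zeros((8, 8, 3)), pad=2)
print(out.shape)  # (8, 8, 3)
```

In a real training loop the canvas would be a trainable tensor in an autodiff framework such as PyTorch, the activations would come from a forward pass of the LVLM, and only the prompt would be updated. These details are assumed here rather than taken from the paper.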

Code & Models

Repositories

zhangyitonggg/davsp · PyTorch · Official

Videos

DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt · Underline

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning