SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings
Weikai Lu, Hao Peng, Huiping Zhuang, Cen Chen, Ziqian Zeng

TL;DR
SEA introduces a method to enhance multimodal large language model security by synthesizing embeddings for additional modalities, enabling effective safety alignment with minimal resource requirements.
Contribution
The paper proposes Synthetic Embedding augmented safety Alignment (SEA), a novel approach that optimizes embeddings to facilitate multimodal safety alignment using only textual data.
Findings
SEA synthesizes a high-quality embedding within 24 seconds on a single RTX3090 GPU.
SEA significantly improves MLLM security against multimodal threats.
High attack success rates on the VA-SafetyBench benchmark confirm that the security challenges it poses are real.
Abstract
Multimodal Large Language Models (MLLMs) have serious security vulnerabilities. While safety alignment using multimodal datasets consisting of text and data of additional modalities can effectively enhance MLLM's security, it is costly to construct these datasets. Existing low-resource security alignment methods, including textual alignment, have been found to struggle with the security risks posed by additional modalities. To address this, we propose Synthetic Embedding augmented safety Alignment (SEA), which optimizes embeddings of the additional modality through gradient updates to expand textual datasets. This enables multimodal safety alignment training even when only textual data is available. Extensive experiments on image, video, and audio-based MLLMs demonstrate that SEA can synthesize a high-quality embedding on a single RTX3090 GPU within 24 seconds. SEA significantly improves the…
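The core idea in the abstract, optimizing an embedding through gradient updates while the model itself stays fixed, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not state the training objective, so the sketch assumes one (maximizing the log-probability that a frozen scorer assigns to a target label), and the "model" is a hypothetical single logistic unit rather than an MLLM. All names (`synthesize_embedding`, `frozen_w`, `frozen_b`, `extra_embedding`) are invented for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def target_probability(frozen_w, frozen_b, emb):
    # Probability the frozen toy scorer assigns to the target label.
    z = sum(w * e for w, e in zip(frozen_w, emb)) + frozen_b
    return sigmoid(z)


def synthesize_embedding(frozen_w, frozen_b, steps=200, lr=0.5, seed=0):
    """Gradient ascent on the embedding only; the scorer weights never change.

    Assumed objective (not taken from the paper): maximize log p(target | emb).
    """
    rng = random.Random(seed)
    emb = [rng.gauss(0.0, 0.01) for _ in frozen_w]  # small random start
    for _ in range(steps):
        p = target_probability(frozen_w, frozen_b, emb)
        # d/d(emb) of log(sigmoid(w . emb + b)) is (1 - p) * w.
        emb = [e + lr * (1.0 - p) * w for w, e in zip(frozen_w, emb)]
    return emb


# Hypothetical frozen scorer parameters.
frozen_w = [0.8, -1.2, 0.5, 0.3]
frozen_b = -3.0
weights_before = list(frozen_w)

emb = synthesize_embedding(frozen_w, frozen_b)
p_start = target_probability(frozen_w, frozen_b, [0.0] * len(frozen_w))
p_final = target_probability(frozen_w, frozen_b, emb)

# Expand a text-only dataset by attaching the synthesized embedding to each
# sample (hypothetical record layout).
texts = ["text sample A", "text sample B"]
expanded = [{"text": t, "extra_embedding": emb} for t in texts]
```

In this sketch only `emb` receives updates, which mirrors the abstract's description of optimizing the additional-modality embedding rather than the model. A real setup would replace the logistic unit with a frozen MLLM and backpropagate through it to the embedding.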
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
