Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment

Kun Wang; Zherui Li; Zhenhong Zhou; Yitong Zhang; Yan Mi; Kun Yang; Yiming Zhang; Junhao Dong; Zhongxiang Sun; Qiankun Li; Yang Liu

arXiv:2602.10161·cs.CR·February 12, 2026

TL;DR

This paper investigates safety vulnerabilities in omni-modal large language models (OLLMs), identifies a Mid-layer Dissolution of the refusal signal, and proposes OmniSteer to improve safety while preserving multimodal capabilities.

Contribution

The paper contributes a systematic analysis of cross-modal safety risks, the AdvBench-Omni dataset for exposing vulnerabilities, and OmniSteer, a new intervention method for safer OLLMs.

Findings

01

OmniSteer raises the Refusal Success Rate against harmful inputs from 69.9% to 91.2%.

02

Identified a Mid-layer Dissolution phenomenon in OLLMs.

03

Proposed a lightweight adaptive intervention method, OmniSteer.

Abstract

Omni-modal Large Language Models (OLLMs) greatly expand LLMs' multimodal capabilities but also introduce cross-modal safety risks. However, a systematic understanding of vulnerabilities in omni-modal interactions remains lacking. To bridge this gap, we establish a modality-semantics decoupling principle and construct the AdvBench-Omni dataset, which reveals a significant vulnerability in OLLMs. Mechanistic analysis uncovers a Mid-layer Dissolution phenomenon driven by refusal vector magnitude shrinkage, alongside the existence of a modal-invariant pure refusal direction. Inspired by these insights, we extract a golden refusal vector using Singular Value Decomposition and propose OmniSteer, which utilizes lightweight adapters to modulate intervention intensity adaptively. Extensive experiments show that our method not only increases the Refusal Success Rate against harmful inputs from…
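The extraction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the per-modality refusal vectors below are synthetic, the names `extract_shared_direction` and `steer` are hypothetical, and the fixed `alpha` merely stands in for the adapter-predicted intervention intensity, whose form the abstract does not specify. The sketch assumes the shared direction is taken as the top right-singular vector of the stacked per-modality vectors.

```python
import numpy as np

def extract_shared_direction(refusal_vectors):
    """Return the unit top right-singular vector of the stacked vectors."""
    R = np.asarray(refusal_vectors, dtype=float)  # shape: (n_modalities, hidden_dim)
    _, _, vt = np.linalg.svd(R, full_matrices=False)
    direction = vt[0]
    # SVD leaves the sign arbitrary; orient it to agree with the mean vector.
    if direction @ R.mean(axis=0) < 0:
        direction = -direction
    return direction / np.linalg.norm(direction)

def steer(hidden, direction, alpha):
    """Add alpha units of the refusal direction to a hidden state."""
    return hidden + alpha * direction

# Synthetic setup (assumption): every modality shares one refusal direction,
# but with a different magnitude, mimicking magnitude shrinkage.
rng = np.random.default_rng(0)
hidden_dim = 64
true_dir = rng.normal(size=hidden_dim)
true_dir /= np.linalg.norm(true_dir)
scales = [4.0, 1.5, 0.8, 2.5]  # hypothetical per-modality magnitudes
vectors = [s * true_dir + 0.05 * rng.normal(size=hidden_dim) for s in scales]

golden = extract_shared_direction(vectors)
cosine = float(golden @ true_dir)  # alignment with the planted direction

# Fixed alpha here; an adaptive method would predict it per input.
h = rng.normal(size=hidden_dim)
steered = steer(h, golden, alpha=3.0)
gain = float((steered - h) @ golden)  # projection added along the direction

print(f"cosine with planted direction: {cosine:.3f}")
print(f"projection gain after steering: {gain:.3f}")
```

Because the direction is unit-norm, steering with intensity `alpha` shifts the hidden state's projection onto that direction by exactly `alpha`, which is what makes a single scalar a convenient control knob in this kind of sketch.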

Code & Models

Datasets

ailor/AdvBench-omni
dataset · 8.1k downloads

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications