Breaking Audio Large Language Models by Attacking Only the Encoder: A Universal Targeted Latent-Space Audio Attack

Roee Ziv; Raz Lapid; Moshe Sipper

arXiv:2512.23881·cs.SD·January 1, 2026

TL;DR

This paper introduces a universal encoder-level adversarial attack on audio-language models, demonstrating high success rates and minimal distortion, exposing a new security vulnerability in multimodal systems.

Contribution

It presents the first universal targeted latent space attack on audio encoders that generalizes across inputs and speakers without needing access to the language model.

Findings

01 High attack success rates across diverse inputs

02 Minimal perceptual distortion in adversarial examples

03 Reveals a critical security vulnerability at the encoder level of multimodal models

Abstract

Audio-language models combine audio encoders with large language models to enable multimodal reasoning, but they also introduce new security vulnerabilities. We propose a universal targeted latent space attack, an encoder-level adversarial attack that manipulates audio latent representations to induce attacker-specified outputs in downstream language generation. Unlike prior waveform-level or input-specific attacks, our approach learns a universal perturbation that generalizes across inputs and speakers and does not require access to the language model. Experiments on Qwen2-Audio-7B-Instruct demonstrate consistently high attack success rates with minimal perceptual distortion, revealing a critical and previously underexplored attack surface at the encoder level of multimodal systems.
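
The abstract describes the attack only at a high level, so the following is a toy sketch of the general idea rather than the authors' method: one bounded perturbation is optimized so that the encoder's latent output for many different inputs moves toward a single attacker-chosen target latent, with no language model involved. Everything here is an assumption made for illustration: the stand-in encoder (a random linear map followed by tanh), the squared latent-distance loss, the L-infinity bound eps, the signed-gradient update, and all function and variable names.

```python
import numpy as np

def encode(W, x):
    # Hypothetical stand-in encoder: linear map plus tanh (not a real audio encoder).
    return np.tanh(x @ W.T)

def latent_loss(W, inputs, delta, z_target):
    # Mean squared distance between perturbed latents and the target latent.
    err = encode(W, inputs + delta) - z_target
    return float(np.mean(np.sum(err ** 2, axis=1)))

def train_universal_perturbation(W, inputs, z_target, eps=0.05, step=0.005, iters=200):
    # One delta is shared by every input; that sharing is what makes it "universal".
    delta = np.zeros(inputs.shape[1])
    for _ in range(iters):
        z = encode(W, inputs + delta)
        err = z - z_target
        # Analytic gradient of the loss w.r.t. delta, averaged over all inputs.
        grad = 2.0 * ((err * (1.0 - z ** 2)) @ W).mean(axis=0)
        # Signed gradient step, then projection back into the L-infinity ball.
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
    return delta

# Synthetic data only; no real audio or model weights are used.
rng = np.random.default_rng(0)
d_in, d_latent, n = 32, 8, 16
W = rng.normal(size=(d_latent, d_in)) / np.sqrt(d_in)
inputs = 0.1 * rng.normal(size=(n, d_in))       # stand-ins for benign inputs
z_target = encode(W, rng.normal(size=d_in))     # latent of a hypothetical target input

delta = train_universal_perturbation(W, inputs, z_target)
loss_before = latent_loss(W, inputs, np.zeros(d_in), z_target)
loss_after = latent_loss(W, inputs, delta, z_target)
print(f"latent loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

In this toy setting the bound eps plays the role of the distortion budget: a smaller eps keeps the perturbation less noticeable but leaves the latents farther from the target. How the paper actually measures perceptual distortion and attack success is not stated in the abstract.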

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection