GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking
Yunqiang Wang, Hengyuan Na, Di Wu, Miao Hu, Guocong Quan

TL;DR
This paper introduces GRM, a frequency-selective jailbreak method for audio LLMs that balances attack success with utility preservation by perturbing only the most impactful frequency bands.
Contribution
The paper proposes a utility-aware, frequency-selective attack framework that improves the trade-off between jailbreak success and utility retention in audio LLMs.
Findings
GRM achieves an average jailbreak success rate of 88.46%.
Frequency-selective perturbation outperforms full-band coverage in utility preservation.
Selective band perturbation yields a better attack-utility trade-off.
Abstract
Audio large language models (ALLMs) enable rich speech-text interaction, but they also introduce jailbreak vulnerabilities in the audio modality. Existing audio jailbreak methods mainly optimize jailbreak success while overlooking utility preservation, as reflected in transcription quality and question answering performance. In practice, stronger attacks often come at the cost of degraded utility. To study this trade-off, we revisit existing attacks by varying their perturbation coverage in the frequency domain, from partial-band to full-band, and find that broader frequency coverage does not necessarily improve jailbreak performance, while utility consistently deteriorates. This suggests that concentrating perturbation on a subset of bands can yield a better attack-utility trade-off than indiscriminate full-band coverage. Based on this insight, we propose GRM, a utility-aware…
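The coverage sweep described in the abstract, from partial-band to full-band, comes down to restricting a perturbation to a chosen set of frequency bands. The following is a minimal sketch of that masking step only, assuming NumPy and a 16 kHz sample rate; the helper name `band_limit`, the white-noise stand-in for a perturbation, and the example band edges are hypothetical and not taken from the paper. It is not the authors' GRM implementation, and it does not show how GRM selects bands, since the summary above does not describe that step.

```python
import numpy as np


def band_limit(perturbation, sample_rate, bands_hz):
    """Zero out every frequency component of `perturbation` outside `bands_hz`.

    `bands_hz` is a list of (low, high) pairs in Hz; each band is half-open,
    i.e. it keeps frequencies f with low <= f < high.
    """
    spectrum = np.fft.rfft(perturbation)
    freqs = np.fft.rfftfreq(len(perturbation), d=1.0 / sample_rate)

    # Build a boolean mask that is True only inside the selected bands.
    mask = np.zeros(freqs.shape, dtype=bool)
    for low, high in bands_hz:
        mask |= (freqs >= low) & (freqs < high)

    # Back to the time domain, keeping the original length.
    return np.fft.irfft(spectrum * mask, n=len(perturbation))


# Illustrative stand-in for a perturbation: one second of white noise.
rng = np.random.default_rng(0)
sample_rate = 16_000
noise = rng.standard_normal(sample_rate)

# Partial-band coverage: keep only 1-2 kHz (arbitrary example band).
partial = band_limit(noise, sample_rate, [(1000.0, 2000.0)])

# Full-band coverage: the band spans 0 Hz up to and including Nyquist.
full = band_limit(noise, sample_rate, [(0.0, sample_rate / 2 + 1)])
```

With full-band coverage the signal passes through unchanged, while partial-band coverage keeps only the energy inside the selected band, which is the knob the abstract's coverage comparison varies.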
