TL;DR
JALMBench is a comprehensive benchmark designed to evaluate and compare jailbreak vulnerabilities in Large Audio Language Models, analyzing attack methods, defenses, and the influence of model architecture on safety.
Contribution
This work introduces JALMBench, the first large-scale adversarial audio dataset and framework for systematically assessing jailbreak vulnerabilities in LALMs.
Findings
Safety is affected by modality and architecture.
Text-based safety transfers only partially to audio.
Existing moderation methods offer limited security improvements.
Abstract
Large Audio Language Models (LALMs) have made significant progress. While increasingly deployed in real-world applications, LALMs face growing safety risks from jailbreak attacks that bypass safety alignment. However, there remains a lack of an adversarial audio dataset and a unified framework specifically designed to evaluate and compare jailbreak attacks against them. To address this gap, we introduce JALMBench, a comprehensive benchmark that assesses LALM safety against jailbreak attacks, comprising 11,316 text samples and 245,355 audio samples (>1,000 hours). JALMBench supports 12 mainstream LALMs, 8 attack methods (4 text-transferred and 4 audio-originated), and 5 defenses. We conduct in-depth analysis on attack efficiency, topic sensitivity, voice diversity, and model architecture. Additionally, we explore mitigation strategies for the attacks at both the prompt and response…
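The reviews below discuss results in terms of attack success rate (ASR). As a minimal sketch, assuming ASR is the fraction of attack attempts that a judge labels as a successful jailbreak, it can be tallied per model and attack as follows. The record fields (`model`, `attack`, `jailbroken`) and the sample verdicts are hypothetical and are not taken from JALMBench or its repository.

```python
from collections import defaultdict


def attack_success_rate(records):
    """Return ASR per (model, attack) pair as successes / attempts."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for record in records:
        key = (record["model"], record["attack"])
        attempts[key] += 1
        successes[key] += int(record["jailbroken"])
    return {key: successes[key] / attempts[key] for key in attempts}


# Hypothetical judge verdicts, made up for illustration only.
records = [
    {"model": "model_a", "attack": "text_transferred", "jailbroken": True},
    {"model": "model_a", "attack": "text_transferred", "jailbroken": False},
    {"model": "model_a", "attack": "audio_originated", "jailbroken": True},
    {"model": "model_b", "attack": "audio_originated", "jailbroken": False},
]

asr = attack_success_rate(records)
for key in sorted(asr):
    print(key, f"{asr[key]:.2f}")
```

Keying the tally on the (model, attack) pair keeps the same function usable for a full grid of models and attacks. A defense could be added as a third key component under the same assumption.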
Peer Reviews
Decision: ICLR 2026 Poster
I think overall the paper gives us valuable insight into the differences between discrete tokenization and continuous encoding approaches, though the defense evaluation could be strengthened with multi-objective analysis. I'm convinced that the paper's contribution is overall original, and in particular it seems important to be able to compare across different architectures for audio models. The benchmark itself is pretty extensive on the fronts of both the models and the attack methods evaluat
As above, I think the paper is overall quite useful. However, in the analysis and framing of the defense results, I'd be interested in the authors making some aspects a little clearer. In particular, while Table 4 is helpful for understanding the raw ASR reduction from defense methods, I have to jump all the way to the appendix to get the full breakdown of ASR reduction vs. capability retention. And, even then, the authors don't emphasise this tradeoff much in their analysis of the defenses, which
1. **Comprehensiveness.** The scale and coverage of JALMBench may position the work as a canonical testbed akin to JailbreakBench or AdvBench for text-based models. It covers 12 ALMs, 8 distinct attacks, 5 defenses, and multidimensional analyses (efficiency, topic, voice, architecture).
2. **Reproducibility.** The paper provides an anonymous GitHub repository that unifies interfaces for ALMs and defenses, documenting generation pipelines that reflect high implementation quality.
3. **Empirical Findings
1. **Limited Novelty Beyond Benchmark Construction.** The paper primarily aggregates existing attack/defense methods without introducing fundamentally new algorithms or methods. The main novelty lies only in integration and scale rather than in the concept. Also, please cite existing ALM jailbreaking attacks that show overlap in both evaluation and method, and explain how this work differs from them [1, 2].
2. **Overreliance on LLM-as-a-Judge.** The evaluation relies solely on GPT-4o j
- Very solid, very thorough paper
- I thought Figure 1 is cute
- The models, attacks, and analyses seem sensible
Please begin with my line-by-line notes under "Questions". Overall, I think this is a solid and thorough paper. I see two primary weaknesses that, if addressed, would lead me to increase my score.
1. *All* of this uses a single judge model (gpt-4o-2024-11-20). I cannot find mention of another judge model (ideally, multiple other judge models) or any analysis of how accurate / reliable / trustworthy this particular judge model is. Thus, rather than understanding these results as describing the r
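One way to probe the single-judge concern raised in the reviews is to label the same responses with a second judge and measure chance-corrected agreement. The sketch below computes Cohen's kappa for two label sequences. This is a swapped-in, standard agreement statistic, not something the paper is reported to use, and the two verdict lists are hypothetical.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both judges agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each judge labelled independently at its own rates.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both judges always emit the same single label
    return (observed - expected) / (1.0 - expected)


# Hypothetical verdicts (1 = jailbroken, 0 = refused), for illustration only.
judge_1 = [1, 1, 0, 0, 1, 0, 1, 0]
judge_2 = [1, 0, 0, 0, 1, 0, 1, 1]

kappa = cohens_kappa(judge_1, judge_2)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 suggests the two judges rank responses consistently, while a value near 0 suggests agreement no better than chance. Comparing against human labels would follow the same pattern.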
