Rebellion: Noise-Robust Reasoning Training for Audio Reasoning Models

Tiansheng Huang; Virat Shejwalkar; Oscar Chang; Milad Nasr; Ling Liu

arXiv:2511.09682·cs.AI·November 14, 2025

TL;DR

Rebellion introduces a noise-robust reasoning training method for audio models, enhancing their safety against sophisticated jailbreak attacks while maintaining performance on normal tasks.

Contribution

The paper proposes Rebellion, a robust reasoning-training approach that trains audio reasoning models against worst-case representation drift, the shift between vanilla and advanced jailbreaks that the authors identify as the reason standard reasoning training fails.

Findings

01 Rebellion effectively defends against advanced audio jailbreaks.

02 It maintains high performance on benign tasks.

03 It improves the accuracy-safety trade-off compared to standard reasoning training.

Abstract

Instilling reasoning capabilities in large models (LMs) through reasoning training (RT) significantly improves their performance. As a result, Audio Reasoning Models (ARMs), i.e., audio LMs that can reason, are becoming increasingly popular. However, no prior work has studied the safety of ARMs against jailbreak attacks, which aim to elicit harmful responses from target models. To this end, we first show that standard RT with appropriate safety reasoning data can protect ARMs from vanilla audio jailbreaks but cannot protect them against our proposed simple yet effective jailbreaks. We show that this is because of the significant representation drift between vanilla and advanced jailbreaks, which forces the target ARMs to emit harmful responses. Based on this observation, we propose Rebellion, a robust RT that trains ARMs to be robust to the worst-case representation drift. All our results are on…
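
The phrase "robust to the worst-case representation drift" can be illustrated with a toy example. The sketch below is not the paper's algorithm. It assumes a linear head `w` over a single representation vector `h`, an L2 drift budget `eps`, and a logistic loss; all of these names and choices are hypothetical and chosen only because the worst-case drift then has a closed form. A robust training loop in this toy setting would minimize `robust_loss` instead of the clean loss.

```python
import numpy as np

def logistic_loss(w, h, y):
    """Logistic loss of a linear head w on representation h, label y in {-1, +1}."""
    return np.log1p(np.exp(-y * np.dot(w, h)))

def worst_case_drift(w, y, eps):
    """Drift delta with ||delta||_2 <= eps that maximizes the logistic loss.

    For a linear head the maximizer is closed-form: move h against the
    label along w, i.e. delta = -eps * y * w / ||w||. (Toy assumption,
    not taken from the paper.)
    """
    return -eps * y * w / np.linalg.norm(w)

def robust_loss(w, h, y, eps):
    """Loss evaluated at the worst-case drifted representation."""
    return logistic_loss(w, h + worst_case_drift(w, y, eps), y)

# Hypothetical toy data: an 8-dimensional representation and head.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
h = rng.normal(size=8)
y, eps = 1, 0.5

clean = logistic_loss(w, h, y)
robust = robust_loss(w, h, y, eps)
```

Because the robust loss upper-bounds the loss under every drift inside the budget, driving it down also controls the loss on any representation shift of that size, which is the intuition behind training against the worst case rather than against sampled perturbations.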

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification