Audio-DeepThinker: Progressive Reasoning-Aware Reinforcement Learning for High-Quality Chain-of-Thought Emergence in Audio Language Models

Xiang He; Chenxing Li; Jinting Wang; Yan Rong; Tianxin Xie; Wenfu Wang; Li Liu; Dong Yu

arXiv:2604.18187 · cs.SD · April 21, 2026

TL;DR

Audio-DeepThinker introduces a reinforcement learning framework with a hybrid reward and curriculum to enable high-quality reasoning in audio-language models without supervised reasoning fine-tuning.

Contribution

Audio-DeepThinker presents an RL-based approach that combines a hybrid reward with a progressive curriculum to foster emergent reasoning capabilities in audio-language models.

Findings

01. Achieved state-of-the-art results on multiple audio reasoning benchmarks.

02. Won 1st Place in the Interspeech 2026 Audio Reasoning Challenge.

03. Revealed that RL training reshapes upper-layer gating mechanisms and that reasoning tokens crystallize progressively.

Abstract

Large Audio-Language Models (LALMs) have made significant progress in audio understanding, yet they primarily operate as perception-and-answer systems without explicit reasoning processes. Existing methods for enhancing audio reasoning rely either on supervised chain-of-thought (CoT) fine-tuning, which is limited by training data quality, or on reinforcement learning (RL) with coarse rewards that do not directly evaluate reasoning quality. As a result, the generated reasoning chains often appear well-structured yet lack specific acoustic grounding. We propose Audio-DeepThinker, a framework built on two core ideas. First, we introduce a hybrid reasoning similarity reward that directly supervises the quality of generated reasoning chains by combining an LLM evaluator assessing logical path alignment, key step coverage, and analytical depth with an embedding similarity component enforcing…
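
As a rough illustration of the hybrid reward idea in the abstract, the sketch below mixes an LLM-evaluator score with an embedding similarity score. The abstract names the three evaluator criteria and the embedding component but not how they are combined, so everything else here is an assumption: the convex mix, the weight `alpha`, the [0, 1] score range, the use of cosine similarity, and the names `EvaluatorScores` and `hybrid_reward`. It is not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class EvaluatorScores:
    """Hypothetical LLM-evaluator outputs; each is assumed to lie in [0, 1]."""
    logical_path_alignment: float
    key_step_coverage: float
    analytical_depth: float


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_reward(scores, generated_emb, reference_emb, alpha=0.5):
    """Assumed combination rule: a convex mix of the two components.

    alpha weights the mean of the three evaluator criteria; (1 - alpha)
    weights the cosine similarity between the embeddings of the generated
    and reference reasoning chains, clipped at zero.
    """
    llm_score = (
        scores.logical_path_alignment
        + scores.key_step_coverage
        + scores.analytical_depth
    ) / 3.0
    emb_score = max(0.0, cosine_similarity(generated_emb, reference_emb))
    return alpha * llm_score + (1.0 - alpha) * emb_score


# Toy inputs: made-up evaluator scores and two-dimensional stand-in embeddings.
scores = EvaluatorScores(0.8, 0.6, 0.7)
reward = hybrid_reward(scores, [1.0, 0.0], [1.0, 0.0])
```

In practice the evaluator scores would come from a judge model and the embeddings from a text encoder; both are replaced by plain numbers here to keep the sketch self-contained.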
