AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning

Liyang Chen; Hongkai Chen; Yujun Cai; Sifan Li; Qingwen Ye; Yiwei Wang

arXiv:2602.10439 · cs.SD · February 12, 2026

TL;DR

AudioRouter is a reinforcement learning framework that enhances large audio language models' understanding by learning to selectively use external tools, significantly reducing data requirements and improving performance on audio perception tasks.

Contribution

It introduces a novel RL-based routing policy for external tool usage in LALMs, enabling data-efficient audio understanding without retraining the core model.

Findings

01. Achieves substantial improvements on audio benchmarks.

02. Requires up to 600x less training data.

03. Demonstrates scalable, data-efficient learning of tool usage.

Abstract

Large Audio Language Models (LALMs) have demonstrated strong capabilities in audio understanding and reasoning. However, their performance on fine-grained auditory perception remains unreliable, and existing approaches largely rely on data-intensive training to internalize perceptual abilities. We propose AudioRouter, a reinforcement learning framework that enables LALMs to improve audio understanding by learning when and how to use external audio tools. Rather than tightly coupling tool usage with audio reasoning, AudioRouter formulates tool use as an explicit decision-making problem and optimizes a lightweight routing policy while keeping the underlying reasoning model frozen. Experimental results show that AudioRouter achieves substantial improvements on standard audio understanding benchmarks while requiring up to 600x less training data to learn tool usage compared with…
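
The abstract frames tool use as an explicit decision problem: a small routing policy is trained with reinforcement learning while the reasoning model stays frozen. The toy sketch below illustrates that general idea only and is not the paper's algorithm. The tool names (`event_counter`, `pitch_tracker`), the question types, the 0/1 reward rule, and the choice of REINFORCE with a running baseline are all assumptions made for illustration; the frozen reasoner is replaced by a stand-in function.

```python
import math
import random

# Hypothetical action set: answer directly, or call one of two made-up tools.
ACTIONS = ["no_tool", "event_counter", "pitch_tracker"]

# Toy assumption: each question type is answered correctly under exactly one action.
HELPFUL_ACTION = {
    "semantic": "no_tool",
    "counting": "event_counter",
    "pitch": "pitch_tracker",
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frozen_reasoner_reward(question_type, action):
    """Stand-in for a frozen reasoning model: 1.0 if its answer would be correct."""
    return 1.0 if HELPFUL_ACTION[question_type] == action else 0.0


def train_router(steps=3000, lr=0.2, seed=0):
    """Train per-question-type action logits with REINFORCE and a running baseline.

    Only the router's logits are updated; the reasoner is never modified.
    """
    rng = random.Random(seed)
    question_types = sorted(HELPFUL_ACTION)
    logits = {q: [0.0] * len(ACTIONS) for q in question_types}
    baseline = 0.0
    for _ in range(steps):
        q = rng.choice(question_types)
        probs = softmax(logits[q])
        chosen = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        reward = frozen_reasoner_reward(q, ACTIONS[chosen])
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(chosen) w.r.t. logit i is 1[i == chosen] - probs[i].
        for i in range(len(ACTIONS)):
            indicator = 1.0 if i == chosen else 0.0
            logits[q][i] += lr * advantage * (indicator - probs[i])
    return logits


def best_action(logits, question_type):
    """Return the action with the highest learned logit for a question type."""
    row = logits[question_type]
    return ACTIONS[max(range(len(ACTIONS)), key=row.__getitem__)]
```

In this toy setup, `best_action(train_router(), "counting")` selects `event_counter`, because the reward only ever favors the helpful action. A real router would condition on the audio and question rather than on a hand-labeled question type, and its reward would come from the frozen model's actual answers.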

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Hearing Loss and Rehabilitation