SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models

Xingjian Diao; Chunhui Zhang; Keyi Kong; Weiyi Wu; Chiyu Ma; Zhongyu Ouyang; Peijun Qing; Soroush Vosoughi; Jiang Gui

arXiv:2506.12935 · cs.CL · September 23, 2025


TL;DR

This paper introduces SoundMind, a new dataset, together with a reinforcement learning method that enhances the reasoning abilities of audio-language models and yields significant improvements over existing baselines.

Contribution

It presents a novel dataset and an RL algorithm designed specifically for audio logical reasoning, advancing the capabilities of audio-language models.

Findings

01

Improved reasoning performance on the SoundMind benchmark

02

Effective fine-tuning of Qwen2.5-Omni-7B with the new dataset

03

Demonstrated benefits of combining high-quality data with RL techniques

Abstract

While large language models have demonstrated impressive reasoning abilities, their extension to the audio modality, particularly within large audio-language models (LALMs), remains underexplored. Addressing this gap requires a systematic approach that involves a capable base model, high-quality reasoning-oriented audio data, and effective training algorithms. In this work, we present a comprehensive solution for audio logical reasoning (ALR) tasks: we introduce SoundMind, a dataset of 6,446 audio-text annotated samples specifically curated to support complex reasoning. Building on this resource, we propose SoundMind-RL, a rule-based reinforcement learning (RL) algorithm designed to equip audio-language models with robust audio-text reasoning capabilities. By fine-tuning Qwen2.5-Omni-7B on the proposed SoundMind dataset using SoundMind-RL, we achieve strong and consistent improvements…
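The abstract describes SoundMind-RL only as a "rule-based" RL algorithm. As a rough illustration of what a rule-based reward can look like in this setting, here is a minimal sketch that scores a model completion with two deterministic checks: whether it follows an output template, and whether its final answer matches the gold label. This is not the authors' reward. The `<think>`/`<answer>` tag template, the weights `w_format=0.5` and `w_answer=1.0`, the case-insensitive exact-match rule, and all function names are hypothetical assumptions chosen for illustration.

```python
import re
from typing import Optional

# HYPOTHETICAL output template (an assumption, not taken from the paper):
# reasoning inside <think>...</think>, followed by the final answer inside <answer>...</answer>.
TEMPLATE = re.compile(r"<think>.*?</think>\s*<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the whole completion matches the assumed template, else 0.0."""
    return 1.0 if TEMPLATE.fullmatch(completion.strip()) else 0.0


def extract_answer(completion: str) -> Optional[str]:
    """Pull out the normalized (stripped, lowercased) answer, or None if the template is violated."""
    match = TEMPLATE.fullmatch(completion.strip())
    return match.group(1).strip().lower() if match else None


def answer_reward(completion: str, gold: str) -> float:
    """Return 1.0 for a case-insensitive exact match with the gold label, else 0.0."""
    pred = extract_answer(completion)
    return 1.0 if pred is not None and pred == gold.strip().lower() else 0.0


def rule_based_reward(completion: str, gold: str,
                      w_format: float = 0.5, w_answer: float = 1.0) -> float:
    """Weighted sum of the two rule-based checks (weights are hypothetical)."""
    return w_format * format_reward(completion) + w_answer * answer_reward(completion, gold)


if __name__ == "__main__":
    good = "<think>The premise implies the claim.</think><answer>Yes</answer>"
    print(rule_based_reward(good, "yes"))   # well-formed and correct
    print(rule_based_reward("Yes", "yes"))  # correct label but template ignored, so no credit
```

Because every check is a deterministic rule over the generated text, such a reward needs no learned reward model. The trade-off is that any answer falling outside the assumed template earns nothing.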

Code & Models

Repositories

xid32/soundmind
pytorch · Official

Models

SoundMind-RL/SoundMindModel
model · 7 dl

Datasets

SoundMind-RL/SoundMindDataset
dataset · 404 dl

Videos

SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models · Underline

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies

Methods: Balanced Selection