Spatial Audio Question Answering and Reasoning on Dynamic Source Movements
Arvind Krishna Sridhar, Yinyi Guo, Erik Visser

TL;DR
This paper advances spatial audio question answering by introducing a movement-focused augmentation framework and a reasoning-enabled multimodal finetuning approach, and by analyzing the effects of source separation, which together improve understanding of dynamic sound sources.
Contribution
It presents a novel movement-centric augmentation framework, a reasoning-enabled multimodal finetuning method, and an analysis of how source separation affects spatial audio understanding.
Findings
Reasoning increases the benefit gained from source separation.
Thinking mode improves accuracy by 5.1%.
Movement modeling and separation quality are interconnected.
Abstract
Spatial audio understanding aims to enable machines to interpret complex auditory scenes, particularly when sound sources move over time. In this work, we study Spatial Audio Question Answering (Spatial AQA) with a focus on movement reasoning, where a model must infer object motion, position, and directional changes directly from stereo audio. First, we introduce a movement-centric spatial audio augmentation framework that synthesizes diverse motion patterns from isolated mono audio events, enabling controlled and scalable training data generation. Second, we propose an end-to-end multimodal finetuning approach with a thinking mode, which allows audio-language models to produce explicit intermediate reasoning steps before predicting an answer. Third, we investigate the impact of query-conditioned source separation as a preprocessing stage and compare three inference regimes: no masking,…
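As a rough illustration of the first idea in the abstract (turning an isolated mono event into a stereo clip with a known motion pattern), the sketch below moves a mono signal across the stereo field with constant-power panning and derives a movement label from the trajectory. This is not the authors' framework: constant-power panning is a stand-in technique chosen for simplicity, and the names `pan_moving_source` and `movement_label`, the pan range of -1 to 1, and the label strings are all hypothetical.

```python
import math


def pan_moving_source(mono, start_pan, end_pan):
    """Render a mono signal to stereo while its pan position moves linearly.

    Assumption (not from the paper): pan lies in [-1.0, 1.0], where -1 is
    hard left and +1 is hard right, and gains follow constant-power panning.
    """
    n = len(mono)
    left, right = [], []
    for i, sample in enumerate(mono):
        # Fraction of the trajectory completed at this sample.
        t = i / (n - 1) if n > 1 else 0.0
        pan = start_pan + t * (end_pan - start_pan)
        # Map pan in [-1, 1] to an angle in [0, pi/2].
        theta = (pan + 1.0) * math.pi / 4.0
        left.append(sample * math.cos(theta))
        right.append(sample * math.sin(theta))
    return left, right


def movement_label(start_pan, end_pan, tol=1e-9):
    """Return a hypothetical ground-truth label for the synthesized motion."""
    if end_pan - start_pan > tol:
        return "left_to_right"
    if start_pan - end_pan > tol:
        return "right_to_left"
    return "static"


# Example: a constant-amplitude signal sweeping from hard left to hard right.
mono = [1.0] * 5
left, right = pan_moving_source(mono, -1.0, 1.0)
label = movement_label(-1.0, 1.0)
```

Because the trajectory is chosen by the generator, the label comes for free, which is what makes this style of augmentation controllable and scalable. A realistic pipeline would likely also model time differences between channels, distance, and reverberation, none of which this sketch attempts.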
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Multimodal Machine Learning Applications
