AudioMotionBench: Evaluating Auditory Motion Perception in Audio LLMs
Zhe Sun, Yujun Cai, Jiayu Yao, Yiwei Wang

TL;DR
This paper introduces AudioMotionBench, a benchmark to evaluate auditory motion perception in Audio-Language Models, revealing significant deficits in models' ability to recognize sound source movement and highlighting a key gap in spatial reasoning capabilities.
Contribution
The paper presents the first benchmark specifically designed to assess auditory motion understanding in Audio-Language Models, exposing their limitations in perceiving sound source movement.
Findings
Current models achieve less than 50% accuracy on motion perception tasks.
Models struggle to recognize motion cues and directional patterns.
There is a fundamental gap in auditory spatial reasoning in existing models.
Abstract
Large Audio-Language Models (LALMs) have recently shown impressive progress in speech recognition, audio captioning, and auditory question answering. Yet whether these models can perceive spatial dynamics, particularly the motion of sound sources, remains unclear. In this work, we uncover a systematic motion perception deficit in current LALMs. To investigate it, we introduce AudioMotionBench, the first benchmark explicitly designed to evaluate auditory motion understanding: a controlled question-answering benchmark that tests whether LALMs can infer the direction and trajectory of moving sound sources from binaural audio. Comprehensive quantitative and qualitative analyses reveal that current models struggle to reliably recognize motion cues or distinguish directional patterns. The average accuracy remains below…
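To make "motion cues" concrete: in two-channel audio, a source moving from the listener's left to right shows up as a change over time in the interaural level difference (ILD), one of the standard binaural localization cues. The sketch below is a hypothetical toy heuristic, not the paper's method, data, or evaluation pipeline. The function names (`synth_moving_source`, `ild_db`, `infer_direction`), the 0.5 dB-per-frame threshold, and the use of constant-power amplitude panning in place of real binaural rendering are all illustrative assumptions. It ignores interaural time differences and HRTF spectral cues entirely.

```python
import math

def synth_moving_source(duration_s=1.0, sr=8000, freq_hz=440.0, left_to_right=True):
    """Toy stimulus (assumption): a tone swept across the stereo field with
    constant-power panning, standing in for a source moving past the listener.
    Not real binaural audio: no ITD or HRTF filtering is applied."""
    n = int(duration_s * sr)
    left, right = [], []
    for i in range(n):
        pos = i / (n - 1)            # 0.0 = far left, 1.0 = far right
        if not left_to_right:
            pos = 1.0 - pos
        theta = pos * math.pi / 2    # constant-power pan angle
        s = math.sin(2 * math.pi * freq_hz * i / sr)
        left.append(math.cos(theta) * s)
        right.append(math.sin(theta) * s)
    return left, right

def ild_db(left, right, eps=1e-12):
    """Interaural level difference in dB (positive = louder in the right channel)."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10 * math.log10((e_right + eps) / (e_left + eps))

def _slope(values):
    """Least-squares slope of values against their frame index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den

def infer_direction(left, right, n_frames=10, threshold_db=0.5):
    """Hypothetical heuristic: classify horizontal motion from the ILD trend."""
    frame = len(left) // n_frames
    if frame == 0:
        raise ValueError("signal too short for the requested number of frames")
    ilds = [ild_db(left[k * frame:(k + 1) * frame], right[k * frame:(k + 1) * frame])
            for k in range(n_frames)]
    trend = _slope(ilds)             # dB change per frame
    if abs(trend) < threshold_db:
        return "stationary"
    return "left-to-right" if trend > 0 else "right-to-left"
```

For example, `infer_direction(*synth_moving_source(left_to_right=False))` classifies the reversed sweep as `"right-to-left"`, and identical left and right channels give a flat ILD and `"stationary"`. A hand-written cue like this only illustrates the kind of low-level signal that the benchmark probes for; it makes no claim about how LALMs process audio internally.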
Taxonomy
Topics: Music and Audio Processing · Neuroscience and Music Perception · Multisensory perception and integration
