AudioScene: Integrating Object-Event Audio into 3D Scenes
Shuaihang Yuan, Congcong Wen, Muhammad Shafique, Anthony Tzes, Yi Fang

TL;DR
This paper introduces two novel 3D scene datasets with integrated spatial audio, enabling research on audio-visual interactions in spatial contexts for improved human-computer interaction and environmental understanding.
Contribution
The creation of AudioScanNet and AudioRoboTHOR datasets that combine spatial audio with 3D scenes, using large language models and human verification for scalable, accurate annotations.
Findings
Datasets enable new audio-conditioned 3D scene tasks
Benchmark results reveal limitations of current audio-centric methods
High annotation quality confirmed by inter-annotator agreement
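As an illustration of how inter-annotator agreement can be quantified, here is a minimal sketch of Cohen's kappa for two annotators. The source does not say which agreement statistic the authors report, so the choice of kappa, the function name `cohen_kappa`, and the example labels are all assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical verdicts on whether a proposed audio-object pairing is valid
# (made-up data, not taken from the paper).
annotator_1 = ["valid", "valid", "invalid", "valid", "invalid", "valid"]
annotator_2 = ["valid", "valid", "invalid", "invalid", "invalid", "valid"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so it is a stricter measure than the simple fraction of matching labels.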
Abstract
The rapid advances in audio analysis underscore its vast potential for human-computer interaction, environmental monitoring, and public safety; yet existing audio-only datasets often lack spatial context. To address this gap, we present two novel audio-spatial scene datasets, AudioScanNet and AudioRoboTHOR, designed to explore audio-conditioned tasks within 3D environments. By integrating audio clips with spatially aligned 3D scenes, our datasets enable research on how audio signals interact with spatial context. To associate audio events with corresponding spatial information, we leverage the commonsense reasoning ability of large language models and supplement it with rigorous human verification. This approach offers greater scalability than purely manual annotation while maintaining high standards of accuracy, completeness, and diversity, quantified through inter-annotator…
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Multimodal Machine Learning Applications
