In-the-wild Audio Spatialization with Flexible Text-guided Localization

Tianrui Pan; Jie Liu; Zewen Huang; Jie Tang; Gangshan Wu

arXiv:2506.00927 · cs.SD · June 3, 2025

TL;DR

This paper introduces a flexible text-guided framework for audio spatialization in immersive environments, utilizing a large-scale dataset and a novel assessment model to improve spatial accuracy and semantic coherence.

Contribution

The paper presents the Text-guided Audio Spatialization (TAS) framework, together with the SpatialTAS dataset and a new evaluation method, enabling interactive and accurate binaural audio generation guided by text prompts.

Findings

01 Outperforms existing methods on simulated and real datasets

02 Demonstrates superior generalization and spatial accuracy

03 Achieves high semantic coherence with text prompts

Abstract

To enhance immersive experiences, binaural audio offers spatial awareness of sounding objects in AR, VR, and embodied AI applications. While existing audio spatialization methods can generally map any available monaural audio to binaural audio signals, they often lack the flexible and interactive control needed in complex multi-object user-interactive environments. To address this, we propose a Text-guided Audio Spatialization (TAS) framework that utilizes flexible text prompts and evaluates our model from unified generation and comprehension perspectives. Due to the limited availability of premium and large-scale stereo data, we construct the SpatialTAS dataset, which encompasses 376,000 simulated binaural audio samples to facilitate the training of our model. Our model learns binaural differences guided by 3D spatial location and relative position prompts, augmented by flipped-channel…
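The abstract mentions flipped-channel augmentation without giving details here. One plausible reading is that swapping the left and right channels of a binaural clip should be paired with swapping "left" and "right" in its text prompt, so the audio and the description stay consistent. The sketch below illustrates that idea only; the function names (`flip_channels`, `flip_prompt`, `augment`) and the data layout (a list of `(left, right)` sample pairs) are assumptions made for illustration, not the authors' implementation.

```python
import re

# Hypothetical helpers: the paper's actual augmentation code is not shown
# in this listing, so names and data layout here are illustrative only.
_SWAP = {"left": "right", "right": "left"}


def flip_prompt(prompt: str) -> str:
    """Swap the words 'left' and 'right' in a text prompt.

    Matching is case-insensitive; a leading capital letter is preserved.
    """
    def repl(match):
        word = match.group(0)
        swapped = _SWAP[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped

    return re.sub(r"\b(left|right)\b", repl, prompt, flags=re.IGNORECASE)


def flip_channels(stereo):
    """Swap channels of a clip given as a list of (left, right) sample pairs."""
    return [(right, left) for (left, right) in stereo]


def augment(stereo, prompt):
    """Return the mirrored clip together with its mirrored prompt."""
    return flip_channels(stereo), flip_prompt(prompt)


if __name__ == "__main__":
    clip = [(0.5, 0.1), (0.4, 0.2)]  # louder in the left channel
    mirrored_clip, mirrored_prompt = augment(clip, "The dog barks on the left.")
    print(mirrored_clip)
    print(mirrored_prompt)
```

A word-level swap like this only covers prompts that name a side explicitly. Prompts that give 3D coordinates or angles would need the corresponding sign change instead, which depends on a coordinate convention this listing does not specify.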

Code & Models

Repositories

alice01010101/tasu
PyTorch · Official

Videos

In-the-wild Audio Spatialization with Flexible Text-guided Localization · Underline

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis