Generating Moving 3D Soundscapes with Latent Diffusion Models

Christian Templin; Yanda Zhu; Hao Wang

arXiv:2507.07318 · cs.SD · September 22, 2025

TL;DR

SonicMotion is a latent diffusion framework that generates first-order Ambisonics (FOA) spatial audio with moving sound sources, offering explicit control over source motion and high localization accuracy for immersive applications.

Contribution

The paper introduces the first end-to-end model for generating FOA audio with moving sources under natural-language and spatial-trajectory control, supported by a new dataset of over one million simulated FOA caption pairs.

Findings

01

Achieves state-of-the-art semantic alignment and perceptual quality.

02

Attains low spatial localization error.

03

Supports two control modes: descriptive (natural-language prompts) and parametric (text plus spatial trajectory parameters).
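The summary above does not say how spatial localization error is measured. One common way to make the idea concrete is to estimate a source direction from the FOA channels with the time-averaged intensity vector and compare it to the intended direction by great-circle angle. The sketch below does this under stated assumptions: AmbiX-style channels (ACN order W, Y, Z, X with SN3D normalization), a single free-field source, and hypothetical helper names `estimate_direction` and `angular_error_deg`. It is an illustration, not the paper's evaluation code.

```python
import math

def estimate_direction(w, y, z, x):
    """Estimate (azimuth, elevation) in degrees from FOA channels.

    Uses the time-averaged pseudo-intensity vector W * [X, Y, Z].
    Assumes AmbiX conventions (ACN order, SN3D), which is an assumption
    of this sketch and is not stated in the source.
    """
    ix = sum(wi * xi for wi, xi in zip(w, x))
    iy = sum(wi * yi for wi, yi in zip(w, y))
    iz = sum(wi * zi for wi, zi in zip(w, z))
    azimuth = math.degrees(math.atan2(iy, ix))
    elevation = math.degrees(math.atan2(iz, math.hypot(ix, iy)))
    return azimuth, elevation

def angular_error_deg(az1, el1, az2, el2):
    """Great-circle angle in degrees between two directions."""
    a1, e1, a2, e2 = map(math.radians, (az1, el1, az2, el2))
    cos_angle = (math.sin(e1) * math.sin(e2)
                 + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

# Synthetic check: a static source panned to azimuth 30, elevation 10 degrees.
true_az, true_el = 30.0, 10.0
az, el = math.radians(true_az), math.radians(true_el)
signal = [math.sin(0.05 * i) for i in range(2000)]
w = signal
y = [s * math.sin(az) * math.cos(el) for s in signal]
z = [s * math.sin(el) for s in signal]
x = [s * math.cos(az) * math.cos(el) for s in signal]

est_az, est_el = estimate_direction(w, y, z, x)
error = angular_error_deg(est_az, est_el, true_az, true_el)
print(f"estimated az={est_az:.2f}, el={est_el:.2f}, error={error:.4f} deg")
```

For a moving source, the same estimate could be computed over short frames and compared against the intended trajectory frame by frame.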

Abstract

Spatial audio has become central to immersive applications such as VR/AR, cinema, and music. Existing generative audio models are largely limited to mono or stereo formats and cannot capture the full 3D localization cues available in first-order Ambisonics (FOA). Recent FOA models extend text-to-audio generation but remain restricted to static sources. In this work, we introduce SonicMotion, the first end-to-end latent diffusion framework capable of generating FOA audio with explicit control over moving sound sources. SonicMotion is implemented in two variations: 1) a descriptive model conditioned on natural language prompts, and 2) a parametric model conditioned on both text and spatial trajectory parameters for higher precision. To support training and evaluation, we construct a new dataset of over one million simulated FOA caption pairs that include both static and dynamic sources…
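To make the notion of a moving source in FOA concrete, the following sketch pans a mono signal along a linear azimuth sweep into four FOA channels. It assumes AmbiX conventions (ACN order W, Y, Z, X with SN3D normalization) and a free-field source with no room simulation. The function name `encode_moving_source_foa` and its start/end-angle parameters are hypothetical and are not the paper's trajectory parameterization or dataset pipeline.

```python
import math

def encode_moving_source_foa(mono, azimuth_start_deg, azimuth_end_deg,
                             elevation_deg=0.0):
    """Encode a mono signal as FOA while sweeping azimuth linearly.

    Returns channels in ACN order (W, Y, Z, X) with SN3D gains.
    The linear sweep is an illustrative trajectory model only.
    """
    n = len(mono)
    el = math.radians(elevation_deg)
    w, y, z, x = [], [], [], []
    for i, s in enumerate(mono):
        # Fraction of the clip elapsed, from 0.0 at the start to 1.0 at the end.
        t = i / (n - 1) if n > 1 else 0.0
        az = math.radians(
            azimuth_start_deg + t * (azimuth_end_deg - azimuth_start_deg))
        w.append(s)                                # omnidirectional component
        y.append(s * math.sin(az) * math.cos(el))  # left-right
        z.append(s * math.sin(el))                 # up-down
        x.append(s * math.cos(az) * math.cos(el))  # front-back
    return w, y, z, x

# One second of a 440 Hz tone moving from the front (0 deg) to the left (90 deg).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * i / sample_rate)
        for i in range(sample_rate)]
w, y, z, x = encode_moving_source_foa(tone, 0.0, 90.0)
```

At the start of the clip the energy sits in W and X (source in front); by the end it has shifted to W and Y (source to the left), which is the kind of time-varying directional cue that a static-source model does not produce.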

Taxonomy

Topics: Noise Effects and Management · Music and Audio Processing · Vehicle Noise and Vibration Control