Sonic4D: Spatial Audio Generation for Immersive 4D Scene Exploration

Siyi Xie; Hanxin Zhu; Xinyi Chen; Tianyu He; Xin Li; Zhibo Chen

arXiv:2506.15759 · cs.SD · March 2, 2026

TL;DR

Sonic4D is a training-free framework that synthesizes spatial audio aligned with 4D scenes. It localizes the sound sources in the scene and generates realistic spatial audio for them, making audiovisual exploration more immersive.

Contribution

The paper presents a novel, training-free method for generating spatial audio synchronized with 4D scenes, integrating visual grounding and physics-based audio synthesis.

Findings

1) Produces realistic spatial audio consistent with 4D scenes

2) Enhances immersion in audiovisual exploration

3) Operates without additional training data

Abstract

Recent advancements in 4D generation have demonstrated its remarkable capability in synthesizing photorealistic renderings of dynamic 3D scenes. However, despite achieving impressive visual performance, almost all existing methods overlook the generation of spatial audio aligned with the corresponding 4D scenes, posing a significant limitation to truly immersive audiovisual experiences. To mitigate this issue, we propose Sonic4D, a novel framework that enables spatial audio generation for immersive exploration of 4D scenes. Specifically, our method is composed of three stages: 1) To capture both the dynamic visual content and raw auditory information from a monocular video, we first employ pre-trained expert models to generate the 4D scene and its corresponding monaural audio. 2) Subsequently, to transform the monaural audio into spatial audio, we localize and track the sound sources…
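
The abstract is cut off before the spatialization step is described, so the following is only a minimal sketch of the general idea of physics-based rendering of a tracked mono source, not the authors' method. It assumes a free-field model with inverse-distance attenuation and a per-ear propagation delay, and it ignores head shadowing and room acoustics. All names (`render_ear`, `spatialize`, the ear positions, `min_distance`) are hypothetical and chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def render_ear(mono, sample_rate, positions, ear, min_distance=0.1):
    """Render a mono signal as heard at one ear position (free-field assumption).

    positions[i] is the (x, y, z) location of the source, in metres, at the
    moment sample i is emitted. Each sample is delayed by its travel time to
    the ear and scaled by 1/distance.
    """
    delays, gains = [], []
    for p in positions:
        # Clamp the distance so the gain stays finite near the listener.
        d = max(math.dist(p, ear), min_distance)
        delays.append(round(d / SPEED_OF_SOUND * sample_rate))
        gains.append(1.0 / d)
    out = [0.0] * (len(mono) + max(delays, default=0))
    for i, sample in enumerate(mono):
        # Scatter each emitted sample to its arrival time at the ear.
        out[i + delays[i]] += gains[i] * sample
    return out


def spatialize(mono, sample_rate, positions,
               left_ear=(-0.09, 0.0, 0.0), right_ear=(0.09, 0.0, 0.0)):
    """Return (left, right) channels for a mono signal with a tracked source."""
    if len(positions) != len(mono):
        raise ValueError("need one source position per audio sample")
    left = render_ear(mono, sample_rate, positions, left_ear)
    right = render_ear(mono, sample_rate, positions, right_ear)
    # Pad the shorter channel so both have the same length.
    n = max(len(left), len(right))
    left += [0.0] * (n - len(left))
    right += [0.0] * (n - len(right))
    return left, right


# Usage: a short click emitted by a source one metre to the listener's right.
click = [1.0] + [0.0] * 9
track = [(1.0, 0.0, 0.0)] * len(click)
left, right = spatialize(click, 48000, track)
```

In this sketch the right channel receives the click earlier and louder than the left, which is the interaural time and level cue that lets a listener place the source. A moving source is handled simply by passing a different position for each sample.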

Taxonomy

Topics: Music Technology and Sound Studies · Hearing Loss and Rehabilitation · Generative Adversarial Networks and Image Synthesis