ImmersiveFlow: Stereo-to-7.1.4 spatial audio generation with flow matching
Zining Liang, Runbang Wang, Xuzhou Ye, Qiuqiang Kong

TL;DR
ImmersiveFlow is a novel end-to-end generative framework that synthesizes high-resolution 7.1.4 spatial audio directly from stereo inputs, overcoming limitations of existing low-dimensional methods.
Contribution
It introduces Flow Matching within a VAE latent space to generate detailed 7.1.4 spatial audio from stereo, a first in the field.
Findings
Produces perceptually rich sound fields
Enhances externalization compared to traditional methods
Outperforms existing upmixing techniques
Abstract
Immersive spatial audio has become increasingly critical for applications ranging from AR/VR to home entertainment and automotive sound systems. However, existing generative methods remain constrained to low-dimensional formats such as binaural audio and First-Order Ambisonics (FOA). Binaural rendering is inherently limited to headphone playback, while FOA suffers from spatial aliasing and insufficient resolution at high frequencies. To overcome these limitations, we introduce ImmersiveFlow, the first end-to-end generative framework that directly synthesizes discrete 7.1.4 format spatial audio from stereo input. ImmersiveFlow leverages Flow Matching to learn trajectories from stereo inputs to multichannel spatial features within a pretrained VAE latent space. At inference, the latent features predicted by the Flow Matching model are decoded by the VAE and converted into the final 7.1.4…
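The flow-matching idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common straight-line (linear interpolation) path between a source latent and a target latent, uses random arrays in place of VAE latents, and substitutes a hypothetical oracle velocity field for the trained network. The names `interpolate`, `euler_integrate`, `stereo_latent`, and `spatial_latent` are illustrative only.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Return the point at time t on the straight line from x0 to x1,
    together with the velocity a flow-matching model would regress onto."""
    x_t = (1.0 - t) * x0 + t * x1
    target_velocity = x1 - x0  # constant along a straight-line path
    return x_t, target_velocity


def euler_integrate(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x


# Toy stand-ins for VAE latents (assumed shapes, not taken from the paper).
rng = np.random.default_rng(0)
stereo_latent = rng.normal(size=(8, 16))   # hypothetical source latent
spatial_latent = rng.normal(size=(8, 16))  # hypothetical target latent


def oracle_velocity(x, t):
    """Stand-in for a trained network: the exact straight-line velocity."""
    return spatial_latent - stereo_latent


# Training-side quantity: a sample on the path and its regression target.
x_mid, v_target = interpolate(stereo_latent, spatial_latent, 0.5)

# Inference-side quantity: follow the velocity field from source to target.
generated = euler_integrate(oracle_velocity, stereo_latent, num_steps=10)
```

In a real system the oracle would be replaced by a learned model trained to minimize the squared error between its output at `x_t` and `target_velocity`, and the integrated latent would then be passed to the VAE decoder. Whether ImmersiveFlow starts the trajectory from the stereo latent itself or from noise conditioned on the stereo input is not stated in the text shown here.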
Taxonomy
Topics: Music Technology and Sound Studies · Hearing Loss and Rehabilitation · Generative Adversarial Networks and Image Synthesis
