ImmersiveFlow: Stereo-to-7.1.4 spatial audio generation with flow matching

Zining Liang; Runbang Wang; Xuzhou Ye; Qiuqiang Kong

arXiv:2601.12950·eess.AS·January 21, 2026

Open Access

TL;DR

ImmersiveFlow is a novel end-to-end generative framework that synthesizes high-resolution 7.1.4 spatial audio directly from stereo inputs, overcoming limitations of existing low-dimensional methods.

Contribution

ImmersiveFlow applies Flow Matching within a pretrained VAE latent space to generate discrete 7.1.4 spatial audio from stereo input, which the authors describe as the first end-to-end framework to do so.

Findings

01 Produces perceptually rich sound fields

02 Enhances externalization compared to traditional methods

03 Outperforms existing upmixing techniques

Abstract

Immersive spatial audio has become increasingly critical for applications ranging from AR/VR to home entertainment and automotive sound systems. However, existing generative methods remain constrained to low-dimensional formats such as binaural audio and First-Order Ambisonics (FOA). Binaural rendering is inherently limited to headphone playback, while FOA suffers from spatial aliasing and insufficient resolution at high frequencies. To overcome these limitations, we introduce ImmersiveFlow, the first end-to-end generative framework that directly synthesizes discrete 7.1.4 format spatial audio from stereo input. ImmersiveFlow leverages Flow Matching to learn trajectories from stereo inputs to multichannel spatial features within a pretrained VAE latent space. At inference, the latent features predicted by the Flow Matching model are decoded by the VAE and converted into the final 7.1.4…
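The Flow Matching step described in the abstract can be illustrated with a minimal sketch. It assumes the common straight-line (rectified) probability path and a fixed-step Euler solver; the abstract does not state which path or solver the authors use. The names `interpolate`, `target_velocity`, `euler_sample`, and `oracle_velocity`, along with the three-dimensional toy latents, are hypothetical and stand in for the trained network and the VAE latents.

```python
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight-line path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Regression target for the straight-line path: dx_t/dt = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(
    velocity: Callable[[Vector, float], Vector],
    x0: Vector,
    steps: int = 8,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy latents (assumption): in a real system these would be VAE encodings
# of the stereo input and of the multichannel target.
stereo_latent = [0.5, -1.0, 2.0]
spatial_latent = [1.5, 0.0, -2.0]


def oracle_velocity(x: Vector, t: float) -> Vector:
    """Stand-in for a trained network: returns the exact target velocity."""
    return target_velocity(stereo_latent, spatial_latent)


# Training would regress a network onto target_velocity at points given by
# interpolate(...); inference integrates the learned field from the source.
midpoint = interpolate(stereo_latent, spatial_latent, 0.5)
generated = euler_sample(oracle_velocity, stereo_latent, steps=8)
```

In this sketch the source of the trajectory is the stereo latent itself, following the abstract's wording "from stereo inputs to multichannel spatial features"; whether the actual model starts from the stereo latent or from noise conditioned on it is not stated here. In a full pipeline, `generated` would then be passed to the VAE decoder.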

Taxonomy

Topics: Music Technology and Sound Studies · Hearing Loss and Rehabilitation · Generative Adversarial Networks and Image Synthesis