FoleySpace: Vision-Aligned Binaural Spatial Audio Generation
Lei Zhao, Rujin Chen, Chi Zhang, Xiao-Lei Zhang, Xuelong Li

TL;DR
FoleySpace is a framework that converts video into immersive binaural spatial audio by estimating sound source positions and employing a diffusion model, improving spatial perception and immersion over existing methods.
Contribution
It introduces a visually guided binaural audio generation method built on a new sound source estimation method and a coordinate mapping mechanism, advancing the realism of video-to-audio synthesis.
Findings
Outperforms existing methods in spatial perception accuracy
Enhances immersive quality of audio-visual experiences
Supports dynamic sound field generation with a new dataset
Abstract
Recently, with the advancement of AIGC, deep learning-based video-to-audio (V2A) technology has garnered significant attention. However, existing research mostly focuses on mono audio generation that lacks spatial perception, while the exploration of binaural spatial audio generation technologies, which can provide a stronger sense of immersion, remains insufficient. To solve this problem, we propose FoleySpace, a framework for video-to-binaural audio generation that produces immersive and spatially consistent stereo sound guided by visual information. Specifically, we develop a sound source estimation method to determine the sound source 2D coordinates and depth in each video frame, and then employ a coordinate mapping mechanism to convert the 2D source positions into a 3D trajectory. This 3D trajectory, together with the monaural audio generated by a pre-trained V2A model, serves as a…
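The coordinate mapping step described in the abstract (per-frame 2D source position plus depth, converted into a 3D trajectory) can be illustrated with a minimal sketch. The abstract does not specify the mapping the authors use, so everything here is an assumption for illustration: a simple pinhole camera model, a hypothetical 90-degree horizontal field of view, and the hypothetical helper names `pixel_to_3d`, `build_trajectory`, and `azimuth_deg`.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def pixel_to_3d(u: float, v: float, depth: float,
                width: int, height: int,
                hfov_deg: float = 90.0) -> Point3D:
    """Back-project pixel (u, v) at a given depth into listener-centred 3D space.

    Assumed pinhole model (not taken from the paper):
    x points right, y points up, z points forward from the camera.
    """
    # Focal length in pixels, derived from the assumed horizontal field of view.
    focal = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    x = (u - width / 2.0) / focal * depth
    y = (height / 2.0 - v) / focal * depth  # image rows grow downward
    return (x, y, depth)


def build_trajectory(frames: List[Tuple[float, float, float]],
                     width: int, height: int,
                     hfov_deg: float = 90.0) -> List[Point3D]:
    """Map per-frame (u, v, depth) estimates to a list of 3D positions."""
    return [pixel_to_3d(u, v, d, width, height, hfov_deg) for (u, v, d) in frames]


def azimuth_deg(point: Point3D) -> float:
    """Horizontal angle of the source: 0 is straight ahead, positive is to the right."""
    x, _, z = point
    return math.degrees(math.atan2(x, z))


# Hypothetical per-frame estimates: a source moving from image centre to the right edge.
estimates = [(320.0, 240.0, 2.0), (480.0, 240.0, 2.0), (640.0, 240.0, 2.0)]
trajectory = build_trajectory(estimates, width=640, height=480)
angles = [azimuth_deg(p) for p in trajectory]
```

In this sketch a source at the image centre maps to a point straight ahead of the listener, and moving it toward the right edge increases the azimuth. A trajectory of this kind is the sort of spatial signal the abstract says is combined with the monaural audio.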
Taxonomy
Topics: Hearing Loss and Rehabilitation · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis
