FoleyDesigner: Immersive Stereo Foley Generation with Precise Spatio-Temporal Alignment for Film Clips

Mengtian Li; Kunyan Dai; Yi Ding; Ruobing Ni; Ying Zhang; Wenwu Wang; Zhifeng Xie

arXiv:2604.05731·cs.CV·April 8, 2026

TL;DR

FoleyDesigner is a comprehensive framework that automates and enhances immersive stereo Foley sound creation for film, combining video analysis, diffusion models, and professional mixing to improve alignment and flexibility.

Contribution

The paper introduces a multi-agent system that combines diffusion models with LLM-driven mechanisms, along with the FilmStereo dataset, to improve spatio-temporal Foley generation and integration.

Findings

01. Achieves superior spatio-temporal alignment compared to baselines.

02. Supports professional audio standards like Dolby Atmos and ITU-R BS.775.

03. Provides interactive control and seamless pipeline integration.

Abstract

Foley art plays a pivotal role in enhancing immersive auditory experiences in film, yet manual creation of spatio-temporally aligned audio remains labor-intensive. We propose FoleyDesigner, a novel framework inspired by professional Foley workflows, integrating film clip analysis, spatio-temporally controllable Foley generation, and professional audio mixing capabilities. FoleyDesigner employs a multi-agent architecture for precise spatio-temporal analysis. It achieves spatio-temporal alignment through latent diffusion models trained on spatio-temporal cues extracted from video frames, combined with large language model (LLM)-driven hybrid mechanisms that emulate post-production practices in the film industry. To address the lack of high-quality stereo audio datasets in film, we introduce FilmStereo, the first professional stereo audio dataset containing spatial metadata, precise…
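To make the idea of spatio-temporal placement concrete, the following is a minimal sketch of positioning a mono sound event in a stereo track at a given onset time and pan position. It is not FoleyDesigner's method, which relies on latent diffusion models; it uses ordinary constant-power panning, and the function names `constant_power_gains` and `place_event`, along with all parameters, are hypothetical and chosen for illustration only.

```python
import math


def constant_power_gains(pan):
    """Return (left_gain, right_gain) for pan in [-1, 1].

    -1 is hard left, 0 is center, +1 is hard right. The gains follow the
    constant-power law, so left**2 + right**2 == 1 at every position.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be in [-1, 1]")
    theta = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] to [0, pi/2]
    return math.cos(theta), math.sin(theta)


def place_event(mono, onset_s, pan, sample_rate, total_s):
    """Mix a mono event into silent stereo buffers at onset_s seconds.

    Hypothetical helper: samples that fall outside the buffer are dropped.
    """
    n_total = int(round(total_s * sample_rate))
    start = int(round(onset_s * sample_rate))
    left = [0.0] * n_total
    right = [0.0] * n_total
    gain_l, gain_r = constant_power_gains(pan)
    for i, sample in enumerate(mono):
        j = start + i
        if j < 0:
            continue  # event begins before the buffer starts
        if j >= n_total:
            break  # event runs past the end of the buffer
        left[j] += gain_l * sample
        right[j] += gain_r * sample
    return left, right
```

In this sketch the onset controls temporal alignment and the pan value controls the left-right position; a learned system would instead infer both from video cues.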

Code & Models

Repositories

https://gekiii996.github.io/FoleyDesigner