FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, Kai Chen

TL;DR
FoleyCrafter is a novel framework that automatically generates high-quality, synchronized, and semantically relevant sounds for silent videos. It enables immersive audio-visual experiences and supports controllable generation through text prompts.
Contribution
It introduces a two-part design built on a pre-trained text-to-audio model: a semantic adapter for semantic alignment and a temporal controller for precise audio-video synchronization. Together, these components improve video-to-sound generation.
Findings
Achieves high-quality, synchronized audio for videos
Enables controllable generation via text prompts
Outperforms existing methods on standard benchmarks
Abstract
We study Neural Foley, the automatic generation of high-quality sound effects synchronized with videos, which enables an immersive audio-visual experience. Despite its wide range of applications, existing approaches struggle to synthesize sounds that are both high-quality and video-aligned (i.e., semantically relevant and temporally synchronized). To overcome these limitations, we propose FoleyCrafter, a novel framework that leverages a pre-trained text-to-audio model to ensure high-quality audio generation. FoleyCrafter comprises two key components: the semantic adapter for semantic alignment and the temporal controller for precise audio-video synchronization. The semantic adapter uses parallel cross-attention layers to condition audio generation on video features, producing realistic sound effects that are semantically relevant to the visual content.…
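The abstract describes the semantic adapter as parallel cross-attention layers that condition audio generation on video features. The NumPy sketch below illustrates one common way to build such a layer. It assumes the decoupled form used by IP-Adapter, in which the audio latent's queries attend separately to text features and to video features, and the two results are summed with a scale factor. The weight names, the `video_scale` parameter, the additive design, and all tensor sizes are hypothetical and are not taken from FoleyCrafter's code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention. q: (n_q, d), k: (n_k, d), v: (n_k, d_v).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def parallel_cross_attention(hidden, text_emb, video_emb, params, video_scale=1.0):
    """Hypothetical sketch: audio latent tokens query text and video features
    through two separate cross-attention branches whose outputs are summed."""
    q = hidden @ params["W_q"]
    # Text branch: stands in for the frozen cross-attention of the pre-trained model.
    text_out = attention(q, text_emb @ params["W_k_text"], text_emb @ params["W_v_text"])
    # Video branch: assumed to be the newly added, trainable projections.
    video_out = attention(q, video_emb @ params["W_k_video"], video_emb @ params["W_v_video"])
    return text_out + video_scale * video_out

# Toy usage with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
d = 32
params = {
    "W_q": rng.normal(scale=0.1, size=(d, d)),
    "W_k_text": rng.normal(scale=0.1, size=(64, d)),
    "W_v_text": rng.normal(scale=0.1, size=(64, d)),
    "W_k_video": rng.normal(scale=0.1, size=(48, d)),
    "W_v_video": rng.normal(scale=0.1, size=(48, d)),
}
hidden = rng.normal(size=(16, d))       # 16 audio latent tokens
text_emb = rng.normal(size=(8, 64))     # 8 text tokens
video_emb = rng.normal(size=(10, 48))   # 10 video frame features
out = parallel_cross_attention(hidden, text_emb, video_emb, params, video_scale=0.5)
print(out.shape)  # (16, 32)
```

In this sketch the text branch is left untouched, so setting `video_scale` to 0 recovers the plain text-conditioned layer. An additive design like this presumably suits adapting a frozen pre-trained text-to-audio model, because the new video pathway can be trained without altering the original text conditioning.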
Taxonomy
Topics: Tactile and Sensory Interactions · Video Analysis and Summarization · Speech and Audio Processing
Methods: Adapter
