Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis

Zhiqi Huang; Dan Luo; Jun Wang; Huan Liao; Zhiheng Li; Zhiyong Wu

arXiv:2409.08628 · cs.SD · September 16, 2024

TL;DR

This paper presents Rhythmic Foley, a novel framework for video-to-audio synthesis that enhances synchronization and semantic accuracy using dual adapters and contrastive pre-training.

Contribution

It introduces a dual-adapter framework with semantic and temporal synchronization, improving audio-visual alignment and control in video-to-audio synthesis.

Findings

01. Improved semantic integrity in generated audio.

02. Enhanced beat point synchronization accuracy.

03. Effective control over audio semantics and beat effects.

Abstract

Our research introduces an innovative framework for video-to-audio synthesis, which solves the problems of audio-video desynchronization and semantic loss in the audio. By incorporating a semantic alignment adapter and a temporal synchronization adapter, our method significantly improves semantic integrity and the precision of beat point synchronization, particularly in fast-paced action sequences. Utilizing a contrastive audio-visual pre-trained encoder, our model is trained with video and high-quality audio data, improving the quality of the generated audio. This dual-adapter approach empowers users with enhanced control over audio semantics and beat effects, allowing the adjustment of the controller to achieve better results. Extensive experiments substantiate the effectiveness of our framework in achieving seamless audio-visual alignment.
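The abstract mentions a contrastive audio-visual pre-trained encoder but gives no details of its objective. As a rough illustration of what contrastive audio-visual alignment generally means, the sketch below computes a symmetric InfoNCE-style loss over paired video and audio embeddings. This is an assumption for illustration, not the paper's actual loss: the function names, the temperature value of 0.07, and the toy embeddings are all hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(video_emb, audio_emb, temperature=0.07):
    """Hypothetical symmetric contrastive loss over paired embeddings.

    Row i of video_emb is assumed to be paired with row i of audio_emb;
    every other row in the batch acts as a negative.
    """
    v = [normalize(x) for x in video_emb]
    a = [normalize(x) for x in audio_emb]
    # Cosine similarity between every video and every audio clip.
    logits = [
        [sum(p * q for p, q in zip(vi, aj)) / temperature for aj in a]
        for vi in v
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the video-to-audio and audio-to-video directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


# Toy embeddings (made up): three clips, three dimensions.
video = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
audio_matched = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
audio_shuffled = audio_matched[1:] + audio_matched[:1]  # break the pairing

loss_matched = symmetric_info_nce(video, audio_matched)
loss_shuffled = symmetric_info_nce(video, audio_shuffled)
print(loss_matched < loss_shuffled)
```

Under this kind of objective, correctly paired clips yield a lower loss than mismatched ones, which is the property a pre-trained encoder would need in order to supply useful conditioning for the semantic and temporal adapters described above.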

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing