Diff-BGM: A Diffusion Model for Video Background Music Generation
Sizhe Li, Yiming Qin, Minghang Zheng, Xin Jin, Yang Liu

TL;DR
This paper introduces Diff-BGM, a diffusion model that generates video background music by leveraging multi-modal video features and a new dataset, addressing challenges in dataset availability, control, and alignment.
Contribution
The paper presents a novel diffusion-based framework for video background music generation, incorporating multi-modal control signals and a segment-aware alignment mechanism.
Findings
The BGM909 music-video dataset provides detailed annotations and shot detection, supplying multi-modal information for training and evaluation.
Diff-BGM uses different video-derived signals to control different aspects of the generated music.
Experimental results demonstrate the model's ability to produce high-quality, aligned background music.
Abstract
When editing a video, a piece of attractive background music is indispensable. However, video background music generation tasks face several challenges, for example, the lack of suitable training datasets, and the difficulties in flexibly controlling the music generation process and sequentially aligning the video and music. In this work, we first propose a high-quality music-video dataset BGM909 with detailed annotation and shot detection to provide multi-modal information about the video and music. We then present evaluation metrics to assess music quality, including music diversity and alignment between music and video with retrieval precision metrics. Finally, we propose the Diff-BGM framework to automatically generate the background music for a given video, which uses different signals to control different aspects of the music during the generation process, i.e., uses dynamic video…
Taxonomy
Topics: Music Technology and Sound Studies
Methods: ALIGN
