Video Background Music Generation: Dataset, Method and Evaluation
Le Zhuo, Zhaokai Wang, Baisen Wang, Yue Liao, Chenxi Bao, Stanley Peng, Songhao Han, Aixi Zhang, Fei Fang, Si Liu

TL;DR
This paper introduces a comprehensive framework for automatic video background music generation, including a new dataset with rich annotations, a benchmark model utilizing music and video features, and an evaluation metric for assessing video-music correspondence.
Contribution
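The paper contributes three components: SymMV, which the authors describe as the first video-music dataset with rich musical annotations; V-MusProd, a benchmark generation model; and VMCP, a retrieval-based metric for evaluating how well background music corresponds to a video.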
Findings
V-MusProd outperforms existing methods in music quality and video correspondence.
The SymMV dataset enables better training and evaluation of video music generation models.
The VMCP metric effectively measures video-music semantic alignment.
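The summary describes VMCP only as a retrieval-based metric and does not say how it is computed. As a rough illustration of the general idea behind retrieval-based correspondence scoring, and not the authors' implementation, the sketch below computes precision at K: each video embedding ranks all candidate music embeddings by cosine similarity, and a hit is counted when the paired track lands in the top K. The function names, the use of cosine similarity, and the assumption that item i in one list is paired with item i in the other are all hypothetical choices made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieval_precision_at_k(video_embs, music_embs, k=1):
    """Fraction of videos whose paired music track ranks in the top k.

    Assumption for this sketch: video_embs[i] and music_embs[i] are a
    ground-truth pair, and both lists hold plain lists of floats.
    """
    if not video_embs or len(video_embs) != len(music_embs):
        raise ValueError("need non-empty, equally sized embedding lists")
    hits = 0
    for i, video in enumerate(video_embs):
        scores = [cosine(video, music) for music in music_embs]
        # Rank candidate indices from most to least similar.
        ranked = sorted(range(len(music_embs)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(video_embs)


# Toy embeddings, invented purely for illustration.
videos = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
musics = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
for k in (1, 2, 3):
    print(k, retrieval_precision_at_k(videos, musics, k=k))
```

A higher score means paired music is retrieved more reliably for its video. In practice the embeddings would come from trained video and music encoders, which this sketch leaves out.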
Abstract
Music is essential when editing videos, but selecting music manually is difficult and time-consuming. Thus, we seek to automatically generate background music tracks given video input. This is a challenging task since it requires music-video datasets, efficient architectures for video-to-music generation, and reasonable metrics, none of which currently exist. To close this gap, we introduce a complete recipe including dataset, benchmark model, and evaluation metric for video background music generation. We present SymMV, a video and symbolic music dataset with various musical annotations. To the best of our knowledge, it is the first video-music dataset with rich musical annotations. We also propose a benchmark video background music generation framework named V-MusProd, which utilizes music priors of chords, melody, and accompaniment along with video-music relations of semantic, color,…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Human Motion and Animation
Methods: None
