TL;DR
This paper introduces CMT, a Controllable Music Transformer that generates video background music with local control of rhythmic features and global control of genre and instruments, so that the music follows the rhythm of the input video.
Contribution
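The work establishes rhythmic relations between video and music (timing with beat, motion speed with simu-note density, motion saliency with simu-note strength) and presents a controllable music transformer that uses them for local rhythmic control, alongside global control of genre and instruments.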
Findings
Objective and subjective evaluations show satisfactory compatibility between the generated music and the input videos.
Music quality is rated as impressive in evaluations.
Rhythmic relations between video and music are effectively modeled.
Abstract
In this work, we address the task of video background music generation. Some previous works achieve effective music generation but are unable to generate melodious music tailored to a particular video, and none of them considers the video-music rhythmic consistency. To generate the background music that matches the given video, we first establish the rhythmic relations between video and background music. In particular, we connect timing, motion speed, and motion saliency from video with beat, simu-note density, and simu-note strength from music, respectively. We then propose CMT, a Controllable Music Transformer that enables local control of the aforementioned rhythmic features and global control of the music genre and instruments. Objective and subjective evaluations show that the generated background music has achieved satisfactory compatibility with the input videos, and at the same…
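The three video-to-music pairings named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the per-bar grouping, the fixed tempo, the quantization into four levels, and the use of frame-to-frame change in motion magnitude as a stand-in for motion saliency are all assumptions made for illustration. The input `motion` is a hypothetical list of per-frame motion magnitudes (for example, from optical flow).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BarControl:
    bar_index: int
    density: int   # illustrative target simu-note density level for the bar
    strength: int  # illustrative target simu-note strength level for the bar


def time_to_beat(seconds: float, tempo_bpm: float) -> float:
    """Timing -> beat: position in beats at a given time (fixed tempo assumed)."""
    return seconds * tempo_bpm / 60.0


def quantize(value: float, max_value: float, levels: int) -> int:
    """Map a value in [0, max_value] to an integer level in [0, levels - 1]."""
    if max_value <= 0:
        return 0
    ratio = min(max(value / max_value, 0.0), 1.0)
    return min(int(ratio * levels), levels - 1)


def bar_controls(motion: List[float], fps: float, tempo_bpm: float,
                 beats_per_bar: int = 4, levels: int = 4) -> List[BarControl]:
    """Derive per-bar density and strength levels from per-frame motion."""
    # Assumed saliency proxy: absolute change in motion between adjacent frames.
    change = [0.0] + [abs(b - a) for a, b in zip(motion, motion[1:])]

    # Group frame indices by the bar they fall into.
    bars: Dict[int, List[int]] = {}
    for i in range(len(motion)):
        beat = time_to_beat(i / fps, tempo_bpm)
        bars.setdefault(int(beat // beats_per_bar), []).append(i)

    max_speed = max(motion, default=0.0)
    max_change = max(change, default=0.0)

    controls = []
    for bar_index in sorted(bars):
        frames = bars[bar_index]
        # Motion speed -> simu-note density: faster bars get denser music.
        mean_speed = sum(motion[i] for i in frames) / len(frames)
        # Motion saliency -> simu-note strength: abrupt changes get stronger notes.
        peak_change = max(change[i] for i in frames)
        controls.append(BarControl(
            bar_index=bar_index,
            density=quantize(mean_speed, max_speed, levels),
            strength=quantize(peak_change, max_change, levels),
        ))
    return controls


if __name__ == "__main__":
    # Four still frames followed by four fast frames, at 2 fps and 120 BPM.
    for control in bar_controls([0.0] * 4 + [1.0] * 4, fps=2, tempo_bpm=120):
        print(control)
```

In this sketch the resulting levels would serve as conditioning targets for a generator; how CMT actually encodes and consumes such controls is described in the paper itself, not here.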
