Harmonizing Pixels and Melodies: Maestro-Guided Film Score Generation and Composition Style Transfer
F. Qi, L. Ni, C. Xu

TL;DR
This paper presents a latent-diffusion framework that generates film scores directly from video, aligning the music with the film's general theme and, optionally, with a specific composition style. It also introduces a new evaluation metric and a curated dataset for the task.
Contribution
It introduces a film score generation method that combines latent diffusion with a film encoder and is tuned through a streamlined ControlNet-based mechanism, along with a new metric for the originality and recognizability of music in film scores.
Findings
Outperforms existing methods in film score generation
Capable of style-specific music synthesis from videos
Introduces a new dataset and evaluation metric for film scores
Abstract
We introduce a film score generation framework to harmonize visual pixels and music melodies utilizing a latent diffusion model. Our framework processes film clips as input and generates music that aligns with a general theme while offering the capability to tailor outputs to a specific composition style. Our model directly produces music from video, utilizing a streamlined and efficient tuning mechanism on ControlNet. It also integrates a film encoder adept at understanding the film's semantic depth, emotional impact, and aesthetic appeal. Additionally, we introduce a novel, effective yet straightforward evaluation metric to evaluate the originality and recognizability of music within film scores. To fill this gap for film scores, we curate a comprehensive dataset of film videos and legendary original scores, injecting domain-specific knowledge into our data-driven generation model.…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
