SONIQUE: Video Background Music Generation Using Unpaired Audio-Visual Data

Liqian Zhang; Magdalena Fuentes

arXiv:2410.03879 · cs.SD · February 27, 2025



TL;DR

SONIQUE generates customizable background music for videos. It is trained on unpaired audio-visual data, uses large language models for video understanding, and synthesizes music with a conditional diffusion model, so users can steer the result.

Contribution

SONIQUE trains on unpaired data and uses LLMs for video understanding to generate tailored music, whereas traditional video-to-music methods depend on paired audio-visual datasets.

Findings

01. Enables user control over music attributes such as instruments and genre

02. Uses unpaired data to train a video-to-music generation model

03. Open-source implementation with a demo available

Abstract

We present SONIQUE, a model for generating background music tailored to video content. Unlike traditional video-to-music generation approaches, which rely heavily on paired audio-visual datasets, SONIQUE leverages unpaired data, combining royalty-free music and independent video sources. By utilizing large language models (LLMs) for video understanding and converting visual descriptions into musical tags, alongside a U-Net-based conditional diffusion model, SONIQUE enables customizable music generation. Users can control specific aspects of the music, such as instruments, genres, tempo, and melodies, ensuring the generated output fits their creative vision. SONIQUE is open-source, with a demo available online.
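The tag-based conditioning described in the abstract can be illustrated with a minimal sketch. It is not taken from the SONIQUE repository: the `MusicTags` container, the `merge_tags` and `to_prompt` helpers, and the comma-separated prompt format are all hypothetical, and the sketch only assumes that video-derived tags and user-chosen tags are combined into one text condition before being passed to the diffusion model.

```python
from dataclasses import dataclass, field


@dataclass
class MusicTags:
    """Hypothetical container for musical tags (not the SONIQUE API)."""
    instruments: list = field(default_factory=list)
    genres: list = field(default_factory=list)
    tempo: str = ""


def merge_tags(derived: MusicTags, user: MusicTags) -> MusicTags:
    """Prefer user-supplied fields; fall back to video-derived ones when empty."""
    return MusicTags(
        instruments=user.instruments or derived.instruments,
        genres=user.genres or derived.genres,
        tempo=user.tempo or derived.tempo,
    )


def to_prompt(tags: MusicTags) -> str:
    """Flatten tags into one comma-separated string (assumed prompt format)."""
    parts = tags.genres + tags.instruments  # concatenation builds a new list
    if tags.tempo:
        parts.append(tags.tempo)
    return ", ".join(parts)


# Tags an LLM might derive from a video description (made-up example values).
derived = MusicTags(instruments=["piano"], genres=["ambient"], tempo="slow")
# The user overrides only the instrument; genre and tempo fall through.
user = MusicTags(instruments=["acoustic guitar"])

prompt = to_prompt(merge_tags(derived, user))
print(prompt)  # ambient, acoustic guitar, slow
```

In this sketch a user override replaces a whole field rather than appending to it, which keeps the user's choice unambiguous; a real system could just as reasonably merge the two lists.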


Code & Models

Repositories

zxxwxyyy/sonique (PyTorch, official)


Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies