Video Soundtrack Generation by Aligning Emotions and Temporal Boundaries

Serkan Sulun; Paula Viana; Matthew E. P. Davies

arXiv:2502.10154 · cs.SD · February 6, 2026

TL;DR

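This paper presents EMSYNC, an automated system that generates music for videos that is aligned with their emotional content and temporal boundaries. It uses a two-stage framework (a pretrained video emotion classifier followed by an emotion- and time-conditioned MIDI generator), a novel boundary-offset mechanism for temporal conditioning, and a scheme that maps categorical emotions to valence-arousal values.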

Contribution

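It introduces boundary offsets, a temporal conditioning mechanism that lets the music generator anticipate upcoming scene cuts and align chords with them, together with a mapping from categorical emotion predictions to continuous valence-arousal values. The resulting system outperforms existing models in objective evaluations.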
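As a minimal sketch of the idea behind boundary offsets, assuming an offset is simply the time remaining until the next scene cut measured in seconds, the hypothetical helper `boundary_offsets` below computes that quantity for each musical event. The function name, the use of seconds, and the choice of `math.inf` for events after the final cut are illustrative assumptions, not EMSYNC's actual implementation or token format.

```python
import math
from bisect import bisect_left
from typing import List, Sequence


def boundary_offsets(
    event_times: Sequence[float], cut_times: Sequence[float]
) -> List[float]:
    """Return, for each event time, the seconds left until the next scene cut.

    Hypothetical helper (not the paper's code): a cut coinciding with an
    event gives an offset of 0.0, and events after the final cut get math.inf.
    """
    cuts = sorted(cut_times)
    offsets = []
    for t in event_times:
        i = bisect_left(cuts, t)  # index of the first cut at or after t
        offsets.append(cuts[i] - t if i < len(cuts) else math.inf)
    return offsets


if __name__ == "__main__":
    # Scene cuts at 2.0 s and 5.5 s; events sampled across the clip.
    print(boundary_offsets([0.0, 1.5, 2.0, 4.0, 6.0], [2.0, 5.5]))
    # → [2.0, 0.5, 0.0, 1.5, inf]
```

An offset that shrinks toward 0.0 signals an approaching cut, so a conditional generator could, in principle, learn to place chord onsets where the offset reaches zero.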

Findings

1. Outperforms state-of-the-art models in objective evaluations.
2. Achieves better emotional and temporal alignment in generated music.
3. Demonstrates effectiveness across multiple video datasets.

Abstract

Providing soundtracks for videos remains a costly and time-consuming challenge for multimedia content creators. We introduce EMSYNC, an automatic video-based symbolic music generator that creates music aligned with a video's emotional content and temporal boundaries. It follows a two-stage framework, where a pretrained video emotion classifier extracts emotional features, and a conditional music generator produces MIDI sequences guided by both emotional and temporal cues. We introduce boundary offsets, a novel temporal conditioning mechanism that enables the model to anticipate upcoming video scene cuts and align generated musical chords with them. We also propose a mapping scheme that bridges the discrete categorical outputs of the video emotion classifier with the continuous valence-arousal inputs required by the emotion-conditioned MIDI generator, enabling seamless integration of…
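One plausible way to bridge categorical classifier outputs and continuous valence-arousal inputs is a probability-weighted average of per-category (valence, arousal) coordinates. The sketch below assumes that approach; the helper name `categorical_to_valence_arousal`, the category names, and the coordinates are placeholders chosen only for illustration, not the mapping or values used by EMSYNC.

```python
from typing import Dict, Tuple

# Placeholder (valence, arousal) coordinates in [-1, 1]; illustrative only,
# not taken from the paper or from any emotion lexicon.
EXAMPLE_COORDS: Dict[str, Tuple[float, float]] = {
    "happy": (0.8, 0.6),
    "sad": (-0.6, -0.4),
    "angry": (-0.7, 0.8),
    "calm": (0.4, -0.6),
}


def categorical_to_valence_arousal(
    probs: Dict[str, float],
    coords: Dict[str, Tuple[float, float]] = EXAMPLE_COORDS,
) -> Tuple[float, float]:
    """Map class probabilities to a single (valence, arousal) point.

    Hypothetical helper: computes the probability-weighted mean of the
    per-category coordinates, renormalising in case probs do not sum to 1.
    """
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    valence = sum(p * coords[c][0] for c, p in probs.items()) / total
    arousal = sum(p * coords[c][1] for c, p in probs.items()) / total
    return valence, arousal


if __name__ == "__main__":
    # An even split between two opposing categories lands near the centre.
    print(categorical_to_valence_arousal({"happy": 0.5, "sad": 0.5}))
```

A confident single-class prediction returns that class's coordinates unchanged, while mixed predictions are pulled toward the centre of the valence-arousal plane; a weighted average is only one of several reasonable choices (taking the top class's coordinates is another).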


Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies

Methods: ALIGN