VidTune: Creating Video Soundtracks with Generative Music and Contextual Thumbnails
Mina Huh, C. Ailie Fraser, Dingzeyu Li, Mira Dontcheva, Bryan Wang

TL;DR
VidTune is a system that helps video creators generate diverse music soundtracks, review them quickly through contextually grounded thumbnails, and refine them with natural language edits.
Contribution
VidTune introduces an interface that generates diverse music options, represents each one with a contextual thumbnail, and supports natural language editing, improving soundtrack creation workflows.
Findings
Participants found VidTune helpful for quick review and comparison.
The system enabled more playful and enriching soundtrack creation.
Users appreciated the contextual grounding of thumbnails and music diversity.
Abstract
Music shapes the tone of videos, yet creators often struggle to find soundtracks that match their video's mood and narrative. Recent text-to-music models let creators generate music from text prompts, but our formative study (N=8) shows creators struggle to construct diverse prompts, quickly review and compare tracks, and understand their impact on the video. We present VidTune, a system that supports soundtrack creation by generating diverse music options from a creator's prompt and producing contextual thumbnails for rapid review. VidTune extracts representative video subjects to ground thumbnails in context, maps each track's valence and energy onto visual cues like color and brightness, and depicts prominent genres and instruments. Creators can refine tracks through natural language edits, which VidTune expands into new generations. In a controlled user study (N=12) and an…
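The abstract says each track's valence and energy are mapped onto visual cues like color and brightness, but it does not give the mapping. The sketch below is one plausible way to do it, not the authors' method. It assumes both scores are normalized to [0, 1], that valence drives hue (cool blue for low, warm yellow for high), and that energy drives brightness. The function name `thumbnail_tint` and all constants are hypothetical.

```python
import colorsys


def thumbnail_tint(valence: float, energy: float) -> tuple[int, int, int]:
    """Return an RGB tint for a track thumbnail.

    Assumptions (not from the paper): valence and energy are in [0, 1];
    valence selects the hue and energy selects the brightness.
    """
    # Clamp inputs so out-of-range scores still yield a valid color.
    valence = min(max(valence, 0.0), 1.0)
    energy = min(max(energy, 0.0), 1.0)

    # Hypothetical hue ramp: 240 degrees (blue) at valence 0,
    # down to 60 degrees (yellow) at valence 1.
    hue_deg = 240.0 - 180.0 * valence

    # Hypothetical brightness ramp with a floor of 0.3 so that
    # low-energy thumbnails remain legible.
    brightness = 0.3 + 0.7 * energy

    # Fixed saturation of 0.8 is an arbitrary illustrative choice.
    r, g, b = colorsys.hsv_to_rgb(hue_deg / 360.0, 0.8, brightness)
    return (round(r * 255), round(g * 255), round(b * 255))


if __name__ == "__main__":
    # A calm, sad track versus an upbeat, energetic one.
    print(thumbnail_tint(valence=0.1, energy=0.2))
    print(thumbnail_tint(valence=0.9, energy=0.9))
```

Under these assumptions, a low-valence track gets a dim bluish tint and a high-valence, high-energy track gets a bright warm one, so tracks can be told apart at a glance before playback.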
Taxonomy
Topics: Innovative Human-Technology Interaction · Music Technology and Sound Studies · Artificial Intelligence in Games
