Every Image Listens, Every Image Dances: Music-Driven Image Animation

Zhikang Dong; Weituo Hao; Ju-Chiang Wang; Peng Zhang; Pawel Polak

arXiv:2501.18801 · cs.CV · February 3, 2025

Open Access

TL;DR

MuseDance is a novel end-to-end model that animates images using music and text, enabling personalized, synchronized dance videos without complex motion guidance, and introduces a new multimodal dance dataset.

Contribution

The paper presents MuseDance, a diffusion-based model for music- and text-driven image animation, and provides a multimodal dance dataset for research.

Findings

01

MuseDance achieves synchronized and personalized dance animations.

02

The model generalizes well across diverse images and music.

03

The dataset supports future research in multimodal dance video generation.

Abstract

Image animation has become a promising area in multimodal research, with a focus on generating videos from reference images. While prior work has largely emphasized generic video generation guided by text, music-driven dance video generation remains underexplored. In this paper, we introduce MuseDance, an innovative end-to-end model that animates reference images using both music and text inputs. This dual input enables MuseDance to generate personalized videos that follow text descriptions and synchronize character movements with the music. Unlike existing approaches, MuseDance eliminates the need for complex motion guidance inputs, such as pose or depth sequences, making flexible and creative video generation accessible to users of all expertise levels. To advance research in this field, we present a new multimodal dataset comprising 2,904 dance videos with corresponding background…

Taxonomy

Topics: Cinema and Media Studies
