VirtualConductor: Music-driven Conducting Video Generation System

Delong Chen; Fan Liu; Zewen Li; Feng Xu

arXiv:2108.04350·cs.CV·August 11, 2021·1 cites


TL;DR

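VirtualConductor is a system that generates conducting videos synchronized with any given music from a single image of the user. It relies on a large-scale conductor motion dataset, a cross-modal motion generation model (AMCNet) trained with adversarial-perceptual learning, and 3D animation rendering combined with pose transfer.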

Contribution

The paper introduces the Audio Motion Correspondence Network (AMCNet) and adversarial-perceptual learning for cross-modal music-to-motion synthesis, and combines 3D animation rendering with a pose transfer model to produce personalized conducting videos.

Findings

01 The generated videos are synchronized with the input music.

02 The system can produce diverse and plausible conducting motions.

03 Any user can become a virtual conductor from a single image.

Abstract

In this demo, we present VirtualConductor, a system that can generate conducting video from any given music and a single user's image. First, a large-scale conductor motion dataset is collected and constructed. Then, we propose Audio Motion Correspondence Network (AMCNet) and adversarial-perceptual learning to learn the cross-modal relationship and generate diverse, plausible, music-synchronized motion. Finally, we combine 3D animation rendering and a pose transfer model to synthesize conducting video from a single given user's image. Therefore, any user can become a virtual conductor through the system.
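The abstract names adversarial-perceptual learning but does not give the loss. As a rough sketch only, one common way to combine the two ideas is to add an adversarial term to a feature-matching ("perceptual") term computed on intermediate discriminator features. Everything below is an assumption made for illustration, not the authors' formulation: the function name `generator_loss`, the `perceptual_weight` parameter, the non-saturating adversarial term, and the choice of discriminator features as the perceptual space are all hypothetical.

```python
import numpy as np


def generator_loss(d_scores_fake, feats_real, feats_fake, perceptual_weight=1.0):
    """Hypothetical adversarial + perceptual generator loss (illustrative only).

    d_scores_fake: discriminator probabilities in (0, 1] for generated motion.
    feats_real / feats_fake: lists of feature arrays (one per layer) extracted
        from real and generated motion; assumed here to come from the
        discriminator, which the source does not confirm.
    """
    eps = 1e-8  # avoids log(0)

    # Adversarial term (non-saturating form): small when the discriminator
    # scores the generated motion as real.
    adversarial = -np.mean(np.log(np.asarray(d_scores_fake) + eps))

    # Perceptual term: mean squared distance between real and generated
    # features, averaged over layers.
    perceptual = np.mean(
        [np.mean((fr - ff) ** 2) for fr, ff in zip(feats_real, feats_fake)]
    )

    return float(adversarial + perceptual_weight * perceptual)
```

In this sketch the perceptual term compares motions in a learned feature space rather than joint by joint, which is one plausible way to allow diverse motions that still resemble real conducting; whether the system works this way is not stated in the abstract.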



Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Advanced Vision and Imaging