VirtualConductor: Music-driven Conducting Video Generation System
Delong Chen, Fan Liu, Zewen Li, Feng Xu

TL;DR
VirtualConductor is a system that generates realistic conducting videos, synchronized with any given music, from a single image of the user. It relies on a large-scale conductor motion dataset, new learning models, and 3D rendering techniques.
Contribution
It introduces AMCNet and adversarial-perceptual learning for cross-modal music-motion synthesis and combines 3D rendering with pose transfer for personalized conducting videos.
Findings
Generated videos are synchronized with music.
The system can produce diverse and plausible conducting motions.
The system enables any user to become a virtual conductor.
Abstract
In this demo, we present VirtualConductor, a system that can generate a conducting video from any given music and a single image of the user. First, a large-scale conductor motion dataset is collected and constructed. Then, we propose the Audio Motion Correspondence Network (AMCNet) and adversarial-perceptual learning to learn the cross-modal relationship and generate diverse, plausible, music-synchronized motion. Finally, we combine 3D animation rendering and a pose transfer model to synthesize the conducting video from the single user image. As a result, any user can become a virtual conductor through the system.
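The abstract names adversarial-perceptual learning but does not give its formulation. As a rough illustration only, the sketch below assumes one common way of combining the two ideas: a non-saturating adversarial term on the discriminator's score for a generated motion, plus a perceptual term measured as the mean absolute difference between feature vectors of real and generated motion. The function names, the feature vectors, and the `perceptual_weight` parameter are all hypothetical and are not taken from the paper.

```python
import math


def adversarial_loss(fake_score: float) -> float:
    """Non-saturating generator loss, -log(sigmoid(fake_score)).

    Written as a numerically stable softplus(-fake_score).
    Assumption: the discriminator outputs an unbounded real-valued score.
    """
    return max(-fake_score, 0.0) + math.log1p(math.exp(-abs(fake_score)))


def perceptual_loss(real_feats, fake_feats) -> float:
    """Mean absolute difference between two equal-length feature vectors.

    Assumption: the features come from some fixed or learned motion encoder;
    the abstract does not say which one.
    """
    if len(real_feats) != len(fake_feats):
        raise ValueError("feature vectors must have the same length")
    total = sum(abs(r - f) for r, f in zip(real_feats, fake_feats))
    return total / len(real_feats)


def generator_objective(fake_score, real_feats, fake_feats, perceptual_weight=1.0):
    """Hypothetical combined objective: adversarial term plus weighted perceptual term."""
    return adversarial_loss(fake_score) + perceptual_weight * perceptual_loss(
        real_feats, fake_feats
    )


if __name__ == "__main__":
    # Toy values, chosen only to exercise the functions.
    print(generator_objective(0.0, [1.0, 2.0], [1.0, 4.0], perceptual_weight=0.5))
```

In a sketch like this, the adversarial term pushes generated motion toward whatever the discriminator accepts as real, while the perceptual term keeps it close to the reference motion in feature space; the weight trades one off against the other.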
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Advanced Vision and Imaging
