TL;DR
This paper introduces a method that predicts body skeleton movements from the audio of violin or piano performances. The predicted skeletons drive an avatar whose hand and arm motions resemble those of a musician.
Contribution
The authors present what they describe as the first result showing that natural body dynamics can be predicted from music audio, using an LSTM network trained on violin and piano recital videos uploaded to the Internet.
Findings
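Predicts natural body dynamics, including hand and arm motion, from audio alone; fully detailed, correct arm and finger motion remains a goal.
Applies the predicted skeleton points to a rigged avatar to animate a violin or piano performance.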
First to show body dynamics can be inferred from music.
Abstract
We present a method that takes audio of violin or piano playing as input and outputs a video of skeleton predictions, which are then used to animate an avatar. The key idea is to animate an avatar that moves its hands the way a pianist or violinist would, from audio alone. Fully detailed, correct arm and finger motion is the ultimate goal; however, it is not clear whether body movement can be predicted from music at all. In this paper, we present the first result showing that natural body dynamics can indeed be predicted. We built an LSTM network trained on violin and piano recital videos uploaded to the Internet. The predicted points are applied to a rigged avatar to create the animation.
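The audio-to-skeleton mapping described in the abstract can be illustrated with a minimal LSTM forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the 13-dimensional per-frame audio features, the hidden size of 8, the 4 two-dimensional keypoints, and all function names (`init_params`, `lstm_step`, `predict_keypoints`) are hypothetical, and the weights are random rather than trained. It shows the general shape of the idea: one audio feature vector goes in per frame, and one set of skeleton coordinates comes out per frame.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    # Multiply a row-major matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(n_in, n_hidden, n_out, seed=0):
    # Random, untrained weights (assumption); a real model would learn these.
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # One weight matrix and bias per gate: input, forget, output, candidate.
    gates = {name: (mat(n_hidden, n_in + n_hidden), [0.0] * n_hidden)
             for name in "ifog"}
    return {"gates": gates, "W_out": mat(n_out, n_hidden), "b_out": [0.0] * n_out}


def lstm_step(x, h, c, gates):
    z = x + h  # concatenate the input with the previous hidden state

    def pre(name):
        W, b = gates[name]
        return [a + bias for a, bias in zip(matvec(W, z), b)]

    i = [sigmoid(v) for v in pre("i")]    # input gate
    f = [sigmoid(v) for v in pre("f")]    # forget gate
    o = [sigmoid(v) for v in pre("o")]    # output gate
    g = [math.tanh(v) for v in pre("g")]  # candidate cell values
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def predict_keypoints(frames, params):
    # Run the LSTM over the audio frames and project each hidden state
    # linearly to flattened (x, y) keypoint coordinates.
    n_hidden = len(params["W_out"][0])
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    outputs = []
    for x in frames:
        h, c = lstm_step(x, h, c, params["gates"])
        y = [a + b for a, b in zip(matvec(params["W_out"], h), params["b_out"])]
        outputs.append(y)
    return outputs


# Toy input: 5 frames of 13 synthetic "audio features" (stand-in values).
features = [[math.sin(0.1 * t + k) for k in range(13)] for t in range(5)]
params = init_params(n_in=13, n_hidden=8, n_out=2 * 4)  # 4 keypoints, (x, y) each
skeleton = predict_keypoints(features, params)
print(len(skeleton), len(skeleton[0]))
```

In a trained system, each output frame would be compared against keypoints extracted from the recital videos, and the resulting per-frame coordinates would then be applied to a rigged avatar.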
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
