TL;DR
This paper introduces a neural network model that generates 3-D violinist body movements from music audio, utilizing an encoder-decoder with self-attention and beat tracking to improve realism and timing accuracy.
Contribution
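To the authors' knowledge, this work is the first to generate 3-D violinist movements from music audio. It combines an encoder-decoder with self-attention, beat tracking to set the boundaries of training examples, and a refining network with bowing attack inference to make the right-hand movement and bowing timing more realistic.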
Findings
The proposed model outperforms previous methods in objective evaluations.
Subjective evaluations also favor the proposed model.
The authors describe it as the first attempt, to the best of their knowledge, to generate 3-D violinist movements from music.
Abstract
This paper presents a neural network model that generates a virtual violinist's 3-D skeleton movements from music audio. Improving on the conventional recurrent neural network models used in previous works to generate 2-D skeleton data, the proposed model incorporates an encoder-decoder architecture and a self-attention mechanism to model the complicated dynamics of body movement sequences. To facilitate optimization of the self-attention model, beat tracking is applied to determine the effective sizes and boundaries of the training examples. The decoder is accompanied by a refining network and a bowing attack inference mechanism that emphasize right-hand behavior and bowing attack timing. Both objective and subjective evaluations show that the proposed model outperforms state-of-the-art methods. To the best of our knowledge, this work represents the first attempt to generate…
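The abstract says beat tracking determines the sizes and boundaries of the training examples but does not say how. The following is a minimal sketch of one way this could work, not the authors' implementation. It assumes beat times are already available from some beat tracker, and that each training example spans a fixed number of beats. The function name segment_by_beats and the parameters frame_rate and beats_per_segment are hypothetical.

```python
from typing import List, Sequence


def segment_by_beats(
    frames: Sequence,
    beat_times: Sequence[float],
    frame_rate: float,
    beats_per_segment: int = 4,  # assumed value, not taken from the paper
) -> List[Sequence]:
    """Cut a frame sequence into examples that start and end on beats."""
    # Convert beat times (seconds) to frame indices, clamped to the sequence.
    beat_frames = [
        min(max(int(round(t * frame_rate)), 0), len(frames))
        for t in beat_times
    ]
    segments = []
    # Step through the beats so each segment covers beats_per_segment beats.
    for i in range(0, len(beat_frames) - beats_per_segment, beats_per_segment):
        start = beat_frames[i]
        end = beat_frames[i + beats_per_segment]
        if end > start:  # skip empty segments caused by clamping
            segments.append(frames[start:end])
    return segments


# Usage: 10 s of features at 30 frames per second, one beat every 0.5 s.
frames = list(range(300))
beat_times = [0.5 * k for k in range(20)]
examples = segment_by_beats(frames, beat_times, frame_rate=30)
```

With these inputs, each example covers four beats (60 frames), so segment lengths follow the tempo of the music rather than a fixed window.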