Multi-Resolution Generative Modeling of Human Motion from Limited Data
David Eduardo Moreno-Villamarín, Anna Hilsmann, Peter Eisert

TL;DR
This paper introduces a multi-resolution generative model for human motion synthesis that effectively learns from limited data, enabling diverse, synchronized motion and gesture generation across multiple temporal scales.
Contribution
The work presents a novel multi-scale architecture with specialized networks for generating human motions and co-speech gestures from limited data, avoiding test-time fitting.
Findings
Achieves diverse motion synthesis with limited training data
Successfully generates synchronized co-speech gestures
Extends to direct SMPL pose parameter synthesis
Abstract
We present a generative model that learns to synthesize human motion from limited training sequences. Our framework provides conditional generation and blending across multiple temporal resolutions. The model adeptly captures human motion patterns by integrating skeletal convolution layers and a multi-scale architecture. Our model contains a set of generative and adversarial networks, along with embedding modules, each tailored for generating motions at specific frame rates while exerting control over their content and details. Notably, our approach also extends to the synthesis of co-speech gestures, demonstrating its ability to generate synchronized gestures from speech inputs, even with limited paired data. Through direct synthesis of SMPL pose parameters, our approach avoids test-time adjustments to fit human body meshes. Experimental results showcase our model's ability to achieve…
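As a rough illustration of what "multiple temporal resolutions" can mean for motion data, the sketch below builds a coarse-to-fine pyramid from a single pose sequence by frame subsampling, and linearly upsamples a coarse level back to a finer frame count. It is not the authors' implementation: the function names, the subsampling factor, the number of levels, and the 72-dimensional pose vector (24 SMPL joints times 3 axis-angle values) are assumptions chosen for the example.

```python
import numpy as np


def temporal_pyramid(motion, num_levels=3, factor=2):
    """Return a list of sequences ordered from coarsest to finest.

    motion: array of shape (frames, features).
    Each coarser level keeps every `factor`-th frame of the level below it.
    (Hypothetical helper; the paper's actual downsampling is not specified here.)
    """
    levels = [motion]
    for _ in range(num_levels - 1):
        levels.append(levels[-1][::factor])  # lower the frame rate by striding
    return levels[::-1]  # coarsest first


def upsample_linear(motion, target_frames):
    """Linearly interpolate each feature channel to `target_frames` frames.

    Note: interpolating axis-angle values channel by channel is only a crude
    approximation of rotation interpolation; it is used here for simplicity.
    """
    src = np.linspace(0.0, 1.0, motion.shape[0])
    dst = np.linspace(0.0, 1.0, target_frames)
    channels = [np.interp(dst, src, motion[:, k]) for k in range(motion.shape[1])]
    return np.stack(channels, axis=1)


# Example: 120 frames of a 72-dimensional pose vector (assumed SMPL layout).
rng = np.random.default_rng(0)
sequence = rng.normal(size=(120, 72))
pyramid = temporal_pyramid(sequence)
coarse_upsampled = upsample_linear(pyramid[0], pyramid[1].shape[0])
```

In a coarse-to-fine generator of this kind, each level would typically refine an upsampled version of the level below it, so the coarse levels govern overall content and the fine levels add detail.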
Taxonomy
Methods: Sparse Evolutionary Training · Convolution
