TL;DR
This paper introduces a hierarchical two-stream sequential model that generates 3D pose sequences from complex natural language sentences, including sentences that describe compositional actions.
Contribution
It presents a hierarchical two-stream sequential model that learns a joint-level mapping between natural language sentences and 3D pose sequences, using separate manifold representations for upper-body and lower-body movements so that it can handle compositional actions.
Findings
Achieved a 50% improvement over previous methods in objective evaluations.
Generated motions closely matched ground-truth data in user studies.
Successfully handled both simple and complex, multi-action sentences.
Abstract
"How can we animate 3D characters from a movie script or move robots by simply telling them what we would like them to do?" "How unstructured and complex can we make a sentence and still generate plausible movements from it?" These are questions that need to be answered in the long run, as the field is still in its infancy. Inspired by these problems, we present a new technique for generating compositional actions, which handles complex input sentences. Our output is a 3D pose sequence depicting the actions in the input sentence. We propose a hierarchical two-stream sequential model to explore a finer joint-level mapping between natural language sentences and 3D pose sequences corresponding to the given motion. We learn two manifold representations of the motion -- one each for the upper body and the lower body movements. Our model can generate plausible pose sequences for short…
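To make the two-stream idea concrete, the following is a minimal sketch of how a pose sequence might be split into upper-body and lower-body streams and later recombined. It is not the authors' implementation: the joint indices, the 14-joint skeleton, and the function names split_streams and merge_streams are all hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical joint layout: these indices are illustrative only and are
# not the skeleton definition used in the paper.
UPPER_BODY = [0, 1, 2, 3, 4, 5, 6, 7]    # e.g. torso, head, arms
LOWER_BODY = [8, 9, 10, 11, 12, 13]      # e.g. hips, legs


def split_streams(poses):
    """Split a (frames, joints, 3) pose sequence into two streams."""
    upper = poses[:, UPPER_BODY, :]
    lower = poses[:, LOWER_BODY, :]
    return upper, lower


def merge_streams(upper, lower):
    """Recombine the two streams into a full (frames, joints, 3) sequence."""
    frames = upper.shape[0]
    num_joints = len(UPPER_BODY) + len(LOWER_BODY)
    poses = np.empty((frames, num_joints, 3), dtype=upper.dtype)
    poses[:, UPPER_BODY, :] = upper
    poses[:, LOWER_BODY, :] = lower
    return poses


# Usage: a dummy 5-frame sequence with 14 joints.
poses = np.arange(5 * 14 * 3, dtype=float).reshape(5, 14, 3)
upper, lower = split_streams(poses)
restored = merge_streams(upper, lower)
```

In a model of the kind the abstract describes, each stream would presumably be encoded into its own learned representation. The sketch covers only the partitioning step and makes no claim about the network architecture.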
