Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training
Junfan Lin, Jianlong Chang, Lingbo Liu, Guanbin Li, Liang Lin, Qi Tian, Chang Wen Chen

TL;DR
This paper introduces a zero-shot, offline open-vocabulary text-to-motion generation method that uses prompt learning, a novel text-pose alignment model, and a wordless training mechanism to synthesize motions from text without paired data.
Contribution
It proposes a new framework combining prompt learning, a text-pose alignment model, and wordless training for open-vocabulary motion synthesis without paired training data.
Findings
The method is reported to improve significantly over baseline methods.
It generates motion for unseen text in a zero-shot manner, with no online optimization at inference.
It introduces text-pose alignment and wordless training mechanisms.
Abstract
Text-to-motion generation is an emerging and challenging problem, which aims to synthesize motion with the same semantics as the input text. However, due to the lack of diverse labeled training data, most approaches either are limited to specific types of text annotations or require online optimization to cater to the texts during inference, at the cost of efficiency and stability. In this paper, we investigate offline open-vocabulary text-to-motion generation in a zero-shot learning manner that requires neither paired training data nor extra online optimization to adapt to unseen texts. Inspired by prompt learning in NLP, we pretrain a motion generator that learns to reconstruct the full motion from the masked motion. During inference, instead of changing the motion generator, our method reformulates the input text into a masked motion as the prompt for the motion generator to…
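The masked-motion pretraining idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the frame-level masking, the zero fill for hidden frames, the mean-squared-error objective, and the names `mask_motion` and `reconstruction_loss` are all assumptions made here for illustration. A motion is represented as a list of frames, each a list of floats.

```python
import random


def mask_motion(motion, keep_ratio=0.25, seed=0):
    """Hide all but a random subset of frames (hypothetical helper).

    motion: list of frames, each frame a list of floats.
    Returns (masked, keep), where keep[t] is True for visible frames.
    Hidden frames are replaced by zeros, an assumption of this sketch.
    """
    rng = random.Random(seed)
    num_frames = len(motion)
    num_keep = max(1, round(keep_ratio * num_frames))
    kept = set(rng.sample(range(num_frames), num_keep))
    keep = [t in kept for t in range(num_frames)]
    masked = [
        list(frame) if visible else [0.0] * len(frame)
        for frame, visible in zip(motion, keep)
    ]
    return masked, keep


def reconstruction_loss(predicted, target, keep):
    """Mean squared error over the hidden frames only (assumed objective)."""
    errors = [
        (p - q) ** 2
        for pred_frame, target_frame, visible in zip(predicted, target, keep)
        if not visible
        for p, q in zip(pred_frame, target_frame)
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Toy motion: 8 frames with 3 values per frame.
motion = [[float(t + d) for d in range(3)] for t in range(8)]
masked, keep = mask_motion(motion, keep_ratio=0.25)

# A generator would map `masked` back to a full motion; here the masked
# input itself stands in as a trivially bad prediction.
baseline_loss = reconstruction_loss(masked, motion, keep)
perfect_loss = reconstruction_loss(motion, motion, keep)
```

Under this reading, the visible frames play the role of the prompt: at inference the abstract describes producing a masked motion from the input text and letting the unchanged generator fill in the rest.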
Taxonomy
Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Human Pose and Action Recognition
Methods: Contrastive Language-Image Pre-training
