Strong and Controllable 3D Motion Generation
Canxuan Gang

TL;DR
This paper introduces a more efficient transformer-based diffusion model and a control mechanism for human motion generation, enabling faster and more precise joint-level control suitable for real-time applications.
Contribution
The paper proposes a customized attention mechanism, which makes the transformer-based diffusion model more efficient, and a motion control network, which adds precise joint-level control to text-to-motion generation.
Findings
Enhanced generation speed with optimized transformer models
Improved joint-level control accuracy
Potential for real-time human motion applications
Abstract
Human motion generation is a significant pursuit in generative computer vision with widespread applications in film-making, video games, AR/VR, and human-robot interaction. Current methods mainly utilize either diffusion-based generative models or autoregressive models for text-to-motion generation. However, they face two significant challenges: (1) The generation process is time-consuming, posing a major obstacle for real-time applications such as gaming, robot manipulation, and other online settings. (2) These methods typically learn a relative motion representation guided by text, making it difficult to generate motion sequences with precise joint-level control. These challenges significantly hinder progress and limit the real-world application of human motion generation techniques. To address this gap, we propose a simple yet effective architecture consisting of two key components.…
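The second challenge in the abstract, that relative motion representations make precise joint-level control difficult, can be made concrete with a minimal sketch. It assumes a HumanML3D-style relative representation, in which the root is stored as a per-frame heading change plus a planar displacement in the character's own heading frame. The paper's actual representation is not described in this excerpt, and the helper names `integrate_root` and `final_drift` and all numbers are hypothetical. The sketch shows that the absolute root position at frame T is a cumulative sum over every earlier frame. A small per-frame error therefore compounds into global drift, which is why pinning a joint to an exact world location is hard when a model predicts only relative quantities.

```python
import math
from typing import List, Tuple


def integrate_root(
    yaw_vel: List[float], local_vel: List[Tuple[float, float]]
) -> List[Tuple[float, float]]:
    """Recover absolute root (x, z) positions from relative per-frame features.

    Hypothetical HumanML3D-style inputs (an assumption, not the paper's format):
      yaw_vel[t]   -- change in heading angle (radians) at frame t
      local_vel[t] -- root displacement (dx, dz) in the heading frame at frame t
    """
    yaw, x, z = 0.0, 0.0, 0.0
    traj = [(x, z)]
    for dyaw, (dx, dz) in zip(yaw_vel, local_vel):
        yaw += dyaw  # the heading is itself an integral of yaw velocity
        c, s = math.cos(yaw), math.sin(yaw)
        # Rotate the local displacement into the world frame, then accumulate.
        x += c * dx + s * dz
        z += -s * dx + c * dz
        traj.append((x, z))
    return traj


def final_drift(a: List[Tuple[float, float]], b: List[Tuple[float, float]]) -> float:
    """Euclidean distance between the last root positions of two trajectories."""
    (ax, az), (bx, bz) = a[-1], b[-1]
    return math.hypot(ax - bx, az - bz)


if __name__ == "__main__":
    frames = 60
    step = 0.05  # hypothetical forward displacement per frame (metres)
    clean = integrate_root([0.0] * frames, [(0.0, step)] * frames)

    # A 0.002 m bias in every predicted step grows linearly with sequence length.
    biased_step = integrate_root([0.0] * frames, [(0.0, step + 0.002)] * frames)

    # A 0.01 rad/frame heading bias bends the whole path away from the target.
    biased_yaw = integrate_root([0.01] * frames, [(0.0, step)] * frames)

    print(f"step-bias drift: {final_drift(clean, biased_step):.3f} m")
    print(f"yaw-bias drift:  {final_drift(clean, biased_yaw):.3f} m")
```

The step-bias line reports 0.120 m (60 × 0.002). The heading bias produces a much larger end-point error because it rotates every later displacement. An absolute (per-joint, world-space) target avoids this accumulation, which is the motivation the abstract gives for a dedicated joint-level control mechanism.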
Taxonomy
Topics: Advanced Vision and Imaging · Human Motion and Animation · 3D Shape Modeling and Analysis
Methods: Diffusion
