Text-driven Human Motion Generation with Motion Masked Diffusion Model
Xingyu Chen

TL;DR
This paper introduces the Motion Masked Diffusion Model (MMDM), a novel approach that enhances text-driven human motion generation by explicitly learning spatio-temporal relationships through masking strategies, resulting in improved motion quality and consistency.
Contribution
The paper proposes a new motion masked mechanism for diffusion models, with two masking strategies, to better learn spatio-temporal relations in human motion generation.
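As a rough illustration of what masking along the two axes of a motion sequence could look like, here is a toy sketch. The summary does not say what the two strategies are, so this assumes one hides whole time frames and the other hides the same feature columns (standing in for joints or body parts) in every frame. The function names `mask_frames` and `mask_joints`, the mask ratio, and zero-filling are all hypothetical choices for illustration, not the paper's implementation.

```python
import random

def mask_frames(motion, ratio, rng):
    """Temporal masking (assumed): zero out whole frames chosen at random."""
    n_frames = len(motion)
    n_masked = int(round(ratio * n_frames))
    masked = set(rng.sample(range(n_frames), n_masked))
    out = [[0.0] * len(frame) if t in masked else list(frame)
           for t, frame in enumerate(motion)]
    return out, masked

def mask_joints(motion, ratio, rng):
    """Spatial masking (assumed): zero out the same feature columns in every frame."""
    n_feats = len(motion[0])
    n_masked = int(round(ratio * n_feats))
    masked = set(rng.sample(range(n_feats), n_masked))
    out = [[0.0 if j in masked else v for j, v in enumerate(frame)]
           for frame in motion]
    return out, masked

rng = random.Random(0)
# Toy motion sequence: 6 frames x 4 features, all values nonzero.
motion = [[float(t * 10 + j + 1) for j in range(4)] for t in range(6)]

frame_masked, frames = mask_frames(motion, 0.5, rng)
joint_masked, joints = mask_joints(motion, 0.5, rng)
```

In a masked-training setup, the masked sequence would presumably be given to the model in place of the full one, so that the hidden frames or features have to be inferred from the visible context.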
Findings
Effective balance of motion quality and text-motion consistency.
Improved FID scores on HumanML3D and KIT-ML datasets.
Enhanced ability to learn spatio-temporal relationships.
Abstract
Text-driven human motion generation is a multimodal task that synthesizes human motion sequences conditioned on natural language. It requires the model to satisfy textual descriptions under varying conditional inputs, while generating plausible and realistic human actions with high diversity. Existing diffusion model-based approaches have outstanding performance in the diversity and multimodality of generation. However, compared to autoregressive methods that train motion encoders before inference, diffusion methods fall short in fitting the distribution of human motion features, which leads to an unsatisfactory FID score. One insight is that the diffusion model lacks the ability to learn the motion relations among spatio-temporal semantics through contextual reasoning. To solve this issue, in this paper, we propose the Motion Masked Diffusion Model (MMDM), a novel human motion masked…
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Hand Gesture Recognition Systems
Methods: Diffusion
