Aligning Human Motion Generation with Human Perceptions
Haoru Wang, Wentao Zhu, Luyi Miao, Yishu Xu, Feng Gao, Qi Tian, Yizhou Wang

TL;DR
This paper introduces MotionPercept and MotionCritic, a data-driven framework that aligns human motion generation with human perception, improving realism assessment and generation quality.
Contribution
It presents a large-scale perceptual dataset and a critic model that better reflect human preferences, advancing evaluation and generation of human motions.
Findings
MotionCritic correlates well with human judgments.
The approach improves motion realism in generated outputs.
Evaluation metrics align more closely with human perception.
Abstract
Human motion generation is a critical task with a wide range of applications. Achieving high realism in generated motions requires naturalness, smoothness, and plausibility. Despite rapid advancements in the field, current generation methods often fall short of these goals. Furthermore, existing evaluation metrics typically rely on ground-truth-based errors, simple heuristics, or distribution distances, which do not align well with human perceptions of motion quality. In this work, we propose a data-driven approach to bridge this gap by introducing a large-scale human perceptual evaluation dataset, MotionPercept, and a human motion critic model, MotionCritic, that capture human perceptual preferences. Our critic model offers a more accurate metric for assessing motion quality and could be readily integrated into the motion generation pipeline to enhance generation quality. Extensive…
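The abstract does not spell out how the critic is trained. As a minimal sketch, assuming the perceptual annotations can be expressed as pairs in which annotators preferred one motion over another, a critic that outputs one scalar score per motion could be fit with a Bradley-Terry style pairwise loss and checked by how often its ranking agrees with the annotators. The function names and the scalar-score interface are illustrative assumptions, not the authors' code.

```python
import numpy as np

def preference_loss(score_better, score_worse):
    """Mean of -log(sigmoid(s_better - s_worse)) over annotated pairs.

    Assumption: each pair holds the critic's scalar scores for the motion
    annotators preferred (score_better) and the one they rejected (score_worse).
    """
    diff = np.asarray(score_better, dtype=float) - np.asarray(score_worse, dtype=float)
    # -log(sigmoid(d)) == log(1 + exp(-d)); logaddexp keeps this numerically stable.
    return float(np.mean(np.logaddexp(0.0, -diff)))

def preference_accuracy(score_better, score_worse):
    """Fraction of pairs where the critic ranks the preferred motion higher."""
    diff = np.asarray(score_better, dtype=float) - np.asarray(score_worse, dtype=float)
    return float(np.mean(diff > 0))

# Hypothetical scores for three annotated pairs.
better = [1.2, 0.3, 2.0]
worse = [0.1, 0.9, -0.5]
print(preference_loss(better, worse), preference_accuracy(better, worse))
```

Minimizing the loss pushes the preferred motion's score above the rejected one's. The accuracy is one simple way to quantify agreement with human judgments on held-out pairs.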
Peer Reviews
Decision · ICLR 2025 Poster
(1) The use of a large-scale human-annotated dataset and a critic model for automatic evaluation to align motion generation quality with human perceptions is a significant step forward. (2) The experimental design is comprehensive. It evaluates the MotionCritic model on different data distributions and shows its generalization ability. Also, it tests the model as a training supervision signal and analyzes its impact on motion generation quality.
1. Table 2 shows that, after 700 fine-tuning steps, the MotionCritic score decreases at step 800. I am not sure about the trend; could you provide results for many more steps so that we can see the potential of the designed Critic Supervision? 2. I think a motion model trained on larger datasets such as HumanML3D and Motion-X should be used to show Critic Supervision's potential. 3. Sec. 4.3 could include more detail, and Algorithm 1 could be much simpler and clearer, keeping only the necessary procedures.
This paper addresses a common problem in generative motion synthesis: how to evaluate the motion quality of a generated sequence. In contrast to the often-used FID score, MotionCritic operates on a per-sequence basis, which allows for a much more granular evaluation.
One major concern about the motion critic is its quality ceiling: MotionPercept is highly dependent on the generative models that produce the motion, so in turn MotionCritic is highly dependent on those generative models as well. That means that, for the model, the best possible motion is that of the best generative model in MotionPercept, potentially limiting the usefulness of the critic once generative models produce significantly better motion than has been produced for MotionPercept. Could the authors
The paper addresses a fundamental problem in human motion generation and is both well-written and well-motivated. I appreciate the authors' meticulous effort in their work. Annotating a large sample set and training a critic model on this data is a clever idea. Additionally, the thorough analysis and comparisons effectively demonstrate the proposed method's impact. While the human motion modeling field has converged on a set of complementary metrics for quality assessment, these metrics still f
A sensitivity analysis could be included to assess the robustness and smoothness (e.g., Lipschitz continuity) of the critic model. For instance, how would the critic model respond if a pose is slightly perturbed, either randomly or in a structured way by rotating a joint beyond its limits? By design, certain artifacts—such as foot-floor penetration, person-ground contact, etc.—might be missed, as the critic model lacks information needed to detect these issues (e.g., it only uses rotations and
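The sensitivity analysis the reviewer asks for can be sketched as follows: perturb a motion with small random noise and record how much the critic score changes relative to the size of the perturbation. The largest such ratio is a rough empirical lower bound on a local Lipschitz constant. Everything here is an assumption for illustration: `toy_critic` is a hypothetical stand-in rather than MotionCritic, and the (frames, joints, 3) array layout is chosen only for the example.

```python
import numpy as np

def toy_critic(motion):
    """Hypothetical stand-in critic: scores smoother motion higher."""
    velocity = np.diff(motion, axis=0)  # frame-to-frame change
    return -float(np.mean(velocity ** 2))

def sensitivity_ratio(critic, motion, eps=1e-2, trials=32, seed=0):
    """Largest observed |critic(x + d) - critic(x)| / ||d|| over random d."""
    rng = np.random.default_rng(seed)
    base = critic(motion)
    ratios = []
    for _ in range(trials):
        # Gaussian perturbation with standard deviation eps on every value.
        delta = rng.normal(scale=eps, size=motion.shape)
        change = abs(critic(motion + delta) - base)
        ratios.append(change / np.linalg.norm(delta))
    return max(ratios)

# Hypothetical input: 60 frames, 24 joints, 3 rotation parameters per joint.
motion = np.zeros((60, 24, 3))
print(sensitivity_ratio(toy_critic, motion))
```

A structured test, such as rotating one joint past its limit, would replace the Gaussian `delta` with a perturbation confined to that joint's entries.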
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · 3D Shape Modeling and Analysis
Methods: ALIGN
