What is the Best Automated Metric for Text to Motion Generation?
Jordan Voas, Yili Wang, Qixing Huang, and Raymond Mooney

TL;DR
This paper evaluates existing automated metrics for text-to-motion generation, identifies their limitations, and introduces MoBERT, a new metric that aligns closely with human judgments at both the sample and model levels.
Contribution
The paper systematically assesses how well current metrics correlate with human judgments and proposes MoBERT, a novel multimodal BERT-based metric that aligns more closely with those judgments.
Findings
Current metrics correlate poorly with human judgments at the sample level.
Common metrics such as R-Precision correlate well at the model level.
MoBERT outperforms existing metrics in aligning with human evaluations.
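The distinction between sample-level and model-level correlation in the findings above can be illustrated with a minimal sketch. It assumes Pearson correlation and a flat list of (model, metric score, human rating) records. The paper's exact correlation statistic and data layout are not stated in this summary, and the function names and toy numbers below are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def sample_level_correlation(records):
    """Correlate metric scores with human ratings over individual samples."""
    metric = [m for _, m, _ in records]
    human = [h for _, _, h in records]
    return pearson(metric, human)

def model_level_correlation(records):
    """Average scores per model first, then correlate the per-model means."""
    by_model = {}
    for model, m, h in records:
        by_model.setdefault(model, []).append((m, h))
    metric_means = [sum(m for m, _ in v) / len(v) for v in by_model.values()]
    human_means = [sum(h for _, h in v) / len(v) for v in by_model.values()]
    return pearson(metric_means, human_means)

# Hypothetical toy data (not from the paper): (model, metric score, human rating).
records = [
    ("A", 0.1, 2), ("A", 0.3, 1),
    ("B", 0.4, 3), ("B", 0.6, 2),
    ("C", 0.7, 4), ("C", 0.9, 3),
]

sample_r = sample_level_correlation(records)
model_r = model_level_correlation(records)
print(f"sample-level r = {sample_r:.3f}, model-level r = {model_r:.3f}")
```

In this toy data the metric is noisy within each model but tracks the per-model averages, so averaging removes the within-model noise and the model-level correlation comes out higher than the sample-level one. That is one way a metric can look weak on individual samples yet still rank models sensibly.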
Abstract
There is growing interest in generating skeleton-based human motions from natural language descriptions. While most efforts have focused on developing better neural architectures for this task, there has been no significant work on determining the proper evaluation metric. Human evaluation is the ultimate accuracy measure for this task, and automated metrics should correlate well with human quality judgments. Since descriptions are compatible with many motions, determining the right metric is critical for evaluating and designing effective generative models. This paper systematically studies which metrics best align with human evaluations and proposes new metrics that align even better. Our findings indicate that none of the metrics currently used for this task show even a moderate correlation with human judgments on a sample level. However, for assessing average model performance,…
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications
Methods: None · ALIGN
