What is the Best Automated Metric for Text to Motion Generation?

Jordan Voas, Yili Wang, Qixing Huang, and Raymond Mooney

arXiv:2309.10248 · cs.CL · September 20, 2023

Open Access

TL;DR

This paper evaluates existing automated metrics for text-to-motion generation, identifies their limitations, and introduces MoBERT, a new metric that aligns closely with human judgments at both the sample level and the model level.

Contribution

The paper systematically assesses current metrics' correlation with human judgments and proposes MoBERT, a novel multimodal BERT-based metric with superior alignment.

Findings

01. Current metrics correlate poorly with human judgments at the sample level.

02. Common metrics such as R-Precision correlate well with human judgments at the model level.

03. MoBERT outperforms existing metrics in aligning with human evaluations.
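
The difference between sample-level and model-level correlation in the findings above can be illustrated with a minimal sketch. It is not the paper's evaluation code: the choice of Pearson correlation, the helper names (`pearson`, `sample_level_correlation`, `model_level_correlation`), and the synthetic scores are all assumptions made for illustration. The point is only that a noisy metric can track human ratings weakly per sample while still ranking models well once scores are averaged per model.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D score arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def sample_level_correlation(metric, human):
    """Correlate scores over every individual (model, sample) pair."""
    return pearson(np.ravel(metric), np.ravel(human))


def model_level_correlation(metric, human):
    """Average each model's scores first, then correlate the per-model means."""
    return pearson(np.mean(metric, axis=1), np.mean(human, axis=1))


# Synthetic data (an assumption, not results from the paper):
# rows are models, columns are generated samples.
rng = np.random.default_rng(0)
n_models, n_samples = 5, 200
model_quality = np.linspace(1.0, 5.0, n_models)[:, None]

# Human ratings follow the true model quality with moderate noise.
human = model_quality + rng.normal(0.0, 1.0, (n_models, n_samples))
# The automated metric follows the same quality but with much larger noise.
metric = model_quality + rng.normal(0.0, 4.0, (n_models, n_samples))

sample_r = sample_level_correlation(metric, human)
model_r = model_level_correlation(metric, human)
print(f"sample-level r = {sample_r:.3f}")
print(f"model-level  r = {model_r:.3f}")
```

Averaging over many samples cancels most of the per-sample noise, which is why the model-level figure comes out higher than the sample-level one in this toy setup. A rank correlation could be swapped in for `pearson` without changing the structure.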

Abstract

There is growing interest in generating skeleton-based human motions from natural language descriptions. While most efforts have focused on developing better neural architectures for this task, there has been no significant work on determining the proper evaluation metric. Human evaluation is the ultimate accuracy measure for this task, and automated metrics should correlate well with human quality judgments. Since descriptions are compatible with many motions, determining the right metric is critical for evaluating and designing effective generative models. This paper systematically studies which metrics best align with human evaluations and proposes new metrics that align even better. Our findings indicate that none of the metrics currently used for this task show even a moderate correlation with human judgments on a sample level. However, for assessing average model performance,…

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications

Methods: None · ALIGN