How Much Do Large Language Models Know about Human Motion? A Case Study in 3D Avatar Control
Kunhang Li, Jason Naradowsky, Yansong Feng, Yusuke Miyao

TL;DR
This study investigates the extent of large language models' understanding of human motion by using them to generate 3D avatar animations from instructions, revealing strengths in high-level planning and weaknesses in detailed positioning.
Contribution
The paper introduces a novel framework for evaluating LLMs' knowledge of human motion through a two-step planning process for 3D avatar control, highlighting their capabilities and limitations.
Findings
LLMs are strong at interpreting high-level body movements
LLMs struggle with precise body part positioning
Decomposing motion queries into atomic components improves planning
Abstract
We explore the human motion knowledge of Large Language Models (LLMs) through 3D avatar control. Given a motion instruction, we prompt LLMs to first generate a high-level movement plan with consecutive steps (High-level Planning), then specify body part positions in each step (Low-level Planning), which we linearly interpolate into avatar animations. Using 20 representative motion instructions that cover fundamental movements and balance body part usage, we conduct comprehensive evaluations, including human and automatic scoring of both high-level movement plans and generated animations, as well as automatic comparison with oracle positions in low-level planning. Our findings show that LLMs are strong at interpreting high-level body movements but struggle with precise body part positioning. While decomposing motion queries into atomic components improves planning, LLMs face challenges…
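The interpolation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each low-level planning step yields one keyframe mapping body part names to 3D positions. The names `Pose`, `lerp`, `interpolate_poses`, and `frames_per_step` are hypothetical, and the paper's actual pose representation and frame timing may differ.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Pose = Dict[str, Vec3]  # assumption: body part name -> (x, y, z) position


def lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    """Linearly interpolate between two points; t=0 gives a, t=1 gives b."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def interpolate_poses(keyframes: List[Pose], frames_per_step: int) -> List[Pose]:
    """Expand per-step keyframes into a dense sequence of animation frames.

    Each pair of consecutive keyframes contributes `frames_per_step` frames
    (the start pose plus the in-between poses); the final keyframe is
    appended once at the end.
    """
    if not keyframes:
        return []
    frames: List[Pose] = []
    for start, end in zip(keyframes, keyframes[1:]):
        for i in range(frames_per_step):
            t = i / frames_per_step
            # Assumes both keyframes specify the same set of body parts.
            frames.append({part: lerp(start[part], end[part], t) for part in start})
    frames.append(dict(keyframes[-1]))
    return frames


# Hypothetical two-step plan: raise the right hand by one unit.
keyframes = [
    {"right_hand": (0.0, 1.0, 0.0)},
    {"right_hand": (0.0, 2.0, 0.0)},
]
animation = interpolate_poses(keyframes, frames_per_step=4)
```

With these inputs, `animation` holds five frames, and the middle frame places the hand halfway between the two keyframes.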
Taxonomy
Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Action Observation and Synchronization
