How Much Do Large Language Models Know about Human Motion? A Case Study in 3D Avatar Control

Kunhang Li; Jason Naradowsky; Yansong Feng; Yusuke Miyao

arXiv:2505.21531 · cs.CV · September 23, 2025

TL;DR

This study investigates the extent of large language models' understanding of human motion by using them to generate 3D avatar animations from instructions, revealing strengths in high-level planning and weaknesses in detailed positioning.

Contribution

The paper introduces a novel framework for evaluating LLMs' knowledge of human motion through a two-step planning process for 3D avatar control, highlighting their capabilities and limitations.
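A minimal sketch of how such a two-step plan might be represented. It assumes each high-level step is a short text description and each low-level step assigns a 3D target position to named body parts. The classes `MotionPlan` and `LowLevelStep`, their fields, and the coordinate format are hypothetical and are not the paper's actual plan format.

```python
from dataclasses import dataclass, field

# Hypothetical types for illustration only; the paper's plan format may differ.
Vec3 = tuple[float, float, float]


@dataclass
class LowLevelStep:
    """Low-level Planning: target positions for body parts in one step."""
    step: str                      # the high-level step this entry refines
    positions: dict[str, Vec3]     # body part name -> assumed 3D target position


@dataclass
class MotionPlan:
    """Two-step plan for one motion instruction."""
    instruction: str
    high_level: list[str]          # High-level Planning: consecutive movement steps
    low_level: list[LowLevelStep] = field(default_factory=list)

    def is_aligned(self) -> bool:
        """True when every high-level step has exactly one low-level refinement, in order."""
        return len(self.high_level) == len(self.low_level) and all(
            ll.step == hl for hl, ll in zip(self.high_level, self.low_level)
        )


plan = MotionPlan(
    instruction="wave with the right hand",
    high_level=["raise right arm", "move right hand side to side"],
)
plan.low_level.append(LowLevelStep("raise right arm", {"right_hand": (0.3, 1.8, 0.0)}))
print(plan.is_aligned())  # False: the second step has no low-level positions yet
```

Keeping the two levels as separate lists mirrors the decomposition described above: the high-level steps can be inspected on their own, and the alignment check flags plans where low-level positions are missing for some step.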

Findings

01 LLMs excel at high-level movement interpretation

02 LLMs struggle with precise body part positioning

03 Decomposition improves motion planning

Abstract

We explore the human motion knowledge of Large Language Models (LLMs) through 3D avatar control. Given a motion instruction, we prompt LLMs to first generate a high-level movement plan with consecutive steps (High-level Planning), then specify body part positions in each step (Low-level Planning), which we linearly interpolate into avatar animations. Using 20 representative motion instructions that cover fundamental movements and balance body part usage, we conduct comprehensive evaluations, including human and automatic scoring of both high-level movement plans and generated animations, as well as automatic comparison with oracle positions in low-level planning. Our findings show that LLMs are strong at interpreting high-level body movements but struggle with precise body part positioning. While decomposing motion queries into atomic components improves planning, LLMs face challenges…
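The abstract states that the low-level body part positions are linearly interpolated into avatar animations. A minimal sketch of that step follows, assuming each step's positions are 3D coordinates keyed by body part name and a fixed number of frames per step. The names `interpolate_keyframes`, `frames_per_step`, and the pose format are hypothetical; the paper may interpolate a different representation (for example, joint rotations) or use different timing.

```python
# Hypothetical pose format: body part name -> assumed 3D position.
Vec3 = tuple[float, float, float]
Pose = dict[str, Vec3]


def lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    """Linear interpolation between two 3D points, t in [0, 1]."""
    return (a[0] + (b[0] - a[0]) * t,
            a[1] + (b[1] - a[1]) * t,
            a[2] + (b[2] - a[2]) * t)


def interpolate_keyframes(keyframes: list[Pose], frames_per_step: int) -> list[Pose]:
    """Expand one pose per plan step into a frame sequence by linear interpolation.

    Assumes every keyframe lists the same body parts.
    """
    if frames_per_step < 1:
        raise ValueError("frames_per_step must be at least 1")
    frames: list[Pose] = []
    for start, end in zip(keyframes, keyframes[1:]):
        for i in range(frames_per_step):
            t = i / frames_per_step  # t = 0 reproduces the start keyframe exactly
            frames.append({part: lerp(start[part], end[part], t) for part in start})
    if keyframes:
        frames.append(dict(keyframes[-1]))  # end exactly on the final keyframe
    return frames


keyframes = [
    {"right_hand": (0.0, 1.0, 0.0)},  # step 1: hand lowered (illustrative values)
    {"right_hand": (0.0, 2.0, 0.0)},  # step 2: hand raised (illustrative values)
]
frames = interpolate_keyframes(keyframes, frames_per_step=4)
print(len(frames), frames[2]["right_hand"])  # 5 (0.0, 1.5, 0.0)
```

With k keyframes and n frames per step, the sketch yields (k - 1) * n + 1 frames. Because each segment moves at constant speed between planned positions, any error in a low-level position carries straight into the animation, which is one reason precise positioning matters for the final result.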

Videos

How Much Do Large Language Models Know about Human Motion? A Case Study in 3D Avatar Control · Underline

Taxonomy

Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Action Observation and Synchronization