Encoder-Free Human Motion Understanding via Structured Motion Descriptions
Yao Zhang, Zhuchenyang Liu, Thomas Ploetz, Yu Xiao

TL;DR
This paper introduces Structured Motion Description (SMD), a rule-based text representation of human motion that lets large language models perform motion understanding tasks without specialized motion encoders, surpassing prior state-of-the-art results on motion question answering and captioning.
Contribution
The paper presents SMD, a deterministic, rule-based method converting motion data into natural language descriptions, eliminating the need for learned cross-modal alignment and encoder modules.
Findings
SMD achieves 66.7% on BABEL-QA and 90.1% on HuMMan-QA in motion question answering.
SMD attains R@1 of 0.584 and CIDEr of 53.16 on HumanML3D for motion captioning.
The approach generalizes across different LLMs with minimal adaptation.
Abstract
The world knowledge and reasoning capabilities of text-based large language models (LLMs) are advancing rapidly, yet current approaches to human motion understanding, including motion question answering and captioning, have not fully exploited these capabilities. Existing LLM-based methods typically learn motion-language alignment through dedicated encoders that project motion features into the LLM's embedding space, remaining constrained by cross-modal representation and alignment. Inspired by biomechanical analysis, where joint angles and body-part kinematics have long served as a precise descriptive language for human movement, we propose Structured Motion Description (SMD), a rule-based, deterministic approach that converts joint position sequences into structured natural language descriptions of joint angles, body part movements, and global trajectory. By representing…
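To make the idea of a rule-based, deterministic conversion from joint positions to text concrete, the following is a minimal sketch of a single step: computing the angle at a joint from three 3D positions and rendering it as a short phrase. It is not the authors' implementation. The function names, the angle thresholds (150 and 90 degrees), and the output wording are all assumptions chosen for illustration; the paper's actual rules, joint set, and description format are not given in the text above.

```python
import math


def joint_angle_deg(a, b, c):
    """Return the angle at point b, in degrees, formed by segments b->a and b->c.

    a, b, c are 3D positions given as (x, y, z) tuples.
    """
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    if norm == 0.0:
        raise ValueError("degenerate joint: two of the points coincide")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def describe_flexion(joint_name, angle_deg):
    """Map an angle to a text phrase using fixed, hypothetical thresholds."""
    if angle_deg >= 150.0:
        state = "nearly straight"
    elif angle_deg >= 90.0:
        state = "moderately bent"
    else:
        state = "deeply bent"
    return f"{joint_name}: {angle_deg:.0f} degrees ({state})"


# Hypothetical hip, knee, and ankle positions for one frame (metres).
hip, knee, ankle = (0.0, 1.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.0)
print(describe_flexion("left knee", joint_angle_deg(hip, knee, ankle)))
```

Because every step is a fixed rule, the same input frame always yields the same sentence, which is the property that lets a text-only LLM consume the output directly without a learned motion encoder.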
