Encoder-Free Human Motion Understanding via Structured Motion Descriptions

Yao Zhang; Zhuchenyang Liu; Thomas Ploetz; Yu Xiao

arXiv:2604.21668 · cs.CV · April 24, 2026

TL;DR

This paper introduces Structured Motion Description (SMD), a rule-based text representation of human motion that lets large language models perform motion understanding tasks without specialized motion encoders; the authors report results that surpass the prior state of the art.

Contribution

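The paper presents SMD, a deterministic, rule-based method that converts joint position sequences into structured natural language descriptions of joint angles, body part movements, and global trajectory. Because the motion is expressed as text, no learned cross-modal alignment or motion encoder module is needed.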
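To make the idea of a rule-based motion-to-text conversion concrete, here is a minimal sketch of how one joint angle could be computed from 3D joint positions and rendered as a sentence. It is not the authors' implementation: the function names (`joint_angle`, `describe_angle`, `describe_motion`), the 10-degree threshold, the "flexes"/"extends" wording, and the toy joint layout are all assumptions made for illustration.

```python
import math

def joint_angle(a, b, c):
    """Interior angle in degrees at joint b, formed by 3D points a-b-c."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    if norm == 0.0:
        raise ValueError("degenerate joint: coincident points")
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))

def describe_angle(name, start_deg, end_deg, threshold=10.0):
    """Hypothetical rule: report a change only if it exceeds the threshold."""
    delta = end_deg - start_deg
    if abs(delta) < threshold:
        return f"{name} stays near {round(start_deg)} degrees"
    # Assumed convention: a shrinking interior angle is flexion.
    verb = "extends" if delta > 0 else "flexes"
    return f"{name} {verb} from {round(start_deg)} to {round(end_deg)} degrees"

def describe_motion(frames, triplets):
    """Describe each named joint using the first and last frame only."""
    first, last = frames[0], frames[-1]
    lines = []
    for name, (ia, ib, ic) in triplets.items():
        start = joint_angle(first[ia], first[ib], first[ic])
        end = joint_angle(last[ia], last[ib], last[ic])
        lines.append(describe_angle(name, start, end))
    return "\n".join(lines)

# Toy sequence: joints indexed 0 = hip, 1 = knee, 2 = ankle (assumed layout).
frames = [
    [(0.0, 1.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.0)],  # straight leg
    [(0.0, 1.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.5, 0.5)],  # knee bent
]
triplets = {"left knee": (0, 1, 2)}
print(describe_motion(frames, triplets))
```

The output is deterministic for a given input sequence, which is the property the paper emphasizes. A fuller description would also need to cover body part movements and global trajectory, which this sketch leaves out.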

Findings

01. SMD achieves 66.7% on BABEL-QA and 90.1% on HuMMan-QA in motion question answering.

02. SMD attains R@1 of 0.584 and CIDEr of 53.16 on HumanML3D for motion captioning.

03. The approach generalizes across different LLMs with minimal adaptation.

Abstract

The world knowledge and reasoning capabilities of text-based large language models (LLMs) are advancing rapidly, yet current approaches to human motion understanding, including motion question answering and captioning, have not fully exploited these capabilities. Existing LLM-based methods typically learn motion-language alignment through dedicated encoders that project motion features into the LLM's embedding space, remaining constrained by cross-modal representation and alignment. Inspired by biomechanical analysis, where joint angles and body-part kinematics have long served as a precise descriptive language for human movement, we propose Structured Motion Description (SMD), a rule-based, deterministic approach that converts joint position sequences into structured natural language descriptions of joint angles, body part movements, and global trajectory. By representing…

Code & Models

Models

zyyy12138/motion-smd-lora
model · ♡ 1

Datasets

zyyy12138/motion-smd-data
dataset · 1.9k dl
