GeoMotionGPT: Geometry-Aligned Motion Understanding with Large Language Models

Zhankai Ye; Bofan Li; Yukai Jin; Shuoqiu Li; Wei Wang; Yanfu Zhang; Shangqian Gao; Xin Liu

arXiv:2601.07632 · cs.CV · March 20, 2026

TL;DR

GeoMotionGPT introduces a geometry-aligned framework for motion understanding with large language models, improving reasoning accuracy by explicitly enforcing orthogonality in motion and embedding spaces.

Contribution

The paper proposes a novel orthogonality-enforcing framework that aligns motion codebooks with LLM embeddings, enhancing motion reasoning capabilities.

Findings

01. 22.4% improvement on HumanML3D
02. 14.4% improvement on KIT-ML
03. Effective geometric alignment enhances motion reasoning

Abstract

Discrete motion tokenization has recently enabled Large Language Models (LLMs) to serve as versatile backbones for motion understanding and motion-language reasoning. However, existing pipelines typically decouple motion quantization from semantic embedding learning, linking them solely via token IDs. This approach fails to effectively align the intrinsic geometry of the motion space with the embedding space, thereby hindering the LLM's capacity for nuanced motion reasoning. We argue that alignment is most effective when both modalities share a unified geometric basis. Therefore, instead of forcing the LLM to reconstruct the complex geometry among motion tokens from scratch, we present a novel framework that explicitly enforces orthogonality on both the motion codebook and the LLM embedding space, ensuring that their relational structures naturally mirror each other. Specifically, we…
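The abstract is truncated before it describes how orthogonality is enforced, so the following is only a minimal sketch of one common way to express such a constraint: a soft penalty that measures how far the Gram matrix of unit-normalized rows is from the identity. The function names (`orthogonality_penalty`, `orthonormal_rows`), the table sizes, and the choice of a Frobenius-norm penalty are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def orthogonality_penalty(table: np.ndarray) -> float:
    """Squared Frobenius distance between the cosine Gram matrix and the identity.

    `table` has one row per token (a motion code or an LLM embedding).
    The value is 0 when all rows are mutually orthogonal.
    This penalty is an illustrative assumption, not the paper's loss.
    """
    norms = np.linalg.norm(table, axis=1, keepdims=True)
    unit = table / np.clip(norms, 1e-12, None)  # normalize rows to unit length
    gram = unit @ unit.T                        # pairwise cosine similarities
    return float(np.sum((gram - np.eye(len(table))) ** 2))


def orthonormal_rows(num_tokens: int, dim: int, seed: int = 0) -> np.ndarray:
    """Build a (num_tokens, dim) table with orthonormal rows via QR.

    Hypothetical helper; requires num_tokens <= dim.
    """
    if num_tokens > dim:
        raise ValueError("cannot fit more orthogonal rows than dimensions")
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, num_tokens)))
    return q.T  # columns of q are orthonormal, so rows of q.T are too


rng = np.random.default_rng(0)
random_codebook = rng.standard_normal((8, 16))    # unconstrained motion codebook
ortho_codebook = orthonormal_rows(8, 16, seed=1)  # orthogonal motion codebook
ortho_embedding = orthonormal_rows(8, 32, seed=2)  # orthogonal embedding rows

print("random codebook penalty:", orthogonality_penalty(random_codebook))
print("orthogonal codebook penalty:", orthogonality_penalty(ortho_codebook))
print("orthogonal embedding penalty:", orthogonality_penalty(ortho_embedding))
```

In this toy setting the two orthogonal tables live in spaces of different dimension (16 and 32, both arbitrary), yet their pairwise-similarity matrices coincide, which is one way to read the abstract's claim that the relational structures of the two spaces mirror each other.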

Code & Models

Models

Hugging Face: zy22b/GeoMotionGPT (model, 15 downloads)

Taxonomy

Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Human Pose and Action Recognition