Spatial-Temporal Multi-Scale Quantization for Flexible Motion Generation

Zan Wang; Jingze Zhang; Yixin Chen; Baoxiong Jia; Wei Liang; Siyuan Huang

arXiv:2508.08991·cs.CV·August 13, 2025

TL;DR

This paper introduces MSQ, a multi-scale quantization method for human motion generation that captures complex patterns and offers flexible, compositional control, outperforming existing methods on benchmarks.

Contribution

The paper proposes a novel multi-scale quantization approach for motion representation, enabling flexible composition and improved generation quality.

Findings

01. MSQ effectively captures multi-scale motion features.

02. The method supports motion editing and control.

03. Outperforms baseline methods on multiple benchmarks.

Abstract

Despite significant advances in human motion generation, current motion representations, typically formulated as discrete frame sequences, still face two critical limitations: (i) they fail to capture motion from a multi-scale perspective, limiting their ability to model complex patterns; (ii) they lack compositional flexibility, which is crucial for a model's generalization across diverse generation tasks. To address these challenges, we introduce MSQ, a novel quantization method that compresses the motion sequence into multi-scale discrete tokens across spatial and temporal dimensions. MSQ employs distinct encoders to capture body parts at varying spatial granularities and temporally interpolates the encoded features into multiple scales before quantizing them into discrete tokens. Building on this representation, we establish a generative mask modeling model to effectively support…
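
The temporal half of the idea described in the abstract (resample encoded features to several lengths, then map each resampled frame to a discrete token) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the linear resampling with half-sample offsets, the single shared codebook, the random features standing in for encoder outputs, and the chosen scales are all assumptions made for illustration. The spatial body-part encoders are omitted entirely.

```python
import random

def interpolate_time(features, target_len):
    """Linearly resample a sequence of T feature vectors to target_len vectors.

    Assumption: half-sample offsets, so target_len == T returns the input
    unchanged and target_len == 1 returns the temporal midpoint.
    """
    src_len = len(features)
    out = []
    for i in range(target_len):
        pos = (i + 0.5) * src_len / target_len - 0.5
        pos = min(max(pos, 0.0), src_len - 1.0)  # clamp to valid frame range
        lo = int(pos)
        hi = min(lo + 1, src_len - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(features[lo], features[hi])])
    return out

def quantize(features, codebook):
    """Return, for each feature vector, the index of the nearest codebook entry."""
    tokens = []
    for vec in features:
        dists = [sum((v - c) ** 2 for v, c in zip(vec, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens

def multi_scale_tokens(features, codebook, scales):
    """Quantize the same feature sequence at several temporal lengths."""
    return [quantize(interpolate_time(features, s), codebook) for s in scales]

# Hypothetical stand-ins: 16 encoded frames of dimension 8, 32 codebook entries.
rng = random.Random(0)
features = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
codebook = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(32)]
scales = (1, 2, 4, 8, 16)

tokens = multi_scale_tokens(features, codebook, scales)
for s, t in zip(scales, tokens):
    print(f"scale {s:2d}: {len(t)} tokens")
```

Each scale yields one token per resampled frame, so coarse scales summarize the whole sequence in a few tokens while the finest scale keeps one token per encoded frame. A trained system would learn the codebook and encoders jointly rather than sampling them at random as done here.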
