Pose-Guided Residual Refinement for Interpretable Text-to-Motion Generation and Editing

Sukhyun Jeong; Yong-Hoon Choi

arXiv:2512.22464 · cs.CV · December 30, 2025

TL;DR

This paper introduces PGR$^2$M, a hybrid pose-guided residual refinement method for text-to-motion generation and editing that enhances fidelity and control by combining interpretable pose codes with residuals learned via residual vector quantization.

Contribution

The paper proposes a novel hybrid representation combining pose codes and residuals for improved interpretability and detail in text-to-motion tasks, along with a pose-guided residual tokenizer and refinement framework.

Findings

01. PGR$^2$M outperforms baselines in Fréchet distance and reconstruction metrics.

02. It enables more accurate and detailed motion generation and editing.

03. User studies confirm its intuitive and structure-preserving editing capabilities.

Abstract

Text-based 3D motion generation aims to automatically synthesize diverse motions from natural-language descriptions to extend user creativity, whereas motion editing modifies an existing motion sequence in response to text while preserving its overall structure. Pose-code-based frameworks such as CoMo map quantifiable pose attributes into discrete pose codes that support interpretable motion control, but their frame-wise representation struggles to capture subtle temporal dynamics and high-frequency details, often degrading reconstruction fidelity and local controllability. To address this limitation, we introduce pose-guided residual refinement for motion (PGR$^2$M), a hybrid representation that augments interpretable pose codes with residual codes learned via residual vector quantization (RVQ). A pose-guided RVQ tokenizer decomposes motion into pose latents that encode coarse global…
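To make the residual vector quantization (RVQ) idea mentioned in the abstract concrete, here is a minimal, generic sketch in Python with NumPy. It is not the paper's pose-guided tokenizer: the codebooks are random rather than learned, there is no pose-code branch, and the function names (`nearest_code`, `rvq_encode`, `rvq_decode`), the feature dimension, and the number of stages are all assumptions made for illustration. It only shows the core mechanism, in which each stage quantizes what the previous stages left unexplained.

```python
import numpy as np


def nearest_code(x, codebook):
    """Index of the closest codebook row (Euclidean) for each row of x."""
    # x: (n, d), codebook: (k, d) -> pairwise distances: (n, k)
    dists = np.linalg.norm(x[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)


def rvq_encode(x, codebooks):
    """Quantize x stage by stage; each stage encodes the remaining residual."""
    residual = x.copy()
    indices = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        residual = residual - codebook[idx]  # what this stage failed to capture
        indices.append(idx)
    # indices: (num_stages, n); residual: final quantization error, (n, d)
    return np.stack(indices, axis=0), residual


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected code vector from every stage."""
    return sum(cb[idx] for cb, idx in zip(codebooks, indices))


# Toy data (assumption): 8 feature vectors of dimension 4 stand in for motion latents.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))

# Hypothetical random codebooks with shrinking scale; real codebooks are learned.
codebooks = [rng.normal(scale=1.0 / (2 ** s), size=(16, 4)) for s in range(3)]

indices, residual = rvq_encode(x, codebooks)
x_hat = rvq_decode(indices, codebooks)
```

By construction, `x_hat + residual` recovers `x` up to floating-point error, so `residual` is exactly the detail that the discrete codes have not captured. In a hybrid scheme of the kind the abstract describes, the first, coarse level would be supplied by interpretable pose codes and the later stages by residual codes, but how PGR$^2$M wires these together is not specified in the text above.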

Taxonomy

Topics: Human Motion and Animation · 3D Shape Modeling and Analysis · Human Pose and Action Recognition