Pose-Guided Residual Refinement for Interpretable Text-to-Motion Generation and Editing
Sukhyun Jeong, Yong-Hoon Choi

TL;DR
This paper introduces PGR$^2$M, a hybrid pose-guided residual refinement method for text-to-motion generation and editing that enhances fidelity and control by combining interpretable pose codes with residuals learned via residual vector quantization.
Contribution
The paper proposes a novel hybrid representation combining pose codes and residuals for improved interpretability and detail in text-to-motion tasks, along with a pose-guided residual tokenizer and refinement framework.
Findings
PGR$^2$M outperforms baselines in Fréchet inception distance (FID) and reconstruction metrics.
It enables more accurate and detailed motion generation and editing.
User studies confirm its intuitive and structure-preserving editing capabilities.
Abstract
Text-based 3D motion generation aims to automatically synthesize diverse motions from natural-language descriptions to extend user creativity, whereas motion editing modifies an existing motion sequence in response to text while preserving its overall structure. Pose-code-based frameworks such as CoMo map quantifiable pose attributes into discrete pose codes that support interpretable motion control, but their frame-wise representation struggles to capture subtle temporal dynamics and high-frequency details, often degrading reconstruction fidelity and local controllability. To address this limitation, we introduce pose-guided residual refinement for motion (PGR$^2$M), a hybrid representation that augments interpretable pose codes with residual codes learned via residual vector quantization (RVQ). A pose-guided RVQ tokenizer decomposes motion into pose latents that encode coarse global…
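For readers unfamiliar with residual vector quantization, the following is a minimal, generic sketch of the idea the abstract refers to: each stage quantizes whatever the previous stages failed to capture, so later codes add finer detail on top of a coarse first code. It is not the paper's pose-guided tokenizer. The codebooks are random rather than learned, and every name here (`make_codebook`, `quantize`, `rvq_encode`, the sizes and scales) is a hypothetical choice made for illustration.

```python
import numpy as np


def make_codebook(rng, num_codes, dim, scale):
    """Random stand-in for a learned codebook (assumption: real ones are trained)."""
    codebook = rng.normal(scale=scale, size=(num_codes, dim))
    # Entry 0 is the zero vector, so a stage can always choose to "add nothing".
    # This guarantees the residual error never increases from one stage to the next.
    codebook[0] = 0.0
    return codebook


def quantize(x, codebook):
    """Return the index and value of the nearest codebook entry for each row of x."""
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]


def rvq_encode(x, codebooks):
    """Quantize x in stages; each stage encodes the residual left by earlier stages."""
    residual = x.copy()
    recon = np.zeros_like(x)
    indices, errors = [], []
    for codebook in codebooks:
        idx, q = quantize(residual, codebook)
        indices.append(idx)
        recon = recon + q
        residual = residual - q
        errors.append(float((residual ** 2).mean()))
    return indices, recon, errors


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))  # 8 toy "frames" with 4 features each
codebooks = [make_codebook(rng, 16, 4, s) for s in (1.0, 0.5, 0.25)]
indices, recon, errors = rvq_encode(x, codebooks)
print("mean squared residual after each stage:", errors)
```

In this toy setup the first codebook plays the role of a coarse code and the later, smaller-scale codebooks refine it. In the paper's framing the coarse part is carried by interpretable pose codes instead, which this sketch does not model.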
Taxonomy
Topics: Human Motion and Animation · 3D Shape Modeling and Analysis · Human Pose and Action Recognition
