Making Pose Representations More Expressive and Disentangled via Residual Vector Quantization

Sukhyun Jeong; Hong-Gi Shin; Yong-Hoon Choi

arXiv:2508.14561 · cs.CV · August 21, 2025

Open Access

TL;DR

This paper introduces a novel residual vector quantization approach to enhance pose code representations, making them more expressive and disentangled for improved controllable 3D human motion generation.

Contribution

We propose augmenting pose code-based latent representations with continuous motion features using residual vector quantization (RVQ), preserving the interpretability of pose codes while capturing subtle motion details.
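For readers unfamiliar with RVQ, here is a minimal sketch of the general technique. Each stage quantizes whatever the previous stages left unexplained, and the reconstruction is the sum of the selected codewords. In this toy version the first codebook loosely plays the role of the discrete pose codes, and later codebooks refine the residual. The function name `residual_vector_quantize`, the random codebooks, and all sizes are illustrative assumptions. How the paper actually builds, trains, or conditions its codebooks is not shown.

```python
import numpy as np

def residual_vector_quantize(x, codebooks):
    """Hypothetical helper: quantize x (N, D) with a stack of codebooks, each (K, D).

    Each stage picks the codeword nearest to the current residual and subtracts it.
    Returns per-stage indices (N, num_stages) and the summed reconstruction (N, D).
    """
    residual = np.asarray(x, dtype=float).copy()
    reconstruction = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        # Squared Euclidean distance from each residual to every codeword: (N, K)
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        chosen = codebook[idx]
        reconstruction += chosen  # running sum of selected codewords
        residual -= chosen        # what the next stage still has to explain
        indices.append(idx)
    return np.stack(indices, axis=1), reconstruction


# Usage: 3 stages, 16 codewords per stage, 8-dimensional features (arbitrary sizes).
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(16, 8)) for _ in range(3)]
x = rng.normal(size=(4, 8))
codes, x_hat = residual_vector_quantize(x, codebooks)
print(codes.shape)  # (4, 3): one index per stage for each input vector
```

In this sketch every stage keeps its own index. The coarse first-stage code can therefore still be read or edited on its own, while later stages carry finer detail.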

Findings

01. Reduced FID from 0.041 to 0.015 on HumanML3D (lower is better)

02. Improved Top-1 R-Precision from 0.508 to 0.510 (higher is better)

03. Enhanced controllability for motion editing
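Top-1 R-Precision measures text-motion retrieval. Within a small pool of candidates, it counts how often the ground-truth match is ranked first by embedding distance. The sketch below makes several assumptions about evaluator details not given here:

- text and motion embeddings are precomputed;
- distance is Euclidean;
- ranking goes from text to motion;
- each pool holds 32 pairs, the size commonly used in the HumanML3D benchmark.

The name `top_k_r_precision` is hypothetical.

```python
import numpy as np

def top_k_r_precision(text_emb, motion_emb, k=1, pool_size=32):
    """Hypothetical helper: fraction of texts whose paired motion ranks in the top k.

    text_emb, motion_emb: (N, D) arrays where row i of each is a matched pair.
    Pairs are split into consecutive pools of `pool_size`; an incomplete final
    pool is dropped. Assumes N >= pool_size.
    """
    n = (len(text_emb) // pool_size) * pool_size
    hits = 0
    for start in range(0, n, pool_size):
        t = text_emb[start:start + pool_size]
        m = motion_emb[start:start + pool_size]
        # dists[i, j]: Euclidean distance from text i to motion j within this pool
        dists = np.linalg.norm(t[:, None, :] - m[None, :, :], axis=-1)
        # Rank of the true match = number of motions strictly closer than it
        ranks = (dists < np.diag(dists)[:, None]).sum(axis=1)
        hits += int((ranks < k).sum())
    return hits / n


# Usage with synthetic embeddings: each motion is a noisy copy of its text.
rng = np.random.default_rng(0)
text = rng.normal(size=(64, 16))
motion = text + 0.1 * rng.normal(size=text.shape)
score = top_k_r_precision(text, motion)
```

Because the metric depends on pool size and on the embedding network, values are only comparable when both match the benchmark's evaluator.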

Abstract

Recent progress in text-to-motion has advanced both 3D human motion generation and text-based motion control. Controllable motion generation (CoMo), which enables intuitive control, typically relies on pose code representations, but discrete pose codes alone cannot capture fine-grained motion details, limiting expressiveness. To overcome this, we propose a method that augments pose code-based latent representations with continuous motion features using residual vector quantization (RVQ). This design preserves the interpretability and manipulability of pose codes while effectively capturing subtle motion characteristics such as high-frequency details. Experiments on the HumanML3D dataset show that our model reduces Fréchet inception distance (FID) from 0.041 to 0.015 and improves Top-1 R-Precision from 0.508 to 0.510. Qualitative analysis of pairwise direction similarity between pose…
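FID compares the distribution of generated samples against real ones. It fits a Gaussian to each set of features and computes ||mu_a - mu_b||^2 + tr(S_a) + tr(S_b) - 2 tr((S_a S_b)^(1/2)), where lower is better. In text-to-motion benchmarks such as HumanML3D, the features typically come from a pretrained motion encoder rather than an Inception network. The sketch below is a NumPy-only version under that reading. Random arrays stand in for encoder features, and `frechet_distance` is a hypothetical helper name, not the evaluator the authors used.

```python
import numpy as np

def _sqrt_psd(mat):
    """Symmetric square root of a positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clip tiny negative values from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(feats_a, feats_b):
    """Hypothetical helper: Fréchet distance between Gaussians fitted to two (N, D) sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # tr((S_a S_b)^{1/2}) == tr((S_a^{1/2} S_b S_a^{1/2})^{1/2}); the inner matrix
    # is symmetric PSD, so its square-root trace is the sum of sqrt eigenvalues.
    root_a = _sqrt_psd(cov_a)
    inner = root_a @ cov_b @ root_a
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)


# Usage: stand-in "real" and "generated" feature sets with a small mean shift.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
generated = rng.normal(loc=0.2, size=(500, 8))
fid = frechet_distance(real, generated)
```

The trace identity avoids a general non-symmetric matrix square root. This keeps the sketch dependent on NumPy alone.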


Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Robot Manipulation and Learning