PoseVocab: Learning Joint-structured Pose Embeddings for Human Avatar Modeling
Zhe Li, Zerong Zheng, Yuxiao Liu, Boyao Zhou, Yebin Liu

TL;DR
PoseVocab introduces a joint-structured pose embedding method that captures high-fidelity human appearance details for realistic avatar modeling, enabling better generalization and animation under novel poses.
Contribution
The paper proposes a novel joint-structured pose encoding method with feature lines and hierarchical interpolation for improved human avatar synthesis.
Findings
Outperforms state-of-the-art methods in synthesis quality
Achieves better generalization to unseen poses
Enhances detail preservation in dynamic human appearances
Abstract
Creating pose-driven human avatars is about modeling the mapping from the low-frequency driving pose to high-frequency dynamic human appearances, so an effective pose encoding method that can encode high-fidelity human details is essential to human avatar modeling. To this end, we present PoseVocab, a novel pose encoding method that encourages the network to discover the optimal pose embeddings for learning the dynamic human appearance. Given multi-view RGB videos of a character, PoseVocab constructs key poses and latent embeddings based on the training poses. To achieve pose generalization and temporal consistency, we sample key rotations in so(3) of each joint rather than the global pose vectors, and assign a pose embedding to each sampled key rotation. These joint-structured pose embeddings not only encode the dynamic appearances under different key poses, but also factorize the…
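The per-joint lookup described in the abstract can be illustrated with a minimal sketch. It assumes each joint stores a small set of key rotations (as axis-angle vectors) with one embedding per key, and that a query rotation is encoded by blending the embeddings of its nearest keys. The inverse-distance weighting used here is a simple stand-in, not the paper's feature-line and hierarchical interpolation scheme, and all names (`query_joint_embedding`, `encode_pose`, the array shapes) are hypothetical. In the actual method the embeddings would be learned; here they are plain arrays.

```python
import numpy as np

def axis_angle_to_matrix(v):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(v)
    if theta < 1e-8:
        return np.eye(3)
    k = v / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def geodesic_angle(a, b):
    """Rotation angle (radians) between two axis-angle rotations."""
    R = axis_angle_to_matrix(a).T @ axis_angle_to_matrix(b)
    cos = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.arccos(cos)

def query_joint_embedding(query_rot, key_rots, key_embs, k=2, eps=1e-6):
    """Blend the embeddings of the k key rotations nearest to query_rot.

    Assumption: inverse-distance weights stand in for the paper's
    interpolation scheme.
    key_rots: (K, 3) axis-angle keys; key_embs: (K, D) embeddings.
    """
    d = np.array([geodesic_angle(query_rot, kr) for kr in key_rots])
    idx = np.argsort(d)[:k]          # indices of the k closest keys
    w = 1.0 / (d[idx] + eps)         # closer keys get larger weights
    w /= w.sum()
    return w @ key_embs[idx]

def encode_pose(pose, key_rots_per_joint, key_embs_per_joint, k=2):
    """Encode a full pose joint by joint.

    pose: (J, 3); key_rots_per_joint: (J, K, 3); key_embs_per_joint: (J, K, D).
    Returns a (J, D) array with one embedding per joint.
    """
    return np.stack([
        query_joint_embedding(pose[j], key_rots_per_joint[j],
                              key_embs_per_joint[j], k)
        for j in range(len(pose))
    ])
```

Because each joint is queried independently, a novel pose whose global pose vector was never seen in training can still land near known key rotations on every individual joint, which is the intuition behind sampling per joint rather than per global pose.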
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning
