Chatting about Upper-Body Expressive Human Pose and Shape Estimation
Yuxiang Zhao, Wei Huang, Yujie Song, Liu Wang, Huan Zhao

TL;DR
This paper introduces CoEvoer, a transformer-based framework for upper-body expressive human pose and shape estimation, improving accuracy and generalization by modeling interactions among face, hands, and torso.
Contribution
The paper presents CoEvoer, the first framework specifically designed for upper-body EHPS that leverages cross-dependency transformers for joint parameter estimation.
Findings
Achieves state-of-the-art performance on upper-body benchmarks.
Demonstrates strong generalization to unseen wild images.
Enables explicit feature interaction across body parts for improved accuracy.
Abstract
Expressive Human Pose and Shape Estimation (EHPS) plays a crucial role in various AR/VR applications and has witnessed significant progress in recent years. However, current state-of-the-art methods still struggle with accurate parameter estimation for facial and hand regions and exhibit limited generalization to wild images. To address these challenges, we present CoEvoer, a novel one-stage synergistic cross-dependency transformer framework tailored for upper-body EHPS. CoEvoer enables explicit feature-level interaction across different body parts, allowing for mutual enhancement through contextual information exchange. Specifically, larger and more easily estimated regions such as the torso provide global semantics and positional priors to guide the estimation of finer, more complex regions like the face and hands. Conversely, the localized details captured in facial and hand regions…
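The feature-level exchange described in the abstract can be pictured with a small sketch. This is not CoEvoer's architecture, which this summary does not specify. It is a minimal single-head cross-attention in NumPy with no learned projections, in which each body part's tokens read from the tokens of the other parts. The names `cross_attend` and `exchange_parts`, the token counts, and the feature width of 16 are all hypothetical, chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, context):
    """Scaled dot-product attention of `queries` over `context`.

    Hypothetical helper: single head, no learned projections.
    Returns the attended features and the attention weights.
    """
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d))
    return weights @ context, weights

def exchange_parts(parts):
    """Update each part's tokens with a residual read from all other parts."""
    updated = {}
    for name, tokens in parts.items():
        # Context for this part is every token belonging to the other parts.
        others = np.concatenate(
            [t for n, t in parts.items() if n != name], axis=0
        )
        attended, _ = cross_attend(tokens, others)
        updated[name] = tokens + attended
    return updated

# Illustrative random features: (number of tokens, feature width).
rng = np.random.default_rng(0)
parts = {
    "torso": rng.normal(size=(8, 16)),
    "face": rng.normal(size=(4, 16)),
    "hands": rng.normal(size=(6, 16)),
}
out = exchange_parts(parts)
```

In this sketch the face and hand tokens pick up context from the torso tokens, and the torso tokens in turn read from the face and hand tokens, so information flows in both directions. Each part keeps its own token count and feature width after the update.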
