PCIE_Pose Solution for EgoExo4D Pose and Proficiency Estimation Challenge

Feng Chen; Kanokphan Lertniphonphan; Qiancheng Yan; Xiaohui Fan; Jun Xie; Tao Zhang; Zhepeng Wang

arXiv:2505.24411 · cs.CV · June 2, 2025

TL;DR

This report presents HP-ViT+, which fuses a Vision Transformer with a CNN backbone for egocentric 3D hand pose estimation, and a multimodal spatio-temporal strategy for body pose estimation; both took first place in the EgoExo4D challenges at CVPR2025.

Contribution

Introduces HP-ViT+ architecture for hand pose estimation and a multimodal strategy for body pose, advancing egocentric pose estimation methods.

Findings

01 Achieved 8.31 PA-MPJPE in the Hand Pose Challenge

02 Achieved 11.25 MPJPE in the Body Pose Challenge

03 Achieved a Top-1 accuracy of 0.53 in Proficiency Estimation
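
The two pose metrics reported above can be illustrated with a short NumPy sketch. MPJPE is the mean Euclidean distance between predicted and ground-truth joints; PA-MPJPE computes the same error after a Procrustes (similarity) alignment of the prediction to the ground truth. This is a generic illustration, not the challenge's official evaluation code; the function names `mpjpe`, `procrustes_align`, and `pa_mpjpe` are hypothetical, and the inputs are assumed to be arrays of shape (J, 3).

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the best scale, rotation, translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Cross-covariance between centered point sets; its SVD yields the rotation.
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g

def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: error that remains after similarity alignment."""
    return mpjpe(procrustes_align(pred, gt), gt)

# Illustrative data only: 21 random joints, then a rotated, scaled, shifted copy.
rng = np.random.default_rng(0)
gt = rng.normal(size=(21, 3))
theta = np.deg2rad(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
pred = 1.5 * gt @ rot_z.T + np.array([0.1, -0.2, 0.3])
```

Because `pred` here differs from `gt` only by a similarity transform, `pa_mpjpe(pred, gt)` is numerically zero while `mpjpe(pred, gt)` is not, which shows why PA-MPJPE isolates errors in the articulated pose itself from errors in global position, orientation, and scale.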

Abstract

This report introduces our team's (PCIE_EgoPose) solutions for the EgoExo4D Pose and Proficiency Estimation Challenges at CVPR2025. Focused on the intricate task of estimating 21 3D hand joints from RGB egocentric videos, which are complicated by subtle movements and frequent occlusions, we developed the Hand Pose Vision Transformer (HP-ViT+). This architecture synergizes a Vision Transformer and a CNN backbone, using weighted fusion to refine the hand pose predictions. For the EgoExo4D Body Pose Challenge, we adopted a multimodal spatio-temporal feature integration strategy to address the complexities of body pose estimation across dynamic contexts. Our methods achieved remarkable performance: 8.31 PA-MPJPE in the Hand Pose Challenge and 11.25 MPJPE in the Body Pose Challenge, securing championship titles in both competitions. We extended our pose estimation solutions to the…
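
The abstract mentions weighted fusion of the Vision Transformer and CNN predictions but does not give the weights or the exact fusion rule. A minimal sketch, assuming each model outputs a (21, 3) array of hand joints and that fusion is a normalized weighted average, might look like the following; the function name `weighted_fusion` and the weight values are hypothetical and are not taken from the report.

```python
import numpy as np

def weighted_fusion(preds, weights):
    """Fuse per-model joint predictions with a normalized weighted average.

    preds:   sequence of M arrays, each of shape (J, 3)
    weights: sequence of M non-negative weights (assumed, not from the report)
    """
    stacked = np.stack(preds)                # (M, J, 3)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # normalize so the weights sum to 1
    return np.tensordot(w, stacked, axes=1)  # weighted sum over models -> (J, 3)

# Placeholder predictions standing in for the ViT and CNN branch outputs.
vit_pred = np.ones((21, 3))
cnn_pred = np.zeros((21, 3))
fused = weighted_fusion([vit_pred, cnn_pred], weights=[3.0, 1.0])
```

With weights 3 and 1, every fused coordinate is 0.75 of the way toward the first model's prediction. In practice such weights would typically be tuned on a validation split.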

Taxonomy

Topics: Robotic Mechanisms and Dynamics · Robot Manipulation and Learning · Hand Gesture Recognition Systems

Methods: Attention Is All You Need · Linear Layer · Adam · Dense Connections · Vision Transformer · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Multi-Head Attention