PCIE_EgoHandPose Solution for EgoExo4D Hand Pose Challenge

Feng Chen, Ling Ding, Kanokphan Lertniphonphan, Jian Li, Kaer Huang, and Zhepeng Wang

arXiv:2406.12219 · cs.CV · June 19, 2024

TL;DR

This report introduces the PCIE_EgoHandPose solution, which uses a Vision Transformer-based architecture (HP-ViT) to estimate 21-joint 3D hand poses from egocentric RGB video and took 1st place in the CVPR 2024 EgoExo4D Hand Pose Challenge.

Contribution

The paper presents the Hand Pose Vision Transformer (HP-ViT), a transformer-based model for 3D hand pose estimation from egocentric video that combines a ViT backbone with a transformer head, and demonstrates its effectiveness in the EgoExo4D Hand Pose Challenge.
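To make the "ViT backbone plus transformer head" data flow concrete, here is a toy NumPy sketch with random weights. It is not the HP-ViT implementation: the patch size of 16, the embedding width, the single attention layer standing in for the backbone, single-head attention, and the use of 21 learned joint queries that cross-attend to image tokens in the head are all assumptions chosen for illustration. The names `patchify`, `attention`, and `toy_hand_pose_forward` are hypothetical.

```python
import numpy as np


def patchify(img, patch=16):
    """Split an (H, W, C) image into non-overlapping patches, each flattened,
    giving an array of shape (num_patches, patch * patch * C)."""
    h, w, c = img.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    x = img.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (grid_h, grid_w, patch, patch, C)
    return x.reshape(-1, patch * patch * c)


def attention(q_in, kv_in, wq, wk, wv):
    """Single-head scaled dot-product attention: rows of q_in attend to rows of kv_in."""
    q, k, v = q_in @ wq, kv_in @ wk, kv_in @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v


def toy_hand_pose_forward(img, dim=64, num_joints=21, patch=16, seed=0):
    """Hypothetical pipeline with random weights:
    image -> patch tokens -> one self-attention layer (backbone stand-in)
    -> joint queries cross-attending to tokens (head stand-in) -> (num_joints, 3)."""
    rng = np.random.default_rng(seed)

    def w(*shape):
        return rng.standard_normal(shape) * 0.02

    tokens = patchify(img, patch) @ w(patch * patch * img.shape[2], dim)
    # Backbone stand-in: self-attention with a residual connection.
    tokens = tokens + attention(tokens, tokens, w(dim, dim), w(dim, dim), w(dim, dim))
    # Head stand-in: one learned query per joint attends to the image tokens.
    queries = w(num_joints, dim)
    joints = queries + attention(queries, tokens, w(dim, dim), w(dim, dim), w(dim, dim))
    # Linear regression from each joint embedding to (x, y, z).
    return joints @ w(dim, 3)
```

Calling `toy_hand_pose_forward(np.zeros((224, 224, 3)))` returns an array of shape `(21, 3)`, one 3D coordinate per hand joint. A real model would use trained, multi-layer, multi-head attention with normalization; the sketch only shows how image patches become tokens and how per-joint outputs can be read out.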

Findings

01

Achieved 1st place in the CVPR2024 EgoExo4D Hand Pose Challenge.

02

Attained an MPJPE of 25.51 and a PA-MPJPE of 8.49 (lower is better).

03

Proposed HP-ViT, a transformer-based architecture for 3D hand pose estimation.
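The two reported metrics can be illustrated with their standard definitions. MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joints; PA-MPJPE computes the same error after a Procrustes (similarity: scale, rotation, translation) alignment of the prediction onto the ground truth, so it ignores global pose and scale errors. The NumPy sketch below uses the Umeyama/Kabsch SVD solution for the alignment. It is a minimal sketch under those standard definitions; the official challenge evaluation may differ in details such as units or masking of invisible joints, and the function names are hypothetical.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error for arrays of shape (J, 3), e.g. J = 21 hand joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align pred onto gt with the optimal similarity transform (Umeyama)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p0, g0 = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance between centered point sets.
    u, s, vt = np.linalg.svd(p0.T @ g0)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    corr = np.diag([1.0, 1.0, d])
    rot = vt.T @ corr @ u.T
    scale = (s * np.diag(corr)).sum() / (p0 ** 2).sum()
    return scale * p0 @ rot.T + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction to the ground truth."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

For example, a prediction that is a scaled, rotated, and translated copy of the ground truth has a large MPJPE but a PA-MPJPE of (numerically) zero.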

Abstract

This report presents our team's 'PCIE_EgoHandPose' solution for the EgoExo4D Hand Pose Challenge at CVPR2024. The main goal of the challenge is to accurately estimate hand poses, each consisting of 21 3D joints, from an egocentric RGB video image provided for the task. The task is particularly challenging due to subtle hand movements and occlusions. To handle this complexity, we propose the Hand Pose Vision Transformer (HP-ViT). HP-ViT comprises a ViT backbone and a transformer head that estimates 3D joint positions, and is trained with MPJPE and RLE (Residual Log-likelihood Estimation) loss functions. Our approach achieved 1st place in the Hand Pose Challenge with 25.51 MPJPE and 8.49 PA-MPJPE. Code is available at https://github.com/KanokphanL/PCIE_EgoHandPose
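As a rough illustration of combining an MPJPE term with a likelihood-based term, the sketch below pairs an MPJPE loss with only the base-distribution part of RLE: the negative log-likelihood of the ground truth under a per-coordinate Laplace distribution whose location `mu` and scale `sigma` are predicted by the network. Full RLE also learns a normalizing-flow residual density, which is omitted here, so this is a simplification rather than the loss used in the paper. The weighting parameter `rle_weight` and all function names are hypothetical; the paper's actual loss balance is not given here.

```python
import numpy as np


def mpjpe_loss(pred, gt):
    """Euclidean joint error averaged over all leading dims (batch, joints)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def laplace_nll(mu, sigma, gt, eps=1e-9):
    """Per-coordinate Laplace negative log-likelihood, log(2*sigma) + |gt - mu| / sigma.
    Only the base-density term of RLE; the learned flow residual is omitted."""
    sigma = np.maximum(sigma, eps)  # keep the scale strictly positive
    nll = np.log(2.0 * sigma) + np.abs(gt - mu) / sigma
    return float(nll.mean())


def combined_loss(mu, sigma, gt, rle_weight=1.0):
    """Hypothetical weighted sum of the MPJPE term and the simplified likelihood term."""
    return mpjpe_loss(mu, gt) + rle_weight * laplace_nll(mu, sigma, gt)
```

The likelihood term lets the model report per-coordinate uncertainty: a large `sigma` softens the penalty on a coordinate it cannot localize (for example, an occluded fingertip) at the cost of the `log(2*sigma)` term, while the MPJPE term keeps pressure directly on the 3D distance being evaluated.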

Code & Models

Repositories

kanokphanl/pcie_egohandpose
PyTorch · Official
