Transformer-based Global 3D Hand Pose Estimation in Two Hands Manipulating Objects Scenarios
Hoseong Cho, Donguk Kim, Chanwoo Kim, Seongyeong Lee, Seungryul Baek

TL;DR
This paper presents a transformer-based approach for accurate 3D hand pose estimation in egocentric images involving two interacting hands and objects, achieving top performance in the ECCV 2022 challenge.
Contribution
It introduces an end-to-end multi-hand pose estimation method using transformers and a novel scale-aware depth estimation algorithm for diverse hand sizes.
Findings
Achieved errors of 14.4 mm (left hand) and 15.9 mm (right hand) on the test set.
Performed robustly in scenarios with interacting hands and objects.
Won 1st place in the ECCV 2022 challenge on HBHA.
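The summary does not say which metric the millimeter errors refer to. A minimal sketch, assuming they are mean per-joint Euclidean distances between predicted and ground-truth 3D joints (a common choice for hand pose, but an assumption here); the function name and toy coordinates are hypothetical:

```python
import math

def mean_joint_error_mm(pred, gt):
    """Mean Euclidean distance between matching 3D joints, in the units of the inputs (mm here)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

# Toy example with two joints (hypothetical values, in mm).
pred = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
gt = [(3.0, 4.0, 0.0), (10.0, 0.0, 12.0)]
error = mean_joint_error_mm(pred, gt)  # per-joint distances are 5 mm and 12 mm
```

Under this assumption, a per-hand figure such as those above would come from averaging this quantity over all test frames for that hand.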
Abstract
This report describes our 1st place solution to the ECCV 2022 challenge on Human Body, Hands, and Activities (HBHA) from Egocentric and Multi-view Cameras (hand pose estimation). In this challenge, we aim to estimate global 3D hand poses from an input image in which two hands and an object interact, captured from an egocentric viewpoint. Our proposed method performs end-to-end multi-hand pose estimation via a transformer architecture. In particular, our method robustly estimates hand poses in scenarios where two hands interact. Additionally, we propose an algorithm that takes hand scale into account to robustly estimate absolute depth. The proposed algorithm works well even when hand sizes vary from person to person. Our method attains errors of 14.4 mm (left) and 15.9 mm (right) on the test set.
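The abstract does not spell out the scale-aware depth algorithm. As a rough illustration of why hand scale matters for absolute depth, the sketch below uses the standard pinhole camera relation (projected length = focal length × real length / depth). It is not the authors' method; the function names, focal length, bone lengths, and pixel coordinates are all hypothetical, and the reference bone is assumed to lie roughly parallel to the image plane.

```python
import math

def pixel_length(p, q):
    """Euclidean distance between two 2D image points, in pixels."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def absolute_depth_mm(focal_px, bone_len_mm, bone_len_px):
    """Pinhole relation s = f * S / Z, solved for depth: Z = f * S / s."""
    if bone_len_px <= 0:
        raise ValueError("projected bone length must be positive")
    return focal_px * bone_len_mm / bone_len_px

# Hypothetical camera and 2D detections of a reference bone (e.g. wrist to a knuckle).
focal_px = 600.0
wrist_px, knuckle_px = (320.0, 240.0), (320.0, 300.0)
length_px = pixel_length(wrist_px, knuckle_px)  # 60 pixels

# The same projected length implies different depths for different hand sizes.
depth_small_hand = absolute_depth_mm(focal_px, 80.0, length_px)   # 80 mm bone
depth_large_hand = absolute_depth_mm(focal_px, 100.0, length_px)  # 100 mm bone
```

With these toy numbers, a 20 mm difference in the assumed bone length shifts the depth estimate by 200 mm, which is why a fixed, person-independent hand size can bias absolute depth.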
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Hand Gesture Recognition Systems
Methods: Test
