3D Hand Pose and Shape Estimation from RGB Images for Keypoint-Based Hand Gesture Recognition
Danilo Avola, Luigi Cinque, Alessio Fagioli, Gian Luca Foresti, Adriano Fragomeni, Daniele Pannone

TL;DR
This paper introduces a robust end-to-end framework for 3D hand pose and shape estimation from RGB images, significantly improving stability and accuracy for hand gesture recognition tasks.
Contribution
The paper presents a novel keypoint-based end-to-end system that enhances 3D hand estimation stability and accuracy using multi-task learning and a viewpoint encoder.
Findings
Achieved state-of-the-art results on 3D hand pose and shape estimation benchmarks.
Outperformed existing keypoint-based methods on hand gesture recognition datasets.
Demonstrated robustness and stability in real-life scenarios.
Abstract
Estimating the 3D pose of a hand from a 2D image is a well-studied problem and a requirement for several real-life applications such as virtual reality, augmented reality, and hand gesture recognition. Currently, reasonable estimations can be computed from single RGB images, especially when a multi-task learning approach is used to force the system to consider the shape of the hand when its pose is determined. However, depending on the method used to represent the hand, the performance can drop considerably in real-life tasks, suggesting that stable descriptions are required to achieve satisfactory results. In this paper, we present a keypoint-based end-to-end framework for 3D hand pose and shape estimation and successfully apply it to the task of hand gesture recognition as a case study. Specifically, after a pre-processing step in which the images are normalized, the proposed pipeline uses…
