Fast and Accurate 3D Hand Pose Estimation via Recurrent Neural Network for Capturing Hand Articulations
Cheol-hwan Yoo, Seo-won Ji, Yong-goo Shin, Seung-wook Kim, and Sung-jea Ko

TL;DR
This paper introduces a hierarchically-structured convolutional recurrent neural network (HCRNN) that efficiently estimates 3D hand poses from depth images, leveraging hand articulation structure for improved accuracy and speed.
Contribution
The paper presents a novel HCRNN architecture that models hand articulation explicitly, achieving high accuracy and real-time performance without complex data conversions.
Findings
Outperforms most 2D CNN-based methods on public datasets.
Achieves competitive results with state-of-the-art 3D CNN methods.
Runs at 285 fps on a single GPU.
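To put the reported frame rate in per-frame terms, the short sketch below converts frames per second into a time budget per frame. The function name `frame_budget_ms` is a hypothetical helper introduced for illustration only. The 285 fps figure is the one quoted above, and nothing here reflects how the authors measured it.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the average time available per frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# 285 fps is the throughput quoted in the findings above.
budget = frame_budget_ms(285)
print(f"{budget:.2f} ms per frame")
```

At 285 fps this comes to roughly 3.5 ms per frame, well under the roughly 33 ms budget of a 30 fps depth camera.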
Abstract
3D hand pose estimation from a single depth image plays an important role in computer vision and human-computer interaction. Although recent hand pose estimation methods based on convolutional neural networks (CNNs) have shown notable improvements in accuracy, most of them rely on a complex network structure without fully exploiting the articulated structure of the hand. A hand, which is an articulated object, is composed of six local parts: the palm and five independent fingers. Each finger consists of sequential-joints that provide constrained motion, referred to as a kinematic chain. In this paper, we propose a hierarchically-structured convolutional recurrent neural network (HCRNN) with six branches that estimate the 3D position of the palm and five fingers independently. The palm position is predicted via fully-connected layers. Each sequential-joint, i.e.…
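The six-branch layout described in the abstract can be sketched roughly as follows. This is a structural illustration in plain Python, not the authors' implementation. The abstract is truncated before it describes the finger branches, so the plain tanh recurrent cell, the number of joints per finger, the feature and hidden sizes, the single palm point, and all names (`PalmBranch`, `FingerBranch`, `estimate_pose`) are assumptions. The convolutional feature extractor is replaced by a random feature vector.

```python
import math
import random

FINGERS = ["thumb", "index", "middle", "ring", "pinky"]
JOINTS_PER_FINGER = 4  # assumption: not stated in the abstract
FEATURE_DIM = 8        # assumption: stand-in for the CNN feature size
HIDDEN_DIM = 6         # assumption: recurrent state size


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class PalmBranch:
    """Fully-connected mapping from features to one 3D point (simplified)."""

    def __init__(self, rng):
        self.w = make_matrix(3, FEATURE_DIM, rng)

    def __call__(self, feature):
        return matvec(self.w, feature)


class FingerBranch:
    """Recurrent cell unrolled along a finger's kinematic chain.

    Assumption: a plain tanh recurrence, one step per joint, so each
    joint's estimate depends on the state left by the previous joint.
    """

    def __init__(self, rng):
        self.w_in = make_matrix(HIDDEN_DIM, FEATURE_DIM, rng)
        self.w_rec = make_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)
        self.w_out = make_matrix(3, HIDDEN_DIM, rng)

    def __call__(self, feature):
        hidden = [0.0] * HIDDEN_DIM
        projected = matvec(self.w_in, feature)
        joints = []
        for _ in range(JOINTS_PER_FINGER):
            recurrent = matvec(self.w_rec, hidden)
            hidden = [math.tanh(a + b) for a, b in zip(projected, recurrent)]
            joints.append(matvec(self.w_out, hidden))  # one 3D joint per step
        return joints


def estimate_pose(feature, palm_branch, finger_branches):
    """Run the six branches independently and collect their 3D outputs."""
    pose = {"palm": [palm_branch(feature)]}
    for name in FINGERS:
        pose[name] = finger_branches[name](feature)
    return pose


rng = random.Random(0)
palm_branch = PalmBranch(rng)
finger_branches = {name: FingerBranch(rng) for name in FINGERS}
feature = [rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]  # fake CNN output
pose = estimate_pose(feature, palm_branch, finger_branches)
```

Here `pose` maps each of the six parts to a list of 3D points. Because the branches share only the input feature, each one can be evaluated independently of the others, which is the property the abstract emphasizes.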
Taxonomy
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Convolution
