Fast and Accurate 3D Hand Pose Estimation via Recurrent Neural Network for Capturing Hand Articulations

Cheol-hwan Yoo, Seo-won Ji, Yong-goo Shin, Seung-wook Kim, and Sung-jea Ko

arXiv:1911.07424 · cs.CV · August 28, 2020


TL;DR

This paper introduces a hierarchically-structured convolutional recurrent neural network (HCRNN) that efficiently estimates 3D hand poses from depth images, leveraging hand articulation structure for improved accuracy and speed.

Contribution

The paper presents a novel HCRNN architecture that models hand articulation explicitly and achieves high accuracy and real-time performance while taking the depth map directly as input, without a time-consuming conversion into 3D voxels or point clouds.

Findings

1. Outperforms most 2D CNN-based methods on public datasets.
2. Achieves results competitive with state-of-the-art 3D CNN-based methods.
3. Runs at 285 fps on a single GPU.
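For a sense of scale, the 285 fps figure converts to a per-frame processing time as below. This assumes frames are processed one at a time; the batch setting behind the measurement is not stated here, and the 30 fps camera used for comparison is only an illustrative assumption.

```latex
t_{\text{frame}} = \frac{1}{285\ \text{frames/s}} = \frac{1000\ \text{ms}}{285} \approx 3.51\ \text{ms per frame}
\qquad \text{vs.} \qquad
\frac{1000\ \text{ms}}{30} \approx 33.3\ \text{ms between frames of a 30 fps camera}
```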

Abstract

3D hand pose estimation from a single depth image plays an important role in computer vision and human-computer interaction. Although recent hand pose estimation methods using convolutional neural networks (CNNs) have shown notable improvements in accuracy, most of them rely on complex network structures without fully exploiting the articulated structure of the hand. A hand, which is an articulated object, is composed of six local parts: the palm and five independent fingers. Each finger consists of sequential joints with constrained motion, referred to as a kinematic chain. In this paper, we propose a hierarchically-structured convolutional recurrent neural network (HCRNN) with six branches that estimate the 3D positions of the palm and the five fingers independently. The palm position is predicted via fully-connected layers. Each sequential-joint, i.e.…
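The six-branch layout described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, not the authors' implementation: random weights stand in for trained parameters, a random vector stands in for the CNN features of the depth image, all six branches read the same feature vector for simplicity, and the joint counts (one palm point, four joints per finger) are assumptions. The palm branch is a fully-connected layer, as the abstract states. The abstract's description of the finger branches is truncated, so a plain Elman-style RNN unrolled once per joint along each kinematic chain is used here as an assumed stand-in.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical sizes for illustration only; they are not taken from the paper.
FEATURE_DIM = 16
HIDDEN_DIM = 8
PALM_JOINTS = 1        # assumption: the palm is summarized by one 3D point
JOINTS_PER_FINGER = 4  # assumption: four joints along each finger's kinematic chain
FINGERS = ["thumb", "index", "middle", "ring", "pinky"]

rng = random.Random(0)


def rand_matrix(rows: int, cols: int) -> Matrix:
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise vector sum."""
    return [x + y for x, y in zip(a, b)]


class PalmBranch:
    """Fully-connected layer mapping the feature vector to palm joint coordinates."""

    def __init__(self) -> None:
        self.w = rand_matrix(3 * PALM_JOINTS, FEATURE_DIM)

    def __call__(self, feature: Vector) -> List[Vector]:
        flat = matvec(self.w, feature)
        # Split the flat output into (x, y, z) triples, one per palm joint.
        return [flat[i:i + 3] for i in range(0, len(flat), 3)]


class FingerBranch:
    """Hypothetical Elman-style RNN unrolled along one finger's kinematic chain.

    Each step combines the feature vector with the hidden state left by the
    previous joint in the chain and emits one 3D joint position.
    """

    def __init__(self) -> None:
        self.w_in = rand_matrix(HIDDEN_DIM, FEATURE_DIM)
        self.w_h = rand_matrix(HIDDEN_DIM, HIDDEN_DIM)
        self.w_out = rand_matrix(3, HIDDEN_DIM)

    def __call__(self, feature: Vector) -> List[Vector]:
        h = [0.0] * HIDDEN_DIM
        joints = []
        for _ in range(JOINTS_PER_FINGER):
            # The hidden state carries information from the previous joint.
            pre = add(matvec(self.w_in, feature), matvec(self.w_h, h))
            h = [math.tanh(x) for x in pre]
            joints.append(matvec(self.w_out, h))
        return joints


class HierarchicalHandModel:
    """Six independent branches: one palm branch and five finger branches."""

    def __init__(self) -> None:
        self.palm = PalmBranch()
        self.fingers = {name: FingerBranch() for name in FINGERS}

    def __call__(self, feature: Vector) -> Dict[str, List[Vector]]:
        out = {"palm": self.palm(feature)}
        for name, branch in self.fingers.items():
            out[name] = branch(feature)
        return out


# A random vector stands in for the CNN feature of a depth image.
feature = [rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
pose = HierarchicalHandModel()(feature)
total = sum(len(joints) for joints in pose.values())
print(len(pose), total)  # prints: 6 21
```

Unrolling the recurrence joint by joint makes each finger-joint prediction depend on the hidden state of the previous joint in the same finger, which is one way to encode the kinematic-chain dependency; keeping the branches separate mirrors the independent estimation of the palm and each finger.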
