# Pushing the Envelope for RGB-based Dense 3D Hand Pose Estimation via Neural Rendering

**Authors:** Seungryul Baek, Kwang In Kim, Tae-Kyun Kim

arXiv: 1904.04196 · 2019-04-10

## TL;DR

This paper introduces a neural rendering-based framework for dense 3D hand pose estimation from RGB images, combining model fitting, iterative refinement, and self-data augmentation to achieve state-of-the-art accuracy.

## Contribution

The paper fits a compact parametric 3D hand model to RGB images using a neural network paired with a differentiable renderer, and adds iterative test-time refinement and self-data augmentation to improve 3D hand pose and shape estimation.

## Key findings

- Achieves state-of-the-art accuracy on three RGB-based benchmarks.
- Effectively recovers dense 3D hand shapes and articulations.
- Each of the three technical components improves estimation accuracy in the ablation study.

## Abstract

Estimating 3D hand meshes from single RGB images is challenging, due to intrinsic 2D-3D mapping ambiguities and limited training data. We adopt a compact parametric 3D hand model that represents deformable and articulated hand meshes. To achieve the model fitting to RGB images, we investigate and contribute in three ways: 1) Neural rendering: inspired by recent work on human body, our hand mesh estimator (HME) is implemented by a neural network and a differentiable renderer, supervised by 2D segmentation masks and 3D skeletons. HME demonstrates good performance for estimating diverse hand shapes and improves pose estimation accuracies. 2) Iterative testing refinement: Our fitting function is differentiable. We iteratively refine the initial estimate using the gradients, in the spirit of iterative model fitting methods like ICP. The idea is supported by the latest research on human body. 3) Self-data augmentation: collecting sizable RGB-mesh (or segmentation mask)-skeleton triplets for training is a big hurdle. Once the model is successfully fitted to input RGB images, its meshes, i.e. shapes and articulations, are realistic, and we augment view-points on top of estimated dense hand poses. Experiments using three RGB-based benchmarks show that our framework offers beyond state-of-the-art accuracy in 3D pose estimation, as well as recovers dense 3D hand shapes. Each technical component above meaningfully improves the accuracy in the ablation study.
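The iterative testing refinement described in the abstract (refining an initial estimate with gradients of a differentiable fitting function) can be illustrated with a toy sketch. This is not the authors' implementation: the pose here is just an in-plane rotation plus a 2D translation, the camera is orthographic, the gradient comes from finite differences rather than backpropagation through a neural renderer, and all names (`project`, `fitting_loss`, `refine`), the 21-joint template, the step size, and the iteration count are assumptions made for illustration.

```python
import numpy as np

def project(theta, template):
    """Toy forward model: rotate joints about the z-axis by theta[0],
    translate by theta[1:3], and drop z (orthographic projection)."""
    c, s = np.cos(theta[0]), np.sin(theta[0])
    rot = np.array([[c, -s], [s, c]])
    return template[:, :2] @ rot.T + theta[1:3]

def fitting_loss(theta, template, target_2d):
    """Mean squared 2D distance between projected and observed joints."""
    diff = project(theta, template) - target_2d
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def numerical_grad(f, theta, eps=1e-5):
    """Central finite differences; a stand-in for automatic differentiation."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

def refine(theta_init, template, target_2d, lr=0.1, steps=200):
    """Refine an initial estimate by plain gradient descent on the fitting loss."""
    theta = np.asarray(theta_init, dtype=float).copy()
    f = lambda t: fitting_loss(t, template, target_2d)
    for _ in range(steps):
        theta -= lr * numerical_grad(f, theta)
    return theta

# Hypothetical data: a random 21-joint template and a known ground-truth pose.
rng = np.random.default_rng(0)
template = rng.normal(size=(21, 3))
theta_true = np.array([0.3, 0.5, -0.2])
target_2d = project(theta_true, template)

# A perturbed pose plays the role of the network's initial estimate.
theta_init = theta_true + np.array([0.2, 0.1, -0.1])
theta_refined = refine(theta_init, template, target_2d)

loss_before = fitting_loss(theta_init, template, target_2d)
loss_after = fitting_loss(theta_refined, template, target_2d)
print(f"loss before: {loss_before:.6f}, after: {loss_after:.2e}")
```

In the paper's setting the parameters would be those of the parametric hand model and the loss would compare rendered outputs against 2D evidence, but the loop structure (evaluate the fitting function, take its gradient, update the estimate) is the same idea.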

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.04196/full.md

## Figures

60 figures with captions in the complete paper: https://tomesphere.com/paper/1904.04196/full.md

## References

68 references — full list in the complete paper: https://tomesphere.com/paper/1904.04196/full.md

---
Source: https://tomesphere.com/paper/1904.04196