# Bi-stream Pose Guided Region Ensemble Network for Fingertip Localization from Stereo Images

**Authors:** Guijin Wang, Cairong Zhang, Xinghao Chen, Xiangyang Ji, Jing-Hao Xue, Hang Wang

arXiv: 1902.09795 · 2019-02-27

## TL;DR

This paper introduces a new large-scale stereo hand pose dataset and a novel neural network method for accurate fingertip localization, outperforming existing approaches on the new dataset.

## Contribution

The paper provides the THU-Bi-Hand dataset, annotated with 3D locations of the wrist and five fingertips, and proposes the Bi-Pose-REN network for improved fingertip localization from stereo images.

## Key findings

- Bi-Pose-REN achieves the best performance among the methods evaluated on THU-Bi-Hand.
- The dataset contains 447k stereo image pairs from 10 subjects, covering diverse hand shapes, viewpoints, and articulations.
- The method effectively utilizes pose-guided feature regions for precise localization.

## Abstract

In human-computer interaction, it is important to accurately estimate the hand pose, especially the fingertips. However, traditional approaches for fingertip localization mainly rely on depth images and thus suffer considerably from noise and missing values. Instead of depth images, stereo images can also provide 3D information of hands and promote 3D hand pose estimation. There are nevertheless limitations on the dataset size, global viewpoints, hand articulations and hand shapes in the publicly available stereo-based hand pose datasets. To mitigate these limitations and promote further research on hand pose estimation from stereo images, we propose a new large-scale binocular hand pose dataset called THU-Bi-Hand, offering a new perspective for fingertip localization. In the THU-Bi-Hand dataset, there are 447k pairs of stereo images of different hand shapes from 10 subjects with accurate 3D location annotations of the wrist and five fingertips. Captured with minimal restriction on the range of hand motion, the dataset covers a large global viewpoint space and hand articulation space. To better present the performance of fingertip localization on THU-Bi-Hand, we propose a novel scheme termed Bi-stream Pose Guided Region Ensemble Network (Bi-Pose-REN). It extracts more representative feature regions around joint points in the feature maps under the guidance of the previously estimated pose. The feature regions are integrated hierarchically according to the topology of hand joints to regress the refined hand pose. Bi-Pose-REN and several existing methods are evaluated on THU-Bi-Hand so that benchmarks are provided for further research. Experimental results show that our new method has achieved the best performance on THU-Bi-Hand.
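The pose-guided region extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`crop_region`, `pose_guided_regions`, `integrate_hierarchically`), the normalized `(u, v)` joint coordinates, the single-channel feature map, and the use of plain concatenation in place of learned fully connected layers are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

FeatureMap = List[List[float]]  # hypothetical single-channel H x W feature map


def crop_region(fmap: FeatureMap, center_uv: Tuple[float, float], size: int) -> FeatureMap:
    """Crop a size x size window around a joint given in normalized (u, v) coordinates.

    The window is shifted where necessary so that it stays inside the map.
    """
    h, w = len(fmap), len(fmap[0])
    if size > h or size > w:
        raise ValueError("region size exceeds feature map size")
    u, v = center_uv
    col = round(u * (w - 1))  # joint column in feature-map pixels
    row = round(v * (h - 1))  # joint row in feature-map pixels
    top = min(max(row - size // 2, 0), h - size)
    left = min(max(col - size // 2, 0), w - size)
    return [r[left:left + size] for r in fmap[top:top + size]]


def pose_guided_regions(
    fmap: FeatureMap, pose_uv: Dict[str, Tuple[float, float]], size: int
) -> Dict[str, FeatureMap]:
    """Extract one region per joint, guided by a previously estimated pose."""
    return {name: crop_region(fmap, uv, size) for name, uv in pose_uv.items()}


def flatten(region: FeatureMap) -> List[float]:
    return [x for row in region for x in row]


def integrate_hierarchically(
    regions: Dict[str, FeatureMap], groups: List[List[str]]
) -> List[float]:
    """Merge regions in two levels: joints within a group, then the groups.

    Assumption: concatenation stands in for the learned layers a real
    network would apply at each level.
    """
    level1 = [sum((flatten(regions[n]) for n in group), []) for group in groups]
    return [x for feat in level1 for x in feat]
```

In an iterative scheme of this kind, the merged feature would feed a regressor whose output pose becomes the guide for the next round of cropping. The grouping passed to `integrate_hierarchically` is left to the caller here, since the abstract states only that integration follows the topology of the hand joints.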

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.09795/full.md

## Figures

23 figures with captions in the complete paper: https://tomesphere.com/paper/1902.09795/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/1902.09795/full.md

---
Source: https://tomesphere.com/paper/1902.09795