Multi-View Video-Based 3D Hand Pose Estimation
Leyla Khaleghi, Alireza Sepas Moghaddam, Joshua Marshall, Ali Etemad

TL;DR
This paper introduces MuViHand, a large multi-view video dataset with ground-truth 3D hand poses, and MuViHandNet, a neural network pipeline that leverages multi-view and temporal data for improved 3D hand pose estimation.
Contribution
The paper presents a new synthetic multi-view video dataset featuring complex backgrounds and dynamic lighting, along with a novel neural network architecture that uses both multi-view and temporal information for 3D hand pose estimation.
Findings
The MuViHand dataset contains more than 402,000 synthetic hand images in 4,560 videos, captured simultaneously from six angles.
MuViHandNet outperforms baseline methods on the new dataset.
Temporal and multi-view information significantly improve estimation accuracy.
Abstract
Hand pose estimation (HPE) can be used for a variety of human-computer interaction applications such as gesture-based control for physical or virtual/augmented reality devices. Recent works have shown that videos or multi-view images carry rich information regarding the hand, allowing for the development of more robust HPE systems. In this paper, we present the Multi-View Video-Based 3D Hand (MuViHand) dataset, consisting of multi-view videos of the hand along with ground-truth 3D pose labels. Our dataset includes more than 402,000 synthetic hand images available in 4,560 videos. The videos have been simultaneously captured from six different angles with complex backgrounds and random levels of dynamic lighting. The data has been captured from 10 distinct animated subjects using 12 cameras in a semi-circle topology where six tracking cameras only focus on the hand and the other six…
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Stroke Rehabilitation and Recovery
Methods: Convolution · Max Pooling · Concatenated Skip Connection · U-Net
