HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural Networks
Jameel Malik, Soshi Shimada, Ahmed Elhayek, Sk Aziz Ali, Christian Theobalt, Vladislav Golyanik, Didier Stricker

TL;DR
HandVoxNet++ is a voxel-based neural network that improves 3D hand shape and pose estimation from depth maps by combining voxel and surface representations, achieving state-of-the-art results on multiple benchmarks.
Contribution
It introduces a novel voxel-based deep network with 3D and graph convolutions, and a new mesh registration method, enhancing accuracy over previous approaches.
Findings
Achieves state-of-the-art performance on SynHand5M, HANDS19, and HO-3D datasets.
Gains 41.09% and 13.7% higher shape alignment accuracy on the SynHand5M and HANDS19 datasets, respectively.
Ranks first on the HANDS19 challenge dataset at the time of submission.
Abstract
3D hand shape and pose estimation from a single depth map is a new and challenging computer vision problem with many applications. Existing methods addressing it directly regress hand meshes via 2D convolutional neural networks, which leads to artefacts due to perspective distortions in the images. To address the limitations of the existing methods, we develop HandVoxNet++, i.e., a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner. The input to our network is a 3D voxelized depth map based on the truncated signed distance function (TSDF). HandVoxNet++ relies on two hand shape representations. The first one is the 3D voxelized grid of hand shape, which does not preserve the mesh topology and which is the most accurate representation. The second representation is the hand surface that preserves the mesh topology. We combine the advantages of both…
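To make the TSDF input mentioned in the abstract more concrete, here is a minimal NumPy sketch of a projective TSDF computed from a depth map. It is an illustration under stated assumptions, not the authors' implementation: the function name `depth_to_tsdf`, the pinhole intrinsics `fx, fy, cx, cy`, the 250 mm cube, the 32-voxel resolution, the truncation distance, and the sign convention (positive in front of the observed surface, negative behind it) are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

def depth_to_tsdf(depth, fx, fy, cx, cy, center, cube=250.0, res=32, trunc=None):
    """Projective TSDF of a depth map (in mm) inside a cube around `center`.

    All parameter names and defaults are illustrative assumptions.
    Returns a (res, res, res) array with values in [-1, 1].
    """
    if trunc is None:
        trunc = 3.0 * cube / res  # assumed truncation: three voxel widths
    h, w = depth.shape

    # Voxel-centre offsets along one axis, in mm, relative to the cube centre.
    offs = (np.arange(res) + 0.5) / res * cube - cube / 2.0
    x, y, z = np.meshgrid(center[0] + offs,
                          center[1] + offs,
                          center[2] + offs, indexing="ij")

    # Pinhole projection of every voxel centre into the depth image.
    z_safe = np.where(z > 0, z, 1.0)  # avoid division by zero behind the camera
    u = np.round(fx * x / z_safe + cx).astype(int)
    v = np.round(fy * y / z_safe + cy).astype(int)
    inside = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # Look up the observed depth for voxels that project into the image.
    d = np.zeros_like(z)
    d[inside] = depth[v[inside], u[inside]]
    valid = inside & (d > 0)  # zero depth is treated as a missing measurement

    # Signed distance along the viewing direction, truncated to [-1, 1].
    tsdf = np.ones_like(z)  # voxels without a measurement default to free space
    tsdf[valid] = np.clip((d[valid] - z[valid]) / trunc, -1.0, 1.0)
    return tsdf
```

Calling the function on a synthetic flat depth map, for example `depth_to_tsdf(np.full((64, 64), 400.0), 50.0, 50.0, 32.0, 32.0, (0.0, 0.0, 400.0))`, yields a grid whose values change sign where the voxel depth crosses 400 mm. A grid of this kind is the sort of volumetric input a 3D convolutional network can consume.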
