Pyramid Deep Fusion Network for Two-Hand Reconstruction from RGB-D Images
Jinwei Ren and Jianke Zhu

TL;DR
This paper introduces a novel end-to-end framework that combines RGB and depth data using a pyramid deep fusion network to accurately reconstruct dense 3D hand meshes, outperforming existing methods.
Contribution
The paper proposes a new multi-scale feature fusion strategy (PDFNet) for RGB-D hand reconstruction, effectively integrating geometric and visual information.
Findings
Outperforms state-of-the-art methods on public datasets.
Effective multi-scale feature fusion improves reconstruction accuracy.
Preserves geometric details by encoding depth as point clouds.
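The last finding refers to treating the depth image as a point cloud rather than as a 2D image. A minimal sketch of the standard pinhole back-projection that produces such a point cloud is shown here. The function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions, not values or code from the paper, and the paper's own encoding step may differ.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (N, 3) point cloud.

    Assumes a pinhole camera; fx, fy, cx, cy are hypothetical intrinsics.
    Pixels with depth <= 0 are treated as missing and dropped.
    """
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Toy example: a 4x4 depth map with two valid measurements.
depth = np.zeros((4, 4), dtype=np.float32)
depth[1, 2] = 0.5
depth[3, 0] = 1.0
points = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

Because each point keeps its metric depth, the resulting cloud carries the real-world scale that root-aligned RGB-only estimates lack.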
Abstract
Accurately recovering the dense 3D mesh of both hands from monocular images poses considerable challenges due to occlusions and projection ambiguity. Most existing methods extract features from color images to estimate root-aligned hand meshes, neglecting the crucial depth and scale information of the real world. Given noisy sensor measurements with limited resolution, depth-based methods predict 3D keypoints rather than a dense mesh. These limitations motivate us to take advantage of these two complementary inputs to acquire dense hand meshes at real-world scale. In this work, we propose an end-to-end framework for recovering dense meshes for both hands, which employs single-view RGB-D image pairs as input. The primary challenge lies in effectively utilizing two different input modalities to mitigate the blurring effects in RGB images and the noise in depth images.…
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Advanced X-ray and CT Imaging
