Robust Photo-Realistic Hand Gesture Generation: from Single View to Multiple View
Qifan Fu, Xu Chen, Muhammad Asad, Shanxin Yuan, Changjae Oh, Gregory Slabaugh

TL;DR
This paper introduces a multi-view prior framework called MUFEN for high-fidelity, photo-realistic hand gesture generation, effectively capturing complete 3D hand features from multiple perspectives to overcome occlusion issues.
Contribution
The paper proposes a novel multi-view prior framework with a dual-stream encoder and feature fusion for improved 3D hand gesture generation from single images.
Findings
Achieves state-of-the-art performance in quantitative metrics.
Improves understanding of complete hand features.
Effectively handles occlusion through multi-view priors.
Abstract
High-fidelity hand gesture generation represents a significant challenge in human-centric generation tasks. Existing methods typically employ a single-view mesh-rendered image prior to enhance gesture generation quality. However, the spatial complexity of hand gestures and the inherent limitations of single-view rendering make it difficult to capture complete gesture information, particularly when fingers are occluded. The fundamental contradiction lies in the loss of 3D topological relationships through 2D projection and the incomplete spatial coverage inherent to single-view representations. Diverging from single-view prior approaches, we propose a multi-view prior framework, named Multi-Modal UNet-based Feature Encoder (MUFEN), to guide diffusion models in learning comprehensive 3D hand information. Specifically, we extend conventional front-view rendering to include rear, left,…
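The occlusion argument in the abstract can be illustrated with a toy example. The sketch below is not the authors' rendering pipeline: it replaces mesh rendering with an orthographic projection of bare 3D points, and the view angles, the rotation axis, and the helper names `rotate_y` and `visible_mask` are assumptions made for illustration only. It covers just the three views named before the abstract is cut off (front, rear, left).

```python
import numpy as np

# Hypothetical view angles (degrees about the vertical y axis); the paper's
# actual camera setup is not given in the text above.
VIEW_ANGLES_DEG = {"front": 0.0, "rear": 180.0, "left": 90.0}


def rotate_y(points, angle_deg):
    """Rotate an (N, 3) array of points about the y axis."""
    t = np.deg2rad(angle_deg)
    c, s = np.cos(t), np.sin(t)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return points @ rot.T


def visible_mask(points, tol=1e-6):
    """Orthographic visibility for a camera looking down the -z direction.

    A point counts as occluded if another point projects to (almost) the
    same (x, y) location and has a larger z, i.e. sits closer to the camera.
    """
    xy, z = points[:, :2], points[:, 2]
    same_pixel = np.linalg.norm(xy[:, None] - xy[None, :], axis=-1) < tol
    in_front = z[None, :] > z[:, None] + tol  # [i, j]: point j is in front of i
    return ~np.any(same_pixel & in_front, axis=1)


# Two toy "finger" points stacked along the viewing axis of the front camera.
pts = np.array([[0.0, 0.0, 1.0],    # near finger
                [0.0, 0.0, -1.0]])  # far finger, hidden in the front view

masks = {name: visible_mask(rotate_y(pts, angle))
         for name, angle in VIEW_ANGLES_DEG.items()}
for name, mask in masks.items():
    print(name, mask.tolist())
```

In this toy setup each of the front and rear views hides one of the two points, while the side view shows both, which is the kind of complementary coverage a multi-view prior is meant to supply.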
