HandOS: 3D Hand Reconstruction in One Stage
Xingyu Chen, Zhuheng Song, Xiaoke Jiang, Yaoqing Hu, Junzhi Yu, Lei Zhang

TL;DR
HandOS introduces an end-to-end, one-stage framework for 3D hand reconstruction that integrates hand detection with 2D and 3D keypoint estimation, reducing redundant computation and error accumulation relative to multi-stage methods.
Contribution
The paper presents a novel end-to-end framework that combines hand detection, 2D pose estimation, and 3D mesh reconstruction in a single stage, eliminating the need for multi-stage processing.
Findings
Achieves state-of-the-art performance on FreiHAND with a PA-MPJPE of 5.0 mm.
Attains 64.6% PCK@0.05 on the HInt-Ego4D benchmark.
Reduces computational redundancy and error accumulation.
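For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how they are commonly computed, not the paper's evaluation code. PA-MPJPE is the mean per-joint position error after a similarity (Procrustes) alignment of the prediction to the ground truth; PCK is the fraction of keypoints whose error falls within a threshold. The function names, the `box_size` normalizer for PCK, and the use of NumPy are assumptions made for illustration; the benchmark's exact normalization may differ.

```python
import numpy as np


def procrustes_align(pred, gt):
    """Similarity-align pred (J, 3) to gt (J, 3) with scale, rotation, translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the cross-covariance matrix.
    u, sing, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    flip = np.array([1.0, 1.0, d])
    rot = u @ np.diag(flip) @ vt
    scale = (sing * flip).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g


def pa_mpjpe(pred, gt):
    """Mean Euclidean joint error after Procrustes alignment (same unit as input)."""
    aligned = procrustes_align(pred, gt)
    return float(np.linalg.norm(aligned - gt, axis=-1).mean())


def pck(pred_2d, gt_2d, box_size, thresh=0.05):
    """Fraction of keypoints within thresh * box_size of the ground truth.

    box_size is an assumed normalizer (e.g. hand bounding-box size in pixels).
    """
    dist = np.linalg.norm(pred_2d - gt_2d, axis=-1)
    return float((dist <= thresh * box_size).mean())
```

Because the alignment removes global scale, rotation, and translation, PA-MPJPE measures only the error in articulated hand shape, which is why it is typically lower than the unaligned MPJPE.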
Abstract
Existing approaches of hand reconstruction predominantly adhere to a multi-stage framework, encompassing detection, left-right classification, and pose estimation. This paradigm induces redundant computation and cumulative errors. In this work, we propose HandOS, an end-to-end framework for 3D hand reconstruction. Our central motivation lies in leveraging a frozen detector as the foundation while incorporating auxiliary modules for 2D and 3D keypoint estimation. In this manner, we integrate the pose estimation capacity into the detection framework, while at the same time obviating the necessity of using the left-right category as a prerequisite. Specifically, we propose an interactive 2D-3D decoder, where 2D joint semantics is derived from detection cues while 3D representation is lifted from those of 2D joints. Furthermore, hierarchical attention is designed to enable the concurrent…
Taxonomy
Topics: Anatomy and Medical Technology
Methods: Softmax · Attention Is All You Need
