TL;DR
OmniHands introduces a versatile transformer-based method for robust 4D hand mesh recovery from monocular and multi-view inputs, effectively modeling hand interactions and relative movements.
Contribution
It presents a universal architecture with novel tokenization and feature fusion strategies to handle various hand input types and relative hand positions.
Findings
Achieves superior performance on benchmark datasets.
Effectively reconstructs intricate hand interactions in real-world scenarios.
Handles both single-hand and two-hand inputs with explicit relative positioning.
Abstract
In this paper, we introduce OmniHands, a universal approach to recovering interactive hand meshes and their relative movement from monocular or multi-view inputs. Our approach addresses two major limitations of previous methods: lacking a unified solution for handling various hand image inputs and neglecting the positional relationship of two hands within images. To overcome these challenges, we develop a universal architecture with novel tokenization and contextual feature fusion strategies, capable of adapting to a variety of tasks. Specifically, we propose a Relation-aware Two-Hand Tokenization (RAT) method to embed positional relation information into the hand tokens. In this way, our network can handle both single-hand and two-hand inputs and explicitly leverage relative hand positions, facilitating the reconstruction of intricate hand interactions in real-world scenarios. As such…
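The idea of embedding positional relation information into hand tokens can be illustrated with a minimal sketch. This is not the paper's RAT implementation: the function names `relation_vector` and `tokenize_hands`, the `(cx, cy, w, h)` box format, and the particular encoding (normalized center offset plus log size ratio, with a partner flag) are all assumptions chosen for illustration. The sketch only shows how one tokenizer could accept either one or two hands while giving each token explicit information about where the other hand is.

```python
import numpy as np


def relation_vector(box_self, box_other):
    """Hypothetical relation encoding (not taken from the paper).

    Boxes are (cx, cy, w, h). Returns the other hand's center offset in
    units of this hand's box size, plus the log ratio of the box sizes.
    """
    cx, cy, w, h = box_self
    ox, oy, ow, oh = box_other
    return np.array([(ox - cx) / w, (oy - cy) / h, np.log(ow / w), np.log(oh / h)])


def tokenize_hands(features, boxes):
    """Append a relation vector and a 'has partner' flag to each hand feature.

    features: list of 1-D float arrays, one per detected hand (1 or 2 hands).
    boxes:    list of (cx, cy, w, h) boxes in the same order as features.
    Returns an array of shape (num_hands, feature_dim + 5).
    """
    if len(features) not in (1, 2) or len(features) != len(boxes):
        raise ValueError("expected one or two hands with matching boxes")
    tokens = []
    for i, feat in enumerate(features):
        if len(features) == 2:
            # Two hands: encode where the partner hand sits relative to this one.
            rel = relation_vector(boxes[i], boxes[1 - i])
            flag = 1.0
        else:
            # Single hand: zero relation vector, flag marks the missing partner.
            rel = np.zeros(4)
            flag = 0.0
        tokens.append(np.concatenate([feat, rel, [flag]]))
    return np.stack(tokens)


# Usage with placeholder features (8-D vectors standing in for image features).
left_box, right_box = (100.0, 100.0, 50.0, 50.0), (150.0, 100.0, 100.0, 50.0)
feats = [np.ones(8), np.full(8, 2.0)]
two_hand_tokens = tokenize_hands(feats, [left_box, right_box])
one_hand_tokens = tokenize_hands(feats[:1], [left_box])
print(two_hand_tokens.shape)  # (2, 13)
print(one_hand_tokens.shape)  # (1, 13)
```

Because both cases produce tokens of the same width, a downstream transformer could consume single-hand and two-hand inputs through the same layers, which is the property the abstract emphasizes.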
