OmniHands: Towards Robust 4D Hand Mesh Recovery via A Versatile Transformer

Dixuan Lin; Yuxiang Zhang; Mengcheng Li; Wei Jing; Qi Yan; Qianying Wang; Yebin Liu; Hongwen Zhang

arXiv:2405.20330·cs.CV·April 15, 2026

TL;DR

OmniHands introduces a versatile transformer-based method for robust 4D hand mesh recovery from monocular and multi-view inputs, effectively modeling hand interactions and relative movements.

Contribution

The paper presents a universal architecture with novel tokenization and feature fusion strategies that handles various hand input types and models relative hand positions.

Findings

01 Achieves superior performance on benchmark datasets.

02 Effectively reconstructs intricate hand interactions in real-world scenarios.

03 Handles both single-hand and two-hand inputs with explicit relative positioning.

Abstract

In this paper, we introduce OmniHands, a universal approach to recovering interactive hand meshes and their relative movement from monocular or multi-view inputs. Our approach addresses two major limitations of previous methods: lacking a unified solution for handling various hand image inputs and neglecting the positional relationship of two hands within images. To overcome these challenges, we develop a universal architecture with novel tokenization and contextual feature fusion strategies, capable of adapting to a variety of tasks. Specifically, we propose a Relation-aware Two-Hand Tokenization (RAT) method to embed positional relation information into the hand tokens. In this way, our network can handle both single-hand and two-hand inputs and explicitly leverage relative hand positions, facilitating the reconstruction of intricate hand interactions in real-world scenarios. As such…
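The idea of embedding positional relation information into hand tokens can be illustrated with a small sketch. This is not the paper's RAT implementation, whose details are not given in the abstract. It is a hypothetical scheme that assumes each hand comes with a bounding box in `(cx, cy, w, h)` form and a flat feature vector, and it simply appends a relative offset, a log scale ratio, and a presence flag to each hand's features. The names `relation_features` and `tokenize_hands` are made up for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical box format: (center_x, center_y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def relation_features(own: Box, other: Optional[Box]) -> List[float]:
    """Describe where the other hand sits relative to this one.

    Returns [dx, dy, log_scale, present]. All zeros when no other hand
    is visible, so single-hand inputs share the same token layout.
    """
    if other is None:
        return [0.0, 0.0, 0.0, 0.0]
    cx, cy, w, h = own
    ox, oy, ow, oh = other
    size = max(w, h)
    dx = (ox - cx) / size  # offset measured in units of this hand's size
    dy = (oy - cy) / size
    log_scale = math.log(max(ow, oh) / size)  # relative apparent size
    return [dx, dy, log_scale, 1.0]


def tokenize_hands(
    features: Dict[str, List[float]], boxes: Dict[str, Box]
) -> Dict[str, List[float]]:
    """Append relation features to each hand's image features.

    Keys are assumed to be "left" and/or "right" (an assumption of this
    sketch, not something stated in the source).
    """
    tokens = {}
    for side, feat in features.items():
        other_side = "right" if side == "left" else "left"
        rel = relation_features(boxes[side], boxes.get(other_side))
        tokens[side] = list(feat) + rel
    return tokens


if __name__ == "__main__":
    two_hands = tokenize_hands(
        {"left": [0.1, 0.2], "right": [0.3, 0.4]},
        {"left": (100.0, 100.0, 50.0, 50.0), "right": (150.0, 100.0, 100.0, 100.0)},
    )
    print(two_hands)
    one_hand = tokenize_hands({"left": [0.1, 0.2]}, {"left": (100.0, 100.0, 50.0, 50.0)})
    print(one_hand)
```

Because the single-hand case falls back to a zero vector with the presence flag cleared, one token layout covers both single-hand and two-hand inputs, which mirrors the property the abstract attributes to the method.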

Code & Models

Repositories

https://OmniHand.github.io
