UmeTrack: Unified multi-view end-to-end hand tracking for VR
Shangchen Han, Po-chen Wu, Yubo Zhang, Beibei Liu, Linguang Zhang, Zheng Wang, Weiguang Si, Peizhao Zhang, Yujun Cai, Tomas Hodan, Randi Cabezas, Luan Tran, Muzaffer Akbay, Tsz-Ho Yu, Cem Keskin, Robert Wang

TL;DR
UmeTrack introduces a unified, end-to-end differentiable framework for real-time multi-view 3D hand tracking in VR, directly predicting world-space hand poses and enhancing VR interaction accuracy.
Contribution
The paper presents a novel end-to-end multi-view hand tracking model that predicts 3D hand poses in world space, addressing limitations of previous methods, and introduces a new large-scale egocentric dataset.
Findings
System effectively handles challenging interactive motions.
Successfully applied to real-time VR applications.
Outperforms existing methods in accuracy and robustness.
Abstract
Real-time tracking of 3D hand pose in world space is a challenging problem and plays an important role in VR interaction. Existing work in this space is limited to either producing root-relative (versus world-space) 3D pose or relying on multiple stages, such as generating heatmaps and kinematic optimization, to obtain 3D pose. Moreover, the typical VR scenario, which involves multi-view tracking from wide field-of-view (FOV) cameras, is seldom addressed by these methods. In this paper, we present a unified end-to-end differentiable framework for multi-view, multi-frame hand tracking that directly predicts 3D hand pose in world space. We demonstrate the benefits of end-to-end differentiability by extending our framework with downstream tasks such as jitter reduction and pinch prediction. To demonstrate the efficacy of our model, we further present a new large-scale egocentric hand pose dataset that…
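The difference between root-relative and world-space pose can be made concrete with a small sketch. It is not UmeTrack's implementation. It only shows, under illustrative assumptions, what extra information a root-relative prediction needs before it becomes a world-space pose: the root position in the camera frame and a camera-to-world transform. The function name `root_relative_to_world`, the array shapes, and the example values are all hypothetical.

```python
import numpy as np

def root_relative_to_world(rel_keypoints, root_in_camera, cam_to_world):
    """Lift root-relative keypoints to world space (illustrative only).

    rel_keypoints:  (N, 3) keypoint offsets from the hand root, camera frame.
    root_in_camera: (3,) root position in the camera frame.
    cam_to_world:   (4, 4) rigid transform from camera to world coordinates.
    """
    # Undo the root-relative offset to get absolute camera-frame points.
    cam_points = rel_keypoints + root_in_camera
    # Append a homogeneous coordinate so the 4x4 transform can be applied.
    ones = np.ones((cam_points.shape[0], 1))
    homogeneous = np.concatenate([cam_points, ones], axis=1)
    # Apply the camera-to-world transform and drop the homogeneous coordinate.
    return (homogeneous @ cam_to_world.T)[:, :3]

# Hypothetical values: two keypoints, a camera translated by (1, 2, 3).
rel = np.array([[0.0, 0.0, 0.0], [0.0, 0.05, 0.0]])
root = np.array([0.1, 0.0, 0.4])
transform = np.eye(4)
transform[:3, 3] = [1.0, 2.0, 3.0]
world = root_relative_to_world(rel, root, transform)
```

A root-relative method leaves `root_in_camera` to be recovered by a separate stage, whereas the abstract describes a model that predicts the world-space result directly.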
