Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting
Wentao Bao, Lele Chen, Libing Zeng, Zhong Li, Yi Xu, Junsong Yuan, Yu Kong

TL;DR
This paper introduces an uncertainty-aware Transformer model for predicting 3D egocentric hand trajectories from RGB videos, outperforming existing 2D methods and advancing real-world AR/VR interaction understanding.
Contribution
It proposes USST, a model that integrates the attention mechanism and aleatoric uncertainty within a classical state-space framework for 3D hand trajectory forecasting from first-person videos.
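In a classical state-space model, a latent state evolves through a transition function and accumulates process noise, so a forecast carries both a mean and a growing uncertainty. USST is summarized above as placing attention and aleatoric uncertainty inside this framework. To illustrate only the classical side, here is a minimal sketch of a linear constant-velocity Kalman prediction step applied to one coordinate axis of a hand trajectory. The constant-velocity transition, the noise level `q`, and the names `predict_axis`, `matmul` and `transpose` are assumptions for illustration; they are not USST's learned transition.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two small dense matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    """Return the transpose of a nested-list matrix."""
    return [list(row) for row in zip(*a)]


def predict_axis(
    pos: float,
    vel: float,
    cov: Matrix,
    steps: int,
    dt: float = 1.0,
    q: float = 0.01,
) -> List[Tuple[float, float]]:
    """Constant-velocity Kalman prediction along one axis (hypothetical sketch).

    Returns (predicted position, position variance) for each future step.
    Only the predict step is applied; no measurement update is performed.
    """
    F = [[1.0, dt], [0.0, 1.0]]  # transition: p' = p + dt * v, v' = v
    Q = [[q, 0.0], [0.0, q]]     # assumed diagonal process noise
    x = [[pos], [vel]]           # state mean as a column vector
    P = [row[:] for row in cov]  # state covariance, copied so the input is not mutated
    out: List[Tuple[float, float]] = []
    for _ in range(steps):
        x = matmul(F, x)                           # propagate the mean
        FPFt = matmul(matmul(F, P), transpose(F))  # propagate the covariance
        P = [[FPFt[i][j] + Q[i][j] for j in range(2)] for i in range(2)]
        out.append((x[0][0], P[0][0]))
    return out


# Example: hand at x = 0 moving +1 unit per step, initial state known exactly.
forecast = predict_axis(0.0, 1.0, [[0.0, 0.0], [0.0, 0.0]], steps=3)
```

Here `forecast` holds positions 1.0, 2.0 and 3.0 with position variances of roughly 0.01, 0.03 and 0.08: the further ahead the forecast, the wider the uncertainty. Running the same function per axis treats x, y and z as independent, which is a further simplifying assumption.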
Findings
USST outperforms existing methods on H2O and EgoPAT3D datasets.
The model effectively predicts both 2D and 3D hand trajectories.
The paper also contributes an annotation workflow for collecting high-quality 3D hand trajectories.
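The findings state that the model predicts both 2D and 3D hand trajectories. One standard way to relate the two is pinhole projection: a 3D point in the camera frame maps to pixel coordinates through the camera intrinsics. The sketch below assumes camera-frame coordinates with z pointing forward and uses hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`) that are not taken from H2O or EgoPAT3D; whether USST derives its 2D outputs this way is not stated here.

```python
from typing import List, Sequence, Tuple


def project_trajectory(
    points_3d: Sequence[Tuple[float, float, float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[float, float]]:
    """Project camera-frame 3D points to pixel coordinates with a pinhole model."""
    pixels: List[Tuple[float, float]] = []
    for x, y, z in points_3d:
        if z <= 0:
            raise ValueError("point must lie in front of the camera (z > 0)")
        # perspective division, then focal scaling and principal-point shift
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


# Hypothetical intrinsics for a 640x480 image.
traj_2d = project_trajectory([(0.0, 0.0, 1.0), (0.1, -0.2, 2.0)], 500.0, 500.0, 320.0, 240.0)
```

A point on the optical axis lands on the principal point (320, 240), and the second point lands near (345, 190). Because projection divides by depth, two 3D trajectories that differ only along the viewing ray produce the same 2D track, which is one reason 2D forecasting alone is limited for 3D applications.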
Abstract
Hand trajectory forecasting from egocentric views is crucial for enabling a prompt understanding of human intentions when interacting with AR/VR systems. However, existing methods handle this problem in a 2D image space, which is inadequate for 3D real-world applications. In this paper, we set up an egocentric 3D hand trajectory forecasting task that aims to predict hand trajectories in a 3D space from early observed RGB videos in a first-person view. To fulfill this goal, we propose an uncertainty-aware state space Transformer (USST) that combines the merits of the attention mechanism and aleatoric uncertainty within the framework of the classical state-space model. The model can be further enhanced by the velocity constraint and visual prompt tuning (VPT) on large vision transformers. Moreover, we develop an annotation workflow to collect 3D hand trajectories with high quality.…
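The abstract names two ingredients that shape training, aleatoric uncertainty and a velocity constraint, without giving their formulas. A minimal sketch, assuming a common heteroscedastic formulation (a per-coordinate Gaussian likelihood with a predicted log-variance) and a velocity term defined as the squared error between finite-difference displacements of the predicted and ground-truth trajectories. The names `gaussian_nll`, `velocity_loss`, `total_loss` and the weight `vel_weight` are hypothetical; USST's actual objective may differ.

```python
import math
from typing import List, Sequence, Tuple

Traj = Sequence[Sequence[float]]  # T points, each an (x, y, z) coordinate tuple


def gaussian_nll(target: Traj, mean: Traj, log_var: Traj) -> float:
    """Heteroscedastic Gaussian NLL per coordinate, constant term dropped, averaged."""
    if not (len(target) == len(mean) == len(log_var)):
        raise ValueError("trajectories must have the same length")
    terms: List[float] = []
    for t, m, s in zip(target, mean, log_var):
        for y, mu, lv in zip(t, m, s):
            # exp(-lv) down-weights residuals where the model predicts high variance;
            # the + lv term stops it from predicting large variance everywhere
            terms.append(0.5 * math.exp(-lv) * (y - mu) ** 2 + 0.5 * lv)
    return sum(terms) / len(terms)


def velocities(traj: Traj) -> List[Tuple[float, ...]]:
    """Finite-difference displacement between consecutive points."""
    return [tuple(b - a for a, b in zip(p, q)) for p, q in zip(traj, traj[1:])]


def velocity_loss(pred: Traj, target: Traj) -> float:
    """Mean squared error between predicted and ground-truth step velocities."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length trajectories with at least 2 points")
    vp, vt = velocities(pred), velocities(target)
    sq = [sum((a - b) ** 2 for a, b in zip(u, w)) for u, w in zip(vp, vt)]
    return sum(sq) / len(sq)


def total_loss(target: Traj, mean: Traj, log_var: Traj, vel_weight: float = 0.1) -> float:
    """Hypothetical combined objective: uncertainty-aware NLL plus weighted velocity term."""
    return gaussian_nll(target, mean, log_var) + vel_weight * velocity_loss(mean, target)
```

The velocity term is unchanged when the whole predicted trajectory is shifted by a constant offset, so it constrains the shape of the motion rather than its absolute position; the NLL term handles position. For a large residual, predicting a larger variance lowers the NLL, which is how such a loss lets the model express uncertainty about hard-to-predict future steps.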
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Anomaly Detection Techniques and Applications · Action Observation and Synchronization
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Adam · Layer Normalization
