UST-Hand: An Uncertainty-aware Spatiotemporal Point Cloud Interaction Network for 3D Self-supervised Hand Pose Estimation
Tianhao Han, Haoyang Zhang, Liang Xie, Haochen Chang, Kun Gao, Yuan Cheng, Pengfei Ren, Erwei Yin

TL;DR
UST-Hand is a self-supervised framework for 3D hand pose estimation that models pose uncertainty and spatial correlations, achieving state-of-the-art results on three challenging datasets.
Contribution
It introduces a probabilistic point cloud feature space and a conditional normalizing flow model to improve robustness and accuracy in self-supervised hand pose estimation.
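To make the idea of a conditional normalizing flow concrete, here is a minimal sketch using a single conditional affine layer, where a sample is produced as x = mu(c) + exp(s(c)) * z with z drawn from a standard normal. It is an illustration of the general technique only, not the authors' architecture: the class name `ConditionalAffineFlow`, the linear conditioners, and the random weights are all assumptions, and a real model would learn its parameters from image or point cloud features.

```python
import math
import random


class ConditionalAffineFlow:
    """One conditional affine layer: x = mu(c) + exp(s(c)) * z, z ~ N(0, I).

    Hypothetical toy model: mu(c) and s(c) are fixed random linear maps of
    the conditioning vector c, standing in for learned networks.
    """

    def __init__(self, cond_dim, out_dim, seed=0):
        self.rng = random.Random(seed)
        self.out_dim = out_dim
        # Assumed (untrained) weights for the shift and log-scale conditioners.
        self.w_mu = [[self.rng.gauss(0, 0.1) for _ in range(cond_dim)]
                     for _ in range(out_dim)]
        self.w_s = [[self.rng.gauss(0, 0.1) for _ in range(cond_dim)]
                    for _ in range(out_dim)]

    def _params(self, cond):
        # Shift and log-scale, both functions of the condition only.
        mu = [sum(w * c for w, c in zip(row, cond)) for row in self.w_mu]
        log_scale = [sum(w * c for w, c in zip(row, cond)) for row in self.w_s]
        return mu, log_scale

    def forward(self, z, cond):
        # Map a latent vector z to a pose-like vector x.
        mu, s = self._params(cond)
        return [m + math.exp(si) * zi for m, si, zi in zip(mu, s, z)]

    def inverse(self, x, cond):
        # Exact inverse of forward: recover z from x.
        mu, s = self._params(cond)
        return [(xi - m) * math.exp(-si) for xi, m, si in zip(x, mu, s)]

    def log_prob(self, x, cond):
        # Change of variables: log p(x|c) = log N(z; 0, I) - sum(s(c)).
        _, s = self._params(cond)
        z = self.inverse(x, cond)
        log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
        return log_base - sum(s)

    def sample(self, cond, n):
        # Draw n hypotheses for the same condition.
        return [self.forward([self.rng.gauss(0, 1) for _ in range(self.out_dim)], cond)
                for _ in range(n)]
```

Drawing several samples for the same condition, for example `ConditionalAffineFlow(4, 3).sample([0.5, -1.0, 0.2, 0.3], 8)`, gives a set of candidate outputs whose spread can serve as a rough proxy for uncertainty.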
Findings
Outperforms existing methods by up to 37.8% in mean per-vertex position error (MPVPE).
Effectively models uncertainty and spatial correlations.
Demonstrates superior performance on three challenging datasets.
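For readers unfamiliar with the metric, MPVPE is the average Euclidean distance between predicted and ground-truth mesh vertices. The sketch below computes it with the Python standard library and also shows how a relative improvement between two error values could be computed. The function names and the example numbers are illustrative assumptions, and reading the reported percentage as a relative error reduction is likewise an assumption rather than something this page states.

```python
import math


def mpvpe(pred_vertices, gt_vertices):
    """Mean per-vertex position error: mean Euclidean distance over vertices."""
    if not pred_vertices or len(pred_vertices) != len(gt_vertices):
        raise ValueError("vertex lists must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred_vertices, gt_vertices))
    return total / len(pred_vertices)


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical toy vertices, not data from the paper.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
error = mpvpe(pred, gt)  # distances are 3 and 4, so the mean is 3.5
```

Because the metric is an error, lower is better, so an improvement corresponds to a smaller value.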
Abstract
Manually annotating accurate 3D hand poses is extremely time-consuming and labor-intensive. Existing self-supervised hand pose estimation methods leverage the discrepancy between input images and rendered outputs, or multi-view consistency constraints, as the driving force to optimize networks and progressively refine pose accuracy. However, these methods are highly susceptible to noisy pseudo-labels and overlook the importance of fully exploiting fine-grained spatial correlations, which undermines the stability of model training. To address these issues, we propose UST-Hand, a self-supervised learning framework that estimates the uncertainty distribution of hand pose and constructs a probabilistic point cloud feature space, which enables the modeling of complex spatiotemporal relationships. UST-Hand employs a conditional normalizing flow model to capture hand pose distributions and samples…
