UST-Hand: An Uncertainty-aware Spatiotemporal Point Cloud Interaction Network for 3D Self-supervised Hand Pose Estimation

Tianhao Han, Haoyang Zhang, Liang Xie, Haochen Chang, Kun Gao, Yuan Cheng, Pengfei Ren, Erwei Yin

arXiv:2605.17742 · cs.CV · May 19, 2026

TL;DR

UST-Hand is a self-supervised framework for 3D hand pose estimation that models hand pose uncertainty and fine-grained spatial correlations in point clouds, achieving state-of-the-art results on three challenging datasets.

Contribution

UST-Hand introduces a probabilistic point cloud feature space and a conditional normalizing flow model of hand pose distributions to improve the robustness and accuracy of self-supervised hand pose estimation.
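This page does not describe the paper's flow architecture. As a hedged illustration of what a conditional normalizing flow does, the sketch below stacks RealNVP-style affine coupling layers whose scales and shifts depend on a context vector (standing in for a point cloud feature). It evaluates log p(pose | context) with the change-of-variables formula and draws several pose samples whose spread can serve as an uncertainty estimate. All names, sizes, the 48-D pose vector, and the fixed random linear maps used in place of trained networks are assumptions for illustration. This is not UST-Hand's implementation.

```python
import numpy as np


class ConditionalAffineCoupling:
    """RealNVP-style affine coupling layer whose scale and shift depend on a context vector.

    Hypothetical illustration: the scale/shift "networks" are fixed random linear maps.
    """

    def __init__(self, dim, ctx_dim, flip, rng):
        assert dim % 2 == 0, "this sketch splits the vector into two equal halves"
        self.d = dim // 2
        self.flip = flip  # alternate which half is transformed between layers
        self.W_s = rng.normal(scale=0.1, size=(self.d + ctx_dim, self.d))
        self.W_t = rng.normal(scale=0.1, size=(self.d + ctx_dim, self.d))

    def _halves(self, v):
        a, b = v[..., :self.d], v[..., self.d:]
        return (b, a) if self.flip else (a, b)  # (conditioning half, transformed half)

    def _join(self, cond, trans):
        parts = [trans, cond] if self.flip else [cond, trans]
        return np.concatenate(parts, axis=-1)  # restore the original ordering

    def _scale_shift(self, cond, ctx):
        h = np.concatenate([cond, ctx], axis=-1)
        return np.tanh(h @ self.W_s), h @ self.W_t  # bounded log-scale, shift

    def forward(self, x, ctx):
        """Data -> latent; also returns log|det Jacobian| (sum of log-scales)."""
        cond, trans = self._halves(x)
        log_s, t = self._scale_shift(cond, ctx)
        return self._join(cond, trans * np.exp(log_s) + t), log_s.sum(axis=-1)

    def inverse(self, z, ctx):
        """Latent -> data; exact inverse of forward for the same context."""
        cond, trans = self._halves(z)
        log_s, t = self._scale_shift(cond, ctx)
        return self._join(cond, (trans - t) * np.exp(-log_s))


class ConditionalFlow:
    """Stack of coupling layers with a standard-normal base distribution."""

    def __init__(self, dim, ctx_dim, n_layers=4, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.layers = [ConditionalAffineCoupling(dim, ctx_dim, i % 2 == 1, rng)
                       for i in range(n_layers)]

    def log_prob(self, x, ctx):
        """Change of variables: log p(x|c) = log N(f(x); 0, I) + log|det df/dx|."""
        ctx = np.broadcast_to(ctx, x.shape[:-1] + ctx.shape[-1:])
        z, log_det = x, np.zeros(x.shape[:-1])
        for layer in self.layers:
            z, ld = layer.forward(z, ctx)
            log_det = log_det + ld
        log_base = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * self.dim * np.log(2 * np.pi)
        return log_base + log_det

    def sample(self, ctx, n, rng):
        """Draw n samples from p(x | ctx) by inverting the layers in reverse order."""
        z = rng.standard_normal((n, self.dim))
        ctx = np.broadcast_to(ctx, (n,) + ctx.shape[-1:])
        for layer in reversed(self.layers):
            z = layer.inverse(z, ctx)
        return z


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    flow = ConditionalFlow(dim=48, ctx_dim=32)   # assumed sizes: 48-D pose, 32-D context
    ctx = rng.standard_normal(32)                # stand-in for a point cloud feature
    poses = flow.sample(ctx, n=200, rng=rng)     # multiple pose hypotheses
    pose_mean, pose_std = poses.mean(axis=0), poses.std(axis=0)  # per-dimension spread
    print(poses.shape, pose_std.shape, flow.log_prob(poses[:3], ctx).shape)
```

Coupling layers are a common choice for this kind of model because both the inverse and the log-determinant are cheap. Only one half of the vector changes per layer, so the Jacobian is triangular and its log-determinant is just the sum of the log-scales.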

Findings

01. Outperforms existing methods by up to 37.8% in MPVPE (mean per-vertex position error).
02. Models hand pose uncertainty and fine-grained spatial correlations.
03. Demonstrates superior performance on three challenging datasets.
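MPVPE is the mean Euclidean distance between predicted and ground-truth mesh vertices, so lower is better. Under that reading, "outperforms by up to 37.8%" is a relative error reduction. The sketch below computes both quantities. It applies no root or Procrustes alignment, because evaluation protocols vary and the paper's protocol is not stated on this page. The function names are hypothetical.

```python
import numpy as np


def mpvpe(pred_verts, gt_verts):
    """Mean per-vertex position error over all vertices (and frames).

    pred_verts, gt_verts: arrays of shape (..., V, 3) in the same units (e.g. mm).
    Assumption: no alignment is applied before measuring distances.
    """
    return float(np.linalg.norm(pred_verts - gt_verts, axis=-1).mean())


def relative_reduction(baseline_err, new_err):
    """Fraction by which new_err improves on baseline_err (lower error is better)."""
    return (baseline_err - new_err) / baseline_err
```

For example, hypothetical MPVPE values of 10.0 mm for a baseline and 6.22 mm for a new method would correspond to a 37.8% reduction. These numbers are illustrative only and are not results from the paper.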

Abstract

Manually annotating accurate 3D hand poses is extremely time-consuming and labor-intensive. Existing self-supervised hand pose estimation methods use the discrepancy between input images and rendered outputs, or multi-view consistency constraints, as the driving signal for optimizing networks and progressively refining pose accuracy. However, these methods are highly susceptible to noisy pseudo-labels and fail to fully exploit fine-grained spatial correlations, which undermines the stability of model training. To address these issues, we propose UST-Hand, a self-supervised learning framework that estimates the uncertainty distribution of hand poses and constructs a probabilistic point cloud feature space, enabling the modeling of complex spatiotemporal relationships. UST-Hand employs a conditional normalizing flow model to capture hand pose distributions and samples…
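The abstract names multi-view consistency as one training signal used by existing self-supervised methods. Below is a minimal NumPy sketch of that generic idea. It assumes calibrated cameras with known world-to-camera extrinsics (x_cam = R x_world + t). Each view's 3D joint prediction is mapped into a shared world frame and penalized for disagreeing with the cross-view mean. The function names are hypothetical, and this is not UST-Hand's loss. It also illustrates one way noisy predictions can leak into the target: a single bad view pulls the consensus toward its own error.

```python
import numpy as np


def to_world(joints_cam, R, t):
    """Camera frame -> world frame for extrinsics defined by x_cam = R @ x_world + t.

    With joints stored as rows, x_world = R.T @ (x_cam - t) becomes (x_cam - t) @ R.
    """
    return (joints_cam - t) @ R


def multiview_consistency_loss(joints_per_view, rotations, translations):
    """Mean squared distance between each view's world-frame joints and the cross-view mean.

    joints_per_view: (C, J, 3) camera-frame 3D joint predictions from C calibrated views.
    rotations: (C, 3, 3) and translations: (C, 3) world-to-camera extrinsics (assumed known).
    """
    world = np.stack([to_world(j, R, t)
                      for j, R, t in zip(joints_per_view, rotations, translations)])
    consensus = world.mean(axis=0, keepdims=True)  # (1, J, 3) cross-view agreement target
    return float(((world - consensus) ** 2).sum(axis=-1).mean())
```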
