Context-Aware Deep Spatio-Temporal Network for Hand Pose Estimation from Depth Images
Yiming Wu, Wei Ji, Xi Li, Gang Wang, Jianwei Yin, Fei Wu

TL;DR
This paper introduces CADSTN, a deep network that models spatial and temporal features from depth image sequences for accurate, real-time hand pose estimation, achieving the best or second-best results on two benchmark datasets.
Contribution
The paper presents a new spatio-temporal network with adaptive fusion for hand pose estimation from depth images, achieving state-of-the-art accuracy and real-time performance.
Findings
Achieves the best or second-best performance on two common benchmark datasets.
Runs at 60 frames per second.
Effectively models spatial and temporal information.
Abstract
As a fundamental and challenging problem in computer vision, hand pose estimation aims to estimate hand joint locations from depth images. Typically, the problem is modeled as learning a mapping function from images to hand joint coordinates in a data-driven manner. In this paper, we propose the Context-Aware Deep Spatio-Temporal Network (CADSTN), a novel method that jointly models spatio-temporal properties for hand pose estimation. Our network learns representations of both the spatial information and the temporal structure of image sequences. Moreover, by adopting an adaptive fusion method, the model can dynamically weight different predictions to emphasize sufficient context. Our method is evaluated on two common benchmarks; the experimental results demonstrate that our proposed approach achieves the best or the second-best performance with…
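To make the idea of "dynamically weighting different predictions" concrete, here is a minimal sketch of one common form of adaptive fusion. It assumes the network produces several joint-coordinate predictions per sample (for example, one from a spatial branch and one from a temporal branch) and combines them with sample-dependent, softmax-normalized weights. The function names `softmax` and `adaptive_fusion`, the array shapes, and the softmax gating are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability before exponentiating.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def adaptive_fusion(predictions, scores):
    """Fuse per-branch joint predictions with per-sample weights (hypothetical sketch).

    predictions: shape (B, K, J, 3), K branch predictions of J joints as (x, y, z).
    scores:      shape (B, K), unnormalized branch scores; in a real network these
                 would come from a learned gating layer (assumed here, not from the paper).
    Returns:     shape (B, J, 3), the weighted combination of branch predictions.
    """
    weights = softmax(scores, axis=1)  # (B, K); each row sums to 1
    # Weighted sum over the branch axis k for every sample b, joint j, coordinate c.
    return np.einsum("bk,bkjc->bjc", weights, predictions)

# Usage: two branches, one joint. Equal scores give an equal-weight average.
preds = np.zeros((1, 2, 1, 3))
preds[0, 1] = 2.0
fused = adaptive_fusion(preds, np.array([[0.0, 0.0]]))
print(fused)  # [[[1. 1. 1.]]]
```

Because the weights depend on each input, the model can lean on whichever branch is more reliable for a given frame (for instance, favoring temporal context when a single depth frame is ambiguous), instead of using a fixed average.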
