Fast Monocular Hand Pose Estimation on Embedded Systems
Shan An, Xiajie Zhang, Dong Wei, Haogang Zhu, Jianyu Yang, and Konstantinos A. Tsintotas

TL;DR
This paper introduces 'FastHand', a lightweight and efficient hand pose estimation framework suitable for embedded systems, achieving high accuracy and real-time performance on devices like NVIDIA Jetson TX2.
Contribution
The paper presents a novel lightweight encoder-decoder network architecture for fast and accurate monocular hand pose estimation on embedded devices.
Findings
Achieves 25 frames per second on an NVIDIA Jetson TX2
Reports higher accuracy than other state-of-the-art approaches
Evaluated on two publicly available datasets
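To put the 25 frames per second figure in context, the sketch below shows one way throughput could be measured and converted into a per-frame time budget. The helper names `measure_fps` and `frame_budget_ms` and the `process_frame` callable are hypothetical and do not come from the paper; this is an illustration, not the authors' benchmarking procedure.

```python
import time


def measure_fps(process_frame, frames, warmup=2):
    """Return the average frames per second of `process_frame` over `frames`.

    `process_frame` is a hypothetical callable standing in for one full
    inference pass on a single frame.
    """
    frames = list(frames)
    # Run a few untimed warm-up passes so one-time setup cost is excluded.
    for frame in frames[:warmup]:
        process_frame(frame)
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(frames) / elapsed


def frame_budget_ms(fps):
    """Convert a target frame rate into the time available per frame (ms)."""
    return 1000.0 / fps


if __name__ == "__main__":
    # At 25 frames per second, each frame has a 40 ms budget.
    print(frame_budget_ms(25))
```

A pipeline that sustains 25 frames per second must therefore finish all per-frame work, on average, within 40 ms.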
Abstract
Hand pose estimation is a fundamental task in many applications related to human-robot interaction. However, previous approaches suffer from unsatisfactory hand landmark predictions in real-world scenes and a high computational burden. This paper proposes a fast and accurate framework for hand pose estimation, dubbed "FastHand". Using a lightweight encoder-decoder network architecture, FastHand fulfills the requirements of practical applications running on embedded devices. The encoder consists of deep layers with a small number of parameters, while the decoder makes use of spatial location information to obtain more accurate results. The evaluation was carried out on two publicly available datasets and demonstrates the improved performance of the proposed pipeline compared to other state-of-the-art approaches. FastHand offers high accuracy scores while reaching a speed of 25 frames per second on an…
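The abstract does not say how the decoder uses spatial location information. One common approach in landmark estimation is to predict a per-landmark heatmap and read the coordinate off it. The sketch below illustrates that general idea with a soft-argmax. Treating the output as a heatmap is an assumption made for illustration, and `soft_argmax_2d` is a hypothetical helper, not FastHand's decoder.

```python
import math


def soft_argmax_2d(heatmap):
    """Return the (x, y) expected position of a 2D heatmap under a softmax.

    `heatmap` is a list of rows of floats (an assumed output format).
    x indexes columns and y indexes rows, both starting at 0.
    """
    # Subtract the peak before exponentiating for numerical stability.
    peak = max(max(row) for row in heatmap)
    weights = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    # Weighted average of column indices gives x, of row indices gives y.
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


if __name__ == "__main__":
    # A sharp peak at column 2, row 1 yields a coordinate very close to (2, 1).
    demo = [
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 50.0],
        [0.0, 0.0, 0.0],
    ]
    print(soft_argmax_2d(demo))
```

Unlike a hard argmax, the weighted average can fall between pixel centers, which is one reason heatmap-style outputs are popular for landmark localization.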
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Multimodal Machine Learning Applications
