MediaPipe Hands: On-device Real-time Hand Tracking
Fan Zhang, Valentin Bazarevsky, Andrey Vakunov, Andrei Tkachenka, George Sung, Chuo-Ling Chang, Matthias Grundmann

TL;DR
MediaPipe Hands introduces a real-time, on-device hand tracking system using a two-model pipeline for AR/VR, achieving high accuracy and speed on mobile devices.
Contribution
It presents a novel real-time hand tracking pipeline optimized for on-device use, combining a palm detector and landmark model within the MediaPipe framework.
Findings
Real-time inference on mobile GPUs
High prediction accuracy
Open source availability
Abstract
We present a real-time on-device hand tracking pipeline that predicts a hand skeleton from a single RGB camera for AR/VR applications. The pipeline consists of two models: 1) a palm detector and 2) a hand landmark model. It is implemented via MediaPipe, a framework for building cross-platform ML solutions. The proposed model and pipeline architecture demonstrate real-time inference speed on mobile GPUs and high prediction quality. MediaPipe Hands is open sourced at https://mediapipe.dev.
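The two-model structure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the MediaPipe implementation: the function names (`track_hand`, `stub_palm_detector`, `stub_landmark_model`), the list-of-rows frame format, and both stub models are hypothetical stand-ins for the real neural networks. The landmark count of 21 comes from the full paper rather than the abstract above.

```python
from typing import Callable, List, Optional, Tuple

Frame = List[List[int]]          # hypothetical frame: rows of pixel intensities
Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels
Point = Tuple[float, float]      # (x, y)

NUM_LANDMARKS = 21  # landmark count reported in the full paper


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region described by box out of the frame."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def track_hand(
    frame: Frame,
    detect_palm: Callable[[Frame], Optional[Box]],
    predict_landmarks: Callable[[Frame], List[Point]],
) -> Optional[List[Point]]:
    """Stage 1 finds the palm on the full frame; stage 2 runs on the crop only."""
    box = detect_palm(frame)
    if box is None:
        return None  # no hand found, so the landmark model is skipped
    x, y, w, h = box
    local = predict_landmarks(crop(frame, box))  # coordinates in [0, 1] within the crop
    # Map crop-relative coordinates back to full-frame pixel coordinates.
    return [(x + u * w, y + v * h) for u, v in local]


def stub_palm_detector(frame: Frame) -> Optional[Box]:
    """Stand-in detector (assumption): bounding box of all non-zero pixels."""
    coords = [(c, r) for r, row in enumerate(frame) for c, v in enumerate(row) if v > 0]
    if not coords:
        return None
    xs = [c for c, _ in coords]
    ys = [r for _, r in coords]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


def stub_landmark_model(patch: Frame) -> List[Point]:
    """Stand-in landmark model (assumption): every landmark at the crop centre."""
    return [(0.5, 0.5)] * NUM_LANDMARKS


# An 8x8 frame with a 4x4 "hand" occupying columns 4-7 and rows 2-5.
frame = [[1 if 4 <= c <= 7 and 2 <= r <= 5 else 0 for c in range(8)] for r in range(8)]
landmarks = track_hand(frame, stub_palm_detector, stub_landmark_model)
empty = track_hand([[0] * 8 for _ in range(8)], stub_palm_detector, stub_landmark_model)
```

Splitting the work this way lets the landmark model operate on a small, hand-centred crop instead of the whole image, which is one plausible reason a pipeline of this shape can run in real time on mobile hardware.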
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Gaze Tracking and Assistive Technology
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
