Optimizing Hand Region Detection in MediaPipe Holistic Full-Body Pose Estimation to Improve Accuracy and Avoid Downstream Errors
Amit Moryossef

TL;DR
This paper improves hand region detection in MediaPipe Holistic by using a data-driven approach with enriched features, leading to more accurate ROI estimation and better sign language recognition performance.
Contribution
It introduces a novel data-driven method that enhances hand ROI prediction by incorporating additional keypoints and depth information.
Findings
Higher Intersection-over-Union scores for hand ROI estimation
Improved sign language recognition accuracy
Enhanced robustness to non-ideal hand orientations
Abstract
This paper addresses a critical flaw in MediaPipe Holistic's hand Region of Interest (ROI) prediction, which struggles with non-ideal hand orientations and thereby reduces sign language recognition accuracy. We propose a data-driven approach to enhance ROI estimation, leveraging an enriched feature set that includes additional hand keypoints and the z-dimension. Our results show more accurate ROI estimates, with higher Intersection-over-Union (IoU) than the current method. Our code and optimizations are available at https://github.com/sign-language-processing/mediapipe-hand-crop-fix.
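The abstract evaluates ROI estimates with Intersection-over-Union. As a minimal sketch of that metric, the snippet below computes IoU for two axis-aligned boxes. The `Box` type and the `area` and `iou` helpers are hypothetical names introduced here for illustration, not taken from the paper's repository, and the axis-aligned simplification is an assumption: hand ROIs in practice may be rotated rectangles, which would need a polygon intersection instead.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in image coordinates (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    # Clamp each side at zero so an inverted (empty) box has zero area.
    width = max(0.0, box.x_max - box.x_min)
    height = max(0.0, box.y_max - box.y_min)
    return width * height


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes, in [0, 1]."""
    # The intersection is bounded by the innermost edges of the two boxes.
    intersection = Box(
        max(a.x_min, b.x_min),
        max(a.y_min, b.y_min),
        min(a.x_max, b.x_max),
        min(a.y_max, b.y_max),
    )
    inter_area = area(intersection)
    union_area = area(a) + area(b) - inter_area
    # Two degenerate boxes have no union; define their IoU as 0.
    return inter_area / union_area if union_area > 0 else 0.0


predicted = Box(0.0, 0.0, 2.0, 2.0)
reference = Box(1.0, 1.0, 3.0, 3.0)
score = iou(predicted, reference)  # overlap 1, union 7
```

A score of 1 means the predicted ROI matches the reference exactly, and 0 means the two do not overlap, so a higher IoU indicates a crop that frames the hand more closely.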
Taxonomy
Topics: Hand Gesture Recognition Systems · Stroke Rehabilitation and Recovery · Virtual Reality Applications and Impacts
Methods: Sparse Evolutionary Training
