Improving 3D Pose Estimation for Sign Language
Maksym Ivashechkin, Oscar Mendez, Richard Bowden

TL;DR
This paper introduces a fast, accurate method for 3D human pose estimation from single images that combines neural networks with Forward Kinematics. It outperforms MediaPipe in accuracy, generalizes across datasets, and is suited to sign language applications.
Contribution
Integrating a Forward Kinematics (FK) layer with neural networks that predict joint rotations and bone lengths keeps 3D pose estimation fast and the predicted skeletons valid, while improving accuracy over MediaPipe.
Findings
Outperforms MediaPipe in both per joint positional error and visual appearance
Takes 100-200 ms per image on a CPU
Generalizes well across datasets
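The accuracy comparison is stated in terms of per joint positional error. As a rough illustration, such an error is commonly computed as the mean Euclidean distance between predicted and reference joint positions. The sketch below assumes that definition; the function name `mean_per_joint_error` and the input layout (a list of `[x, y, z]` coordinates per joint) are hypothetical, and the paper's exact evaluation protocol (alignment, units, joint subset) is not given here.

```python
import math


def mean_per_joint_error(pred, target):
    """Mean Euclidean distance between matching 3D joints.

    pred, target: equal-length lists of [x, y, z] joint positions.
    Assumed metric definition; not taken from the paper itself.
    """
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equal in length")
    # Distance per joint, then averaged over all joints.
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical two-joint example: one exact match, one joint off by 5 units.
example_error = mean_per_joint_error(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 3.0, 4.0]],
)
```

The result is in whatever units the joint coordinates use, so two methods are comparable only when evaluated in the same coordinate frame and scale.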
Abstract
This work addresses 3D human pose reconstruction in single images. We present a method that combines Forward Kinematics (FK) with neural networks to ensure a fast and valid prediction of 3D pose. Pose is represented as a hierarchical tree/graph with nodes corresponding to human joints that model their physical limits. Given a 2D detection of keypoints in the image, we lift the skeleton to 3D using neural networks to predict both the joint rotations and bone lengths. These predictions are then combined with skeletal constraints using an FK layer implemented as a network layer in PyTorch. The result is a fast and accurate approach to the estimation of 3D skeletal pose. Through quantitative and qualitative evaluation, we demonstrate the method is significantly more accurate than MediaPipe in terms of both per joint positional error and visual appearance. Furthermore, we demonstrate…
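To make the FK step concrete, the following is a minimal sketch of forward kinematics over a joint tree, written in plain Python for readability rather than as the PyTorch layer the abstract describes. It assumes each joint has a parent index, a local rotation, a rest-pose bone direction, and a bone length; all names (`forward_kinematics`, `rot_z`, `bone_dirs`, and so on) are hypothetical, and the joint-limit constraints mentioned in the abstract are not modeled.

```python
import math


def rot_z(theta):
    """3x3 rotation matrix about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def forward_kinematics(parents, local_rots, bone_dirs, bone_lengths,
                       root_pos=(0.0, 0.0, 0.0)):
    """Compute 3D joint positions from rotations and bone lengths.

    parents[j]      index of joint j's parent, or -1 for the root;
                    parents must appear before their children.
    local_rots[j]   3x3 rotation of joint j relative to its parent.
    bone_dirs[j]    unit rest-pose direction of the bone parent -> j.
    bone_lengths[j] length of that bone (ignored for the root).
    """
    n = len(parents)
    global_rots = [None] * n
    positions = [None] * n
    for j in range(n):
        p = parents[j]
        if p < 0:
            global_rots[j] = local_rots[j]
            positions[j] = list(root_pos)
        else:
            # Scale the rest direction by the bone length, rotate it by the
            # parent's accumulated rotation, and attach it to the parent.
            offset = [bone_lengths[j] * d for d in bone_dirs[j]]
            rotated = matvec(global_rots[p], offset)
            positions[j] = [positions[p][i] + rotated[i] for i in range(3)]
            global_rots[j] = matmul(global_rots[p], local_rots[j])
    return positions


# Hypothetical 3-joint chain (e.g. shoulder -> elbow -> wrist).
pose = forward_kinematics(
    parents=[-1, 0, 1],
    local_rots=[rot_z(math.pi / 2), rot_z(-math.pi / 2), rot_z(0.0)],
    bone_dirs=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    bone_lengths=[0.0, 2.0, 1.0],
)
```

Because positions are built only from rotations and fixed bone lengths, every output skeleton has consistent limb lengths by construction, which is one sense in which an FK formulation keeps predictions valid. In a differentiable framework the same loop can be written with tensor operations so that gradients flow back to the predicted rotations and lengths.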
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Gait Recognition and Analysis
