Evaluation of Pose Estimation Systems for Sign Language Translation
Catherine O'Brien, Gerard Sant, Mathias Müller, Sarah Ebling

TL;DR
This paper systematically compares various pose estimation systems for sign language translation, analyzing their impact on translation quality, robustness, and stability, and provides insights into the best-performing models.
Contribution
The paper offers a comprehensive evaluation of pose estimators for SLT and identifies Sapiens and SDPose as the estimators associated with the best translation quality.
Findings
SDPose and Sapiens achieve higher BLEU scores than MediaPipe (roughly 11.5 vs. 10).
Sapiens correctly handles all occlusion cases tested, unlike OpenPifPaf.
Estimators missing hand keypoints correlate with lower translation quality.
Abstract
Many sign language translation (SLT) systems operate on pose sequences instead of raw video to reduce input dimensionality, improve portability, and partially anonymize signers. The choice of pose estimator is often treated as an implementation detail, with systems defaulting to widely available tools such as MediaPipe Holistic or OpenPose. We present a systematic comparison of pose estimators for pose-based SLT, covering widely used baselines (MediaPipe Holistic, OpenPose) and newer whole-body/high-capacity models (MMPose WholeBody, OpenPifPaf, AlphaPose, SDPose, Sapiens, SMPLest-X). We quantify downstream impact by training a controlled SLT pipeline on RWTH-PHOENIX-Weather 2014 where only the pose representation varies, evaluating with BLEU and BLEURT. To contextualize translation outcomes, we analyze temporal stability, missing hand keypoints, and robustness to occlusion using…
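The diagnostics named in the abstract (missing hand keypoints and temporal stability) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's method: poses are taken to be lists of (x, y, confidence) keypoints per frame, a keypoint counts as missing when its confidence falls below a hypothetical threshold `min_conf` or its coordinates are NaN, and `missing_hand_rate` and `mean_jitter` are hypothetical helper names. Mean frame-to-frame displacement is only a crude stability proxy, since genuine signing motion contributes to it as well as estimator noise.

```python
from math import isnan, sqrt

# Assumed layout: a sequence is a list of frames, and each frame is a list of
# (x, y, confidence) keypoints. The paper's own definitions may differ.

def is_missing(kp, min_conf=0.1):
    """Treat a keypoint as missing if its confidence is low or its coordinates are NaN."""
    x, y, conf = kp
    return conf < min_conf or isnan(x) or isnan(y)

def missing_hand_rate(frames, hand_indices, min_conf=0.1):
    """Fraction of frames in which every listed hand keypoint is missing."""
    if not frames:
        return 0.0
    missing_frames = sum(
        all(is_missing(frame[i], min_conf) for i in hand_indices)
        for frame in frames
    )
    return missing_frames / len(frames)

def mean_jitter(frames, min_conf=0.1):
    """Mean Euclidean displacement between consecutive frames,
    averaged over keypoints that are present in both frames."""
    total, count = 0.0, 0
    for prev, cur in zip(frames, frames[1:]):
        for a, b in zip(prev, cur):
            if is_missing(a, min_conf) or is_missing(b, min_conf):
                continue
            total += sqrt((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2)
            count += 1
    return total / count if count else 0.0
```

Comparing these two numbers across estimators on the same videos would give a rough sense of which estimators drop hands most often and which produce the least stable trajectories, under the assumptions stated above.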
