Scriboora: Rethinking Human Pose Forecasting
Daniel Bermuth, Alexander Poeppel, Wolfgang Reif

TL;DR
This paper introduces a unified training and evaluation pipeline for human pose forecasting, adapts speech models to the task to improve accuracy, and evaluates robustness on realistic noisy pose data. It also documents reproducibility issues in existing algorithms.
Contribution
It provides a comprehensive evaluation framework, adapts speech understanding models to pose forecasting, and assesses model robustness with noisy data and unsupervised fine-tuning.
Findings
Speech models adapted to pose forecasting improve on current state-of-the-art accuracy.
Reproducibility issues are prevalent in existing algorithms.
Unsupervised fine-tuning recovers performance on noisy data.
Abstract
Human pose forecasting predicts future poses based on past observations and has many significant applications in areas such as action recognition, autonomous driving, and human-robot interaction. This paper evaluates a wide range of pose forecasting algorithms on the task of absolute pose forecasting, revealing many reproducibility issues, and provides a unified training and evaluation pipeline. After drawing a high-level analogy to the task of speech understanding, it shows that recent speech models can be efficiently adapted to pose forecasting and improve on current state-of-the-art performance. Finally, the robustness of the models is evaluated using noisy joint coordinates obtained from a pose estimation model, a realistic type of noise that is closer to real-world applications. For this, a new dataset variation is introduced, and it is shown that estimated…
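To make the task setup in the abstract concrete, the following is a minimal sketch of absolute pose forecasting as an array-to-array problem. It is not the paper's pipeline or any of the evaluated models. The array layout `(frames, joints, 3)`, the zero-velocity baseline (repeating the last observed pose), and the helper names `zero_velocity_forecast` and `mean_joint_error` are all assumptions introduced here for illustration.

```python
import numpy as np


def zero_velocity_forecast(past_poses: np.ndarray, horizon: int) -> np.ndarray:
    """Hypothetical baseline: repeat the last observed pose `horizon` times.

    past_poses: array of shape (T_in, J, 3) holding absolute joint coordinates
                (assumed layout, not taken from the paper).
    returns:    array of shape (horizon, J, 3).
    """
    if past_poses.ndim != 3 or past_poses.shape[-1] != 3:
        raise ValueError("expected past_poses with shape (T_in, J, 3)")
    last_pose = past_poses[-1]
    return np.repeat(last_pose[None, :, :], horizon, axis=0)


def mean_joint_error(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean Euclidean distance between predicted and target joints.

    Averaged over all frames and joints; an illustrative metric only.
    """
    return float(np.linalg.norm(pred - target, axis=-1).mean())


# Synthetic example: 10 observed frames, 17 joints, 5 future frames.
rng = np.random.default_rng(0)
past = rng.normal(size=(10, 17, 3))
# Fake "ground truth" future: the last pose with every coordinate shifted by 0.1.
future = np.repeat(past[-1][None, :, :], 5, axis=0) + 0.1

pred = zero_velocity_forecast(past, horizon=5)
err = mean_joint_error(pred, future)
```

A learned forecaster would replace `zero_velocity_forecast` while keeping the same input and output shapes, which is what lets one evaluation routine compare different models on equal terms.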
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Hand Gesture Recognition Systems
