Skeletor: Skeletal Transformers for Robust Body-Pose Estimation
Tao Jiang, Necati Cihan Camgoz, Richard Bowden

TL;DR
Skeletor introduces a transformer-based model that learns spatio-temporal context to improve 3D human pose estimation from videos, addressing issues like noise, jitter, and occlusion in skeletal tracking.
Contribution
The paper presents Skeletor, a novel transformer network trained in an unsupervised fashion that learns a distribution over both pose and motion to improve the accuracy and robustness of skeletal estimates in 3D human pose estimation.
Findings
Improves accuracy of 3D pose estimation in challenging conditions.
Reduces jitter and noise in skeletal sequences.
Improves performance on downstream tasks such as sign language translation.
Abstract
Predicting 3D human pose from a single monoscopic video can be highly challenging due to factors such as low resolution, motion blur and occlusion, in addition to the fundamental ambiguity in estimating 3D from 2D. Approaches that directly regress the 3D pose from independent images can be particularly susceptible to these factors and result in jitter, noise and/or inconsistencies in skeletal estimation, much of which can be overcome if the temporal evolution of the scene and skeleton is taken into account. However, rather than tracking body parts and trying to temporally smooth them, we propose a novel transformer-based network that can learn a distribution over both pose and motion in an unsupervised fashion. We call our approach Skeletor. Skeletor overcomes inaccuracies in detection and corrects partial or entire skeleton corruption. Skeletor uses strong priors learned from 25…
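To make the jitter problem concrete, the sketch below simulates a single 3D joint track estimated independently per frame and measures how much temporal smoothing reduces frame-to-frame jitter. It illustrates only the smoothing baseline that the abstract contrasts against, not Skeletor itself. The jitter metric (mean magnitude of the second temporal difference), the synthetic trajectory, the noise level and the helper names `jitter` and `moving_average` are all assumptions made for illustration and do not come from the paper.

```python
import math
import random


def jitter(seq):
    """Mean magnitude of the second temporal difference of a 3D joint track.

    Assumed metric for illustration; not taken from the paper.
    """
    total = 0.0
    for t in range(1, len(seq) - 1):
        acc = [seq[t + 1][k] - 2 * seq[t][k] + seq[t - 1][k] for k in range(3)]
        total += math.sqrt(sum(a * a for a in acc))
    return total / (len(seq) - 2)


def moving_average(seq, window=5):
    """Centered moving average; the window shrinks near the sequence ends."""
    half = window // 2
    out = []
    for t in range(len(seq)):
        lo, hi = max(0, t - half), min(len(seq), t + half + 1)
        out.append([sum(seq[i][k] for i in range(lo, hi)) / (hi - lo)
                    for k in range(3)])
    return out


# Synthetic smooth trajectory for one joint (hypothetical data).
rng = random.Random(0)
clean = [[math.sin(0.1 * t), math.cos(0.1 * t), 0.01 * t] for t in range(100)]

# Independent per-frame estimation modelled as i.i.d. Gaussian noise.
noisy = [[c + rng.gauss(0.0, 0.05) for c in p] for p in clean]

# Hand-crafted temporal smoothing, the baseline the abstract argues against.
smoothed = moving_average(noisy)

print("jitter clean:   ", jitter(clean))
print("jitter noisy:   ", jitter(noisy))
print("jitter smoothed:", jitter(smoothed))
```

A fixed filter like this lowers jitter but cannot restore a joint that is missing or badly corrupted over many frames, which is the case the abstract says a learned distribution over pose and motion is meant to handle.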
