Jointformer: Single-Frame Lifting Transformer with Error Prediction and Refinement for 3D Human Pose Estimation
Sebastian Lutz, Richard Blythman, Koustav Ghosal, Matthew Moynihan, Ciaran Simms, Aljosa Smolic

TL;DR
This paper introduces Jointformer, a transformer-based model for 3D human pose estimation from single images, utilizing self-attention, error prediction, and refinement to outperform existing methods.
Contribution
The paper presents a novel transformer architecture with error prediction and refinement for improved 3D pose estimation from monocular images.
Findings
Outperforms recent state-of-the-art methods significantly
Uses intermediate supervision and residual connections to enhance performance
Error prediction as multi-task learning improves accuracy
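The last two findings can be illustrated as a training loss. The sketch below is a minimal, standard-library example and not the authors' implementation: it assumes each stacked encoder emits a 3D pose plus a per-joint error estimate, that the pose term is the mean per-joint Euclidean distance, and that the error head is trained to match the actual per-joint distance. The names `pose_loss`, `error_loss`, `multitask_loss`, and `error_weight` are hypothetical.

```python
import math

def pose_loss(pred, target):
    """Mean per-joint Euclidean distance between predicted and target 3D joints."""
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)

def error_loss(pred, target, pred_error):
    """Mean absolute gap between the predicted and the actual per-joint error.

    Assumption: the error head outputs one scalar per joint.
    """
    actual = [math.dist(p, t) for p, t in zip(pred, target)]
    return sum(abs(e - a) for e, a in zip(pred_error, actual)) / len(actual)

def multitask_loss(layer_outputs, target, error_weight=1.0):
    """Sum pose and error-prediction losses over every encoder's output.

    Supervising each layer's output, not only the last one, is what
    "intermediate supervision" refers to in this sketch.
    """
    total = 0.0
    for pred, pred_error in layer_outputs:
        total += pose_loss(pred, target)
        total += error_weight * error_loss(pred, target, pred_error)
    return total

# Toy example: two joints, two stacked encoders.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
layers = [
    ([(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)], [0.5, 0.0]),  # first encoder output
    ([(0.0, 0.0, 0.5), (1.0, 0.0, 0.0)], [0.5, 0.0]),  # second encoder output
]
print(multitask_loss(layers, target))  # 1.0
```

In this toy case the second encoder is both closer to the target and better calibrated about its own error, so it contributes less to the total than the first.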
Abstract
Monocular 3D human pose estimation technologies have the potential to greatly increase the availability of human movement data. The best-performing models for single-image 2D-3D lifting use graph convolutional networks (GCNs) that typically require some manual input to define the relationships between different body joints. We propose a novel transformer-based approach that uses the more generalised self-attention mechanism to learn these relationships within a sequence of tokens representing joints. We find that the use of intermediate supervision, as well as residual connections between the stacked encoders, benefits performance. We also suggest that using error prediction as part of a multi-task learning framework improves performance by allowing the network to compensate for its confidence level. We perform extensive ablation studies to show that each of our contributions increases…
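To make the contrast with GCNs concrete, here is a minimal sketch of self-attention over a sequence of joint tokens, using only the Python standard library. It is an illustration under simplifying assumptions, not the paper's architecture: a single head is used, and queries, keys, and values are the tokens themselves (no learned projections). The function names are hypothetical. The point it shows is that every joint receives a weight for every other joint, so no hand-defined skeleton adjacency is needed.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def joint_self_attention(tokens):
    """Scaled dot-product self-attention over joint tokens.

    tokens: list of J feature vectors, one per joint.
    Assumption: Q = K = V = tokens (identity projections) for brevity.
    Returns (updated tokens, J x J attention weights).
    """
    d = len(tokens[0])
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        for q in tokens
    ]
    weights = [softmax(row) for row in scores]
    out = [
        [sum(w * v[i] for w, v in zip(row, tokens)) for i in range(d)]
        for row in weights
    ]
    return out, weights

# Three toy joint tokens with 2D features.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated, attn = joint_self_attention(tokens)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` sums to 1 and has a nonzero entry for every joint, whereas a GCN would restrict each row to the neighbours listed in a predefined adjacency matrix.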
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Stroke Rehabilitation and Recovery
