Poseur: Direct Human Pose Regression with Transformers
Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang, Anton van den Hengel

TL;DR
Poseur introduces a novel Transformer-based regression method for 2D human pose estimation from single images, outperforming heatmap-based approaches by directly predicting keypoints with an attention mechanism.
Contribution
This work is the first to successfully apply a Transformer-based regression approach to 2D human pose estimation, eliminating the need for heatmaps and improving accuracy.
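Heatmap-based methods predict a score map for each keypoint and then need a separate decoding step to turn that map into coordinates. The plain-Python sketch below illustrates one commonly cited cost of that step: with argmax decoding on a reduced-resolution grid, the output can only land on whole grid cells. The grid size, stride and Gaussian width are arbitrary illustrative choices, and `gaussian_heatmap` and `decode_heatmap` are hypothetical helpers, not part of Poseur.

```python
import math

def gaussian_heatmap(width, height, cx, cy, sigma=1.0):
    """Render a heatmap target: a 2D Gaussian centred at (cx, cy) in grid units."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def decode_heatmap(heatmap, stride):
    """Decode by argmax: the result can only be a whole grid cell times the stride."""
    best_val, best_xy = float("-inf"), (0, 0)
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            if v > best_val:
                best_val, best_xy = v, (x, y)
    return (best_xy[0] * stride, best_xy[1] * stride)

# Hypothetical setup: a keypoint at image pixel (13, 5), heatmap at 1/4 resolution.
true_xy = (13.0, 5.0)
stride = 4
hm = gaussian_heatmap(8, 4, true_xy[0] / stride, true_xy[1] / stride)
heatmap_xy = decode_heatmap(hm, stride)  # snapped to the grid: (12, 4)

# A regression head instead emits continuous coordinates directly, so there is
# no heatmap to render, no argmax step and no grid to snap to.
```

Here the decoded point is off by one pixel on each axis purely because of the grid; sub-pixel refinements exist for heatmaps, but they add further post-processing, which is the kind of complexity direct regression avoids.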
Findings
Outperforms state-of-the-art regression methods.
Compares favorably with heatmap-based methods on key benchmarks.
Demonstrates effectiveness of attention in pose regression.
Abstract
We propose a direct, regression-based approach to 2D human pose estimation from single images. We formulate the problem as a sequence prediction task, which we solve using a Transformer network. This network directly learns a regression mapping from images to the keypoint coordinates, without resorting to intermediate representations such as heatmaps. This approach avoids much of the complexity associated with heatmap-based approaches. To overcome the feature misalignment issues of previous regression-based methods, we propose an attention mechanism that adaptively attends to the features that are most relevant to the target keypoints, considerably improving the accuracy. Importantly, our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints. Experiments on MS-COCO and MPII, two predominant pose-estimation datasets, demonstrate that…
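To make the regression formulation concrete, here is a minimal pure-Python sketch of attention-based direct keypoint regression: each keypoint is represented by a query vector that cross-attends to image feature tokens, and a linear head maps the attended feature straight to an (x, y) coordinate. This is a simplification under stated assumptions: it uses plain single-head scaled dot-product attention and a single linear layer, every function name and number is hypothetical, and it is not the paper's decoder, whose details the abstract does not give.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(query, keys, values):
    """Scaled dot-product attention of one keypoint query over N image tokens."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    attended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return attended, weights

def linear(x, W, b):
    """Affine map y = W x + b, with W stored as a list of rows."""
    return [dot(row, x) + bi for row, bi in zip(W, b)]

def regress_keypoints(queries, keys, values, W_xy, b_xy):
    """Map each keypoint query straight to an (x, y) pair; no heatmap is built."""
    coords = []
    for q in queries:
        feat, _ = cross_attend(q, keys, values)
        coords.append(tuple(linear(feat, W_xy, b_xy)))
    return coords

# Toy setup (all numbers hypothetical): two image tokens whose value vectors
# happen to be their pixel positions, and an identity regression head.
keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[16.0, 8.0], [40.0, 24.0]]
W_xy, b_xy = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
queries = [[10.0, 0.0], [0.0, 10.0]]  # one query per keypoint type

coords = regress_keypoints(queries, keys, values, W_xy, b_xy)
# Each query attends almost entirely to its matching token, so coords is
# approximately [(16.0, 8.0), (40.0, 24.0)].
```

In a trained model the queries, keys, values and head weights would all be learned. Every operation in the sketch (dot products, softmax, weighted sums, the affine head) is differentiable, which is what lets such a pipeline be trained end to end directly on coordinate targets, and the attention weights are what let each keypoint draw on the features most relevant to it.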
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Absolute Position Encodings · Byte Pair Encoding · Layer Normalization · Residual Connection
