TL;DR
TransPose introduces a Transformer-based model for human pose estimation that captures long-range dependencies, improves localization accuracy, and provides interpretability, outperforming CNN-based methods on COCO and MPII benchmarks.
Contribution
The paper presents TransPose, a novel Transformer-based architecture for pose estimation that captures spatial dependencies and offers interpretability, outperforming CNN models in accuracy and efficiency.
Findings
Achieves 75.8 AP on the COCO validation set.
Reported to outperform CNN-based architectures in both speed and accuracy.
Transfers well to the MPII benchmark with minimal additional training.
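For context on the AP figure: assuming the standard COCO keypoint protocol is what is being reported, AP is computed from Object Keypoint Similarity (OKS) between predicted and ground-truth poses rather than from box overlap. The following is a minimal sketch of OKS for a single pose pair. The function name `oks` and its argument layout are hypothetical, and the per-keypoint constants `k` are supplied by the caller rather than taken from any official table.

```python
import math

def oks(pred, gt, visible, area, k):
    """Object Keypoint Similarity for one predicted/ground-truth pose pair.

    pred, gt : lists of (x, y) keypoint coordinates
    visible  : list of bools, True where the ground-truth keypoint is labeled
    area     : object scale squared (e.g. the annotated segment area)
    k        : per-keypoint falloff constants (caller-supplied, illustrative)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, visible, k):
        if not v:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2          # squared pixel distance
        total += math.exp(-d2 / (2.0 * area * ki ** 2))  # Gaussian similarity
        count += 1
    return total / count if count else 0.0
```

A perfect prediction gives an OKS of 1.0, and the score decays toward 0 as keypoints drift relative to the object's scale. A prediction counts as correct at a given threshold when its OKS exceeds that threshold.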
Abstract
While CNN-based models have made remarkable progress on human pose estimation, what spatial dependencies they capture to localize keypoints remains unclear. In this work, we propose a model called TransPose, which introduces Transformer for human pose estimation. The attention layers built in Transformer enable our model to capture long-range relationships efficiently and also can reveal what dependencies the predicted keypoints rely on. To predict keypoint heatmaps, the last attention layer acts as an aggregator, which collects contributions from image clues and forms maximum positions of keypoints. Such a heatmap-based localization approach via Transformer conforms to the principle of Activation Maximization [Erhan et al., 2009]. And the revealed dependencies are image-specific and fine-grained, which also can provide evidence of how the model handles special cases,…
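The abstract's description of the last attention layer as an aggregator can be illustrated with a toy example. This is a minimal sketch, not the paper's implementation: it assumes a single attention head, uses the same vectors as queries and keys, and treats each position's value as a scalar, whereas the real model uses learned projections and vector-valued features. All function names here are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_row(query, keys):
    """Attention weights of one query position over all key positions
    (single head, scaled dot product)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)

def aggregate(weights, values):
    """Weighted sum of scalar values: how much each position contributes."""
    return sum(w * v for w, v in zip(weights, values))

def toy_heatmap(features, values, height, width):
    """Build a height x width heatmap where each cell aggregates
    contributions from every position via attention (toy simplification)."""
    flat = [aggregate(attention_row(q, features), values) for q in features]
    return [flat[r * width:(r + 1) * width] for r in range(height)]

def heatmap_argmax(heatmap):
    """Return (row, col) of the maximum activation in a 2D heatmap."""
    positions = ((r, c) for r in range(len(heatmap))
                 for c in range(len(heatmap[0])))
    return max(positions, key=lambda rc: heatmap[rc[0]][rc[1]])
```

In this sketch, the row of attention weights for the position returned by `heatmap_argmax` is the toy analogue of the dependency map the abstract refers to: it shows which positions contributed most to the predicted keypoint location.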
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Attention Is All You Need · Dropout · Adam · Multi-Head Attention · Residual Connection · Byte Pair Encoding
