HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation
Zhoujie Xu

TL;DR
HRPVT introduces a novel high-resolution pyramid module combined with a transformer backbone and a new keypoint representation method, significantly improving medium and small-scale human pose estimation accuracy and efficiency.
Contribution
The paper presents HRPVT, a transformer-based model with a high-resolution pyramid module and a new keypoint representation, aimed at improving pose estimation for medium and small-scale human figures.
Findings
Enhanced accuracy on small and medium human pose datasets.
Reduced computational cost compared to traditional heatmap methods.
Effective modeling of long-range dependencies with PVT v2 backbone.
Abstract
Human pose estimation on medium and small scales has long been a significant challenge in this field. Most existing methods focus on restoring high-resolution feature maps by stacking multiple costly deconvolutional layers or by continuously aggregating semantic information from low-resolution feature maps while maintaining high-resolution ones, which can lead to information redundancy. Additionally, due to quantization errors, heatmap-based methods have certain disadvantages in accurately locating keypoints of medium and small-scale human figures. In this paper, we propose HRPVT, which utilizes PVT v2 as the backbone to model long-range dependencies. Building on this, we introduce the High-Resolution Pyramid Module (HRPM), designed to generate higher quality high-resolution representations by incorporating the intrinsic inductive biases of Convolutional Neural Networks (CNNs) into the…
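The quantization issue mentioned in the abstract can be illustrated with a minimal sketch. It is not HRPVT's method; it only shows why snapping a keypoint to a coarse heatmap grid hurts small figures more than large ones. The stride of 4, the keypoint coordinates, and the person heights are assumptions chosen for illustration and do not come from the paper.

```python
import math

def quantize_keypoint(x, y, stride):
    """Snap a keypoint to the nearest heatmap cell, then map it back to image pixels."""
    col = round(x / stride)   # heatmap column index
    row = round(y / stride)   # heatmap row index
    return col * stride, row * stride

def relative_error(x, y, stride, person_height):
    """Localization error after quantization, as a fraction of the person's height."""
    qx, qy = quantize_keypoint(x, y, stride)
    return math.hypot(qx - x, qy - y) / person_height

# Hypothetical keypoint, stride, and person sizes (assumptions, not from the paper).
x, y, stride = 101.9, 57.9, 4
small = relative_error(x, y, stride, person_height=32)
large = relative_error(x, y, stride, person_height=256)
print(f"small figure: {small:.3f}, large figure: {large:.3f}")
```

The absolute error is identical in both cases and bounded by half the stride per axis, but relative to body size it is eight times larger for the 32-pixel figure than for the 256-pixel one.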
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Gait Recognition and Analysis
Methods: Attention Is All You Need · Softmax · Linear Layer · Layer Normalization · Residual Connection · Absolute Position Encodings · Multi-Head Attention · Spatial-Reduction Attention · Dense Connections · Pyramid Vision Transformer
