Optimizing Local-Global Dependencies for Accurate 3D Human Pose Estimation
Guangsheng Xu, Guoyi Zhang, Lejia Ye, Shuwei Gan, Xiaohu Zhang, and Xia Yang

TL;DR
This paper introduces SSR-STF, a dual-stream transformer model that combines local and global features for improved 3D human pose estimation, achieving state-of-the-art results on benchmark datasets.
Contribution
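The paper proposes SSRFormer, a module built on the skeleton selective refine attention (SSRA) mechanism, which captures fine-grained local dependencies and integrates them with the global dependencies modeled by the Transformer to improve pose estimation accuracy.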
Findings
Achieves a P1 error of 37.4 mm on Human3.6M
Outperforms existing methods in accuracy and generalization
Effective in downstream tasks like human mesh recovery
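For readers unfamiliar with the metric, the P1 figure above is conventionally read as the mean per-joint position error (MPJPE) under Protocol 1 of Human3.6M. This summary does not spell that out, so treat the reading as an assumption. The sketch below shows how such an error could be computed for poses that are already root-relative and expressed in millimetres; the function name `mpjpe` and the toy coordinates are illustrative and not taken from the paper.

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error, in the units of the inputs.

    pred, gt: sequences of frames; each frame is a sequence of (x, y, z) joints.
    Assumes both inputs are already root-relative (an assumption of this sketch).
    """
    total = 0.0
    count = 0
    for pred_frame, gt_frame in zip(pred, gt):
        for p, g in zip(pred_frame, gt_frame):
            total += math.dist(p, g)  # Euclidean distance for one joint
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count

# Toy data: one frame, two joints, coordinates in mm (made up for illustration).
gt = [[(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]]
pred = [[(3.0, 4.0, 0.0), (100.0, 0.0, 0.0)]]
error_mm = mpjpe(pred, gt)  # joint errors are 5 mm and 0 mm, so the mean is 2.5 mm
```

A lower value means the predicted joints sit closer to the ground truth on average.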
Abstract
Transformer-based methods have recently achieved significant success in 3D human pose estimation, owing to their strong ability to model long-range dependencies. However, relying solely on the global attention mechanism is insufficient for capturing the fine-grained local details, which are crucial for accurate pose estimation. To address this, we propose SSR-STF, a dual-stream model that effectively integrates local features with global dependencies to enhance 3D human pose estimation. Specifically, we introduce SSRFormer, a simple yet effective module that employs the skeleton selective refine attention (SSRA) mechanism to capture fine-grained local dependencies in human pose sequences, complementing the global dependencies modeled by the Transformer. By adaptively fusing these two feature streams, SSR-STF can better learn the underlying structure of human poses, overcoming the…
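The abstract does not say how the two streams are fused, so the following is only a generic gated-fusion sketch of what "adaptively fusing" a local and a global feature vector can look like. The function `gated_fusion`, its weight parameters, and the scalar sigmoid gate are all assumptions made for illustration, not the SSR-STF design.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(local_feat, global_feat, w_local, w_global, bias=0.0):
    """Blend two equal-length feature vectors with one learned scalar gate.

    Hypothetical illustration: the gate is a sigmoid of a linear score over
    both streams; the output is gate * local + (1 - gate) * global.
    """
    if len(local_feat) != len(global_feat):
        raise ValueError("feature vectors must have the same length")
    score = (
        sum(w * x for w, x in zip(w_local, local_feat))
        + sum(w * x for w, x in zip(w_global, global_feat))
        + bias
    )
    gate = sigmoid(score)
    fused = [gate * l + (1.0 - gate) * g for l, g in zip(local_feat, global_feat)]
    return fused, gate

# With all-zero weights the gate is 0.5, so the result is a plain average.
fused, gate = gated_fusion([1.0, 3.0], [3.0, 5.0], [0.0, 0.0], [0.0, 0.0])
```

In a trained model the weights would be learned, letting the gate lean toward the local stream for some inputs and toward the global stream for others.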
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Video Surveillance and Tracking Methods
Methods: Attention Is All You Need · Byte Pair Encoding · Linear Layer · Absolute Position Encodings · Dropout · Softmax · Dense Connections · Residual Connection · Multi-Head Attention · Adam
