Optimizing Local-Global Dependencies for Accurate 3D Human Pose Estimation

Guangsheng Xu, Guoyi Zhang, Lejia Ye, Shuwei Gan, Xiaohu Zhang, and Xia Yang

arXiv:2412.19676 · cs.CV · December 30, 2024

TL;DR

This paper introduces SSR-STF, a dual-stream transformer model that combines local and global features for improved 3D human pose estimation, achieving state-of-the-art results on benchmark datasets.

Contribution

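The paper proposes SSRFormer, a module that uses the skeleton selective refine attention (SSRA) mechanism to capture fine-grained local dependencies in pose sequences. These complement the global dependencies modeled by the Transformer and improve pose estimation accuracy.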

Findings

01 Achieves a P1 error of 37.4 mm on Human3.6M

02 Outperforms existing methods in accuracy and generalization

03 Is effective in downstream tasks such as human mesh recovery
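For readers unfamiliar with the metric: on Human3.6M, "P1" conventionally refers to Protocol 1, the mean per-joint position error (MPJPE), i.e. the average Euclidean distance between predicted and ground-truth joints. The sketch below shows that computation on a made-up three-joint pose. The function name `mpjpe` and the sample coordinates are illustrative assumptions, not taken from the paper or its repository, and any alignment step an evaluation pipeline applies beforehand is omitted.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error, in the units of the inputs (e.g. mm).

    pred, target: sequences of joints, each joint an (x, y, z) tuple.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    # Euclidean distance for each joint, averaged over all joints.
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


# Toy pose with three joints, coordinates in millimetres (made-up values).
target = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (100.0, 200.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (100.0, 0.0, 0.0), (100.0, 200.0, 5.0)]

error_mm = mpjpe(pred, target)  # per-joint distances are 5, 0 and 5 mm
print(f"MPJPE: {error_mm:.2f} mm")
```

In practice the error is also averaged over all frames and test sequences, so a reported figure such as 37.4 mm summarizes many poses rather than one.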

Abstract

Transformer-based methods have recently achieved significant success in 3D human pose estimation, owing to their strong ability to model long-range dependencies. However, relying solely on the global attention mechanism is insufficient for capturing the fine-grained local details, which are crucial for accurate pose estimation. To address this, we propose SSR-STF, a dual-stream model that effectively integrates local features with global dependencies to enhance 3D human pose estimation. Specifically, we introduce SSRFormer, a simple yet effective module that employs the skeleton selective refine attention (SSRA) mechanism to capture fine-grained local dependencies in human pose sequences, complementing the global dependencies modeled by the Transformer. By adaptively fusing these two feature streams, SSR-STF can better learn the underlying structure of human poses, overcoming the…
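The abstract as shown here does not say how the two streams are fused. As a rough illustration only, the sketch below shows one common way to fuse a local and a global feature vector adaptively: score each stream, turn the scores into softmax weights, and take the weighted sum. The function `adaptive_fuse`, the scoring vectors `w_local` and `w_global`, and all numbers are hypothetical and should not be read as the authors' method.

```python
import math


def adaptive_fuse(local_feat, global_feat, w_local, w_global):
    """Fuse two equal-length feature vectors with input-dependent weights.

    w_local, w_global: hypothetical scoring vectors (learned in a real model).
    Returns the fused vector and the (local, global) mixing weights.
    """
    # One scalar score per stream: dot product of features and scoring vector.
    s_local = sum(w * x for w, x in zip(w_local, local_feat))
    s_global = sum(w * x for w, x in zip(w_global, global_feat))

    # Numerically stable two-way softmax over the stream scores.
    m = max(s_local, s_global)
    e_local, e_global = math.exp(s_local - m), math.exp(s_global - m)
    a_local = e_local / (e_local + e_global)
    a_global = 1.0 - a_local

    # Element-wise weighted sum of the two streams.
    fused = [a_local * l + a_global * g for l, g in zip(local_feat, global_feat)]
    return fused, (a_local, a_global)


# With all-zero scoring vectors both streams score equally, so the
# fusion reduces to a plain average of the two feature vectors.
fused, weights = adaptive_fuse([1.0, 2.0], [3.0, 6.0], [0.0, 0.0], [0.0, 0.0])
print(fused, weights)
```

Because the weights depend on the features themselves, such a scheme can lean on the local stream for some inputs and on the global stream for others, which is the general idea behind "adaptive" fusion.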

Code & Models

Repositories

poker-xu/ssr-stf · Official

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Video Surveillance and Tracking Methods

Methods: Attention Is All You Need · Byte Pair Encoding · Linear Layer · Absolute Position Encodings · Dropout · Softmax · Dense Connections · Residual Connection · Multi-Head Attention · Adam