DPIT: Dual-Pipeline Integrated Transformer for Human Pose Estimation
Shuaitao Zhao, Kun Liu, Yuhang Huang, Qian Bao, Dan Zeng, and Wu Liu

TL;DR
DPIT introduces a dual-pipeline transformer approach that combines top-down and bottom-up methods for improved human pose estimation, effectively handling occlusion and scale variation by fusing global and local visual cues.
Contribution
This work is the first to integrate top-down and bottom-up pipelines with transformers for human pose estimation, enhancing accuracy and robustness.
Findings
Achieves state-of-the-art performance on COCO and MPII datasets.
Effectively handles occlusion and scale variation issues.
Demonstrates the benefit of combining global and local features via transformers.
Abstract
Human pose estimation aims to figure out the keypoints of all people in different scenes. Current approaches still face some challenges despite promising results. Existing top-down methods deal with a single person individually, without the interaction between different people and the scene they are situated in. Consequently, the performance of human detection degrades when serious occlusion happens. On the other hand, existing bottom-up methods consider all people at the same time and capture the global knowledge of the entire image. However, they are less accurate than the top-down methods due to the scale variation. To address these problems, we propose a novel Dual-Pipeline Integrated Transformer (DPIT) by integrating top-down and bottom-up pipelines to explore the visual clues of different receptive fields and achieve their complementarity. Specifically, DPIT consists of two…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHuman Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems
MethodsMulti-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Absolute Position Encodings · Residual Connection · Position-Wise Feed-Forward Layer · Adam · Dense Connections
