Efficient Human Pose Estimation by Learning Deeply Aggregated Representations
Zhengxiong Luo, Zhicheng Wang, Yuanhao Cai, Guanan Wang, Yan Huang, Liang Wang, Erjin Zhou, Tieniu Tan, Jian Sun

TL;DR
This paper introduces DANet, an efficient human pose estimation network that learns deeply aggregated multi-scale representations using novel fusion and attention mechanisms, achieving high accuracy with low complexity.
Contribution
The paper proposes the orthogonal attention block and second-order fusion unit to effectively fuse multi-scale features in a single pyramid network, reducing complexity while maintaining accuracy.
Findings
DANet-72 achieves 70.5 AP on COCO test-dev.
DANet runs at 58 persons-per-second on CPU.
The orthogonal attention block and second-order fusion unit make multi-scale feature fusion more efficient.
Abstract
In this paper, we propose an efficient human pose estimation network (DANet) by learning deeply aggregated representations. Most existing models explore multi-scale information mainly from features with different spatial sizes. Powerful multi-scale representations usually rely on the cascaded pyramid framework. This framework largely boosts the performance but at the same time makes networks very deep and complex. Instead, we focus on exploiting multi-scale information from layers with different receptive-field sizes and then making full use of this information by improving the fusion method. Specifically, we propose an orthogonal attention block (OAB) and a second-order fusion unit (SFU). The OAB learns multi-scale information from different layers and enhances them by encouraging them to be diverse. The SFU adaptively selects and fuses diverse multi-scale information and suppresses the…
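The two ideas named in the abstract, encouraging features from different layers to be diverse and adaptively weighting them during fusion, can be illustrated with a small NumPy sketch. This is not the paper's OAB or SFU, whose exact formulation is not given in the text above. The function names `diversity_penalty` and `softmax_fuse` are hypothetical, and the cosine-similarity penalty and per-channel softmax gate are generic stand-ins chosen only for illustration.

```python
import numpy as np

def diversity_penalty(feats):
    """Sum of squared off-diagonal cosine similarities between branches.

    Assumption: 'diverse' is modelled as near-orthogonal flattened features.
    The value is 0 when all branches are mutually orthogonal.
    """
    flat = np.stack([f.ravel() for f in feats])            # (K, C*H*W)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    gram = flat @ flat.T                                    # (K, K) cosine similarities
    off_diag = gram - np.diag(np.diag(gram))                # zero out the diagonal
    return float(np.sum(off_diag ** 2))

def softmax_fuse(feats):
    """Per-channel softmax gate over branches, followed by a weighted sum.

    Assumption: a generic gating scheme, not the paper's second-order fusion.
    """
    stacked = np.stack(feats)                               # (K, C, H, W)
    scores = stacked.mean(axis=(2, 3))                      # (K, C) global average pool
    scores = scores - scores.max(axis=0, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)  # sums to 1 over branches
    fused = (weights[:, :, None, None] * stacked).sum(axis=0)  # (C, H, W)
    return fused, weights

# Usage with two hypothetical branches of shape (C=2, H=2, W=2).
a = np.zeros((2, 2, 2)); a[0] = 1.0   # activates channel 0 only
b = np.zeros((2, 2, 2)); b[1] = 1.0   # activates channel 1 only
penalty = diversity_penalty([a, b])   # branches do not overlap
fused, weights = softmax_fuse([a, b])
```

In a training setup, a penalty of this kind would typically be added to the loss so that branches with different receptive fields learn complementary information, while the gate decides per channel how much each branch contributes to the fused output.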
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems
