SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation
Jiabin Zhang, Zheng Zhu, Jiwen Lu, Junjie Huang, Guan Huang, Jie Zhou

TL;DR
SIMPLE is a novel bottom-up human pose estimation framework that mimics top-down knowledge and unifies detection and pose estimation in a single network, achieving state-of-the-art results with high efficiency.
Contribution
The paper introduces the first mimicking strategy between top-down and bottom-up methods and a unified point learning framework for pose estimation.
Findings
Achieves state-of-the-art performance among bottom-up methods on COCO, MPII, and PoseTrack datasets.
Achieves accuracy comparable to top-down methods with faster inference speed.
Outperforms previous bottom-up approaches in accuracy.
Abstract
Practical applications require multi-person pose estimation algorithms that are both accurate and efficient. However, top-down methods currently lead in accuracy, while bottom-up methods lead in inference speed. To achieve a better trade-off between accuracy and efficiency, we propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE). Specifically, during training we enable SIMPLE to mimic the pose knowledge of a high-performance top-down pipeline, which significantly improves SIMPLE's accuracy while preserving its high efficiency at inference. In addition, SIMPLE formulates human detection and pose estimation as a unified point learning framework so that the two tasks complement each other in a single network. This is quite different from previous works where the two tasks may interfere with…
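The two ideas in the abstract can be illustrated with a small sketch. This is not the authors' code, and the abstract does not give these details: the function names (`mse`, `training_loss`, `decode_person`), the plain mean-squared-error mimicking term, the weight `alpha`, and the "center point plus keypoint offsets" encoding are all hypothetical choices made only to show the general shape of (1) a bottom-up student mimicking a frozen top-down teacher during training and (2) a point-based formulation in which one point serves both detection and pose.

```python
# Hypothetical sketch only: names, the MSE mimicking loss, `alpha`, and the
# center-plus-offsets encoding are illustrative assumptions, not SIMPLE's code.
from typing import List, Tuple

Heatmap = List[List[float]]  # one H x W heatmap per keypoint


def mse(a: Heatmap, b: Heatmap) -> float:
    """Mean squared error between two equally sized heatmaps."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def training_loss(student: List[Heatmap], teacher: List[Heatmap],
                  target: List[Heatmap], alpha: float = 0.5) -> float:
    """Ground-truth loss plus an (assumed) mimicking term toward a top-down teacher.

    The teacher's heatmaps are needed only while training; at inference the
    bottom-up student runs alone, so this term adds no inference cost.
    """
    k = len(student)
    gt_term = sum(mse(s, t) for s, t in zip(student, target)) / k
    mimic_term = sum(mse(s, t) for s, t in zip(student, teacher)) / k
    return gt_term + alpha * mimic_term


def decode_person(center: Tuple[float, float],
                  offsets: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Assumed point encoding: a person is a center point plus per-keypoint offsets.

    The same center point could also act as the detection output, which is one
    way detection and pose estimation might share a single network head.
    """
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in offsets]


if __name__ == "__main__":
    student = [[[0.0, 1.0], [0.0, 0.0]]]
    teacher = [[[0.0, 0.8], [0.0, 0.0]]]
    target = [[[0.0, 1.0], [0.0, 0.0]]]
    print(training_loss(student, teacher, target))
    print(decode_person((10.0, 20.0), [(1.0, -2.0), (-3.0, 4.0)]))
```

In this toy example the student already matches the ground truth, so the whole loss comes from the mimicking term, which pulls the student toward the teacher's softer peak. A real system would operate on dense network outputs; the weighting between the two terms is a tunable assumption here.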
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems
