SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation

Jiabin Zhang; Zheng Zhu; Jiwen Lu; Junjie Huang; Guan Huang; Jie Zhou

arXiv:2104.02486·cs.CV·April 8, 2021

TL;DR

SIMPLE is a bottom-up human pose estimation framework that mimics pose knowledge from a top-down pipeline during training and unifies human detection and pose estimation in a single network, achieving state-of-the-art results among bottom-up methods with high efficiency.

Contribution

The paper introduces the first mimicking strategy between top-down and bottom-up methods and a unified point learning framework for pose estimation.

Findings

01

Achieves state-of-the-art performance among bottom-up methods on COCO, MPII, and PoseTrack datasets.

02

Maintains the fast inference speed of bottom-up methods while approaching the accuracy of top-down methods.

03

Outperforms previous bottom-up approaches in accuracy.

Abstract

Practical applications of multi-person pose estimation demand both accuracy and efficiency. However, high accuracy is currently dominated by top-down methods, while fast inference is dominated by bottom-up methods. To strike a better trade-off between the two, we propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE). Specifically, during training, SIMPLE mimics the pose knowledge of a high-performance top-down pipeline, which significantly improves its accuracy while preserving its high efficiency at inference. In addition, SIMPLE formulates human detection and pose estimation as a unified point learning framework so that the two tasks complement each other within a single network. This differs from previous works, where the two tasks may interfere with…
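The mimicking idea in the abstract can be illustrated with a generic distillation-style loss. The abstract does not state the actual loss used by SIMPLE, so the following is only a minimal sketch under assumptions: the bottom-up network (the "student") and the top-down pipeline (the "teacher") each produce a keypoint heatmap of the same shape, and the student is trained on the ground-truth heatmap plus a weighted term that pulls it toward the teacher's output. The names `mse`, `training_loss`, and `mimic_weight` are hypothetical and do not come from the paper.

```python
from typing import List

Heatmap = List[List[float]]  # one keypoint heatmap, stored as rows of floats


def mse(a: Heatmap, b: Heatmap) -> float:
    """Mean squared error between two heatmaps of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("heatmaps must have the same shape")
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    if count == 0:
        raise ValueError("heatmaps must not be empty")
    return total / count


def training_loss(
    student: Heatmap,
    teacher: Heatmap,
    target: Heatmap,
    mimic_weight: float = 0.5,  # assumed weighting, not taken from the paper
) -> float:
    """Supervised loss plus a mimicking term toward the teacher heatmap."""
    supervised = mse(student, target)  # fit the ground-truth heatmap
    mimicking = mse(student, teacher)  # imitate the top-down teacher output
    return supervised + mimic_weight * mimicking
```

In this sketch the teacher is only needed to compute the training loss, which is consistent with the abstract's point that mimicking improves accuracy without adding cost at inference time.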

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems