Towards High Performance One-Stage Human Pose Estimation
Ling Li, Lin Zhao, Linhao Xu, Jie Xu

TL;DR
This paper enhances one-stage human pose estimation by improving feature extraction and introducing a global context module, achieving high accuracy with efficiency comparable to single-task models.
Contribution
It proposes specific improvements to Mask R-CNN for pose estimation, including a global context module, to boost performance while maintaining efficiency.
Findings
Achieves 68.1 AP on COCO val2017 with a ResNet-50 backbone.
Narrows the performance gap with two-stage methods while running faster.
Introduces a Global Context Module that enlarges the receptive field for better keypoint detection.
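To make the receptive-field point concrete, the sketch below computes the theoretical receptive field of a stack of convolution layers using the standard recurrence (effective kernel size `dilation * (kernel - 1) + 1`, accumulated with the running stride). It is not the paper's Global Context Module, whose design is not described in this summary. The function `receptive_field` and both layer configurations are hypothetical and chosen only for illustration.

```python
def receptive_field(layers):
    """Return the theoretical receptive field (in input pixels) of a conv stack.

    Each layer is a (kernel_size, stride, dilation) tuple.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between adjacent output positions
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical keypoint head: eight 3x3 convolutions, stride 1, no dilation.
plain_head = [(3, 1, 1)] * 8

# Same depth, with the last layer swapped for a 15x15 kernel
# (an illustrative assumption, not the paper's module).
wide_head = [(3, 1, 1)] * 7 + [(15, 1, 1)]

print(receptive_field(plain_head))  # 17
print(receptive_field(wide_head))   # 29
```

Under these assumptions, a single wide layer grows the receptive field from 17 to 29 input pixels without adding depth, which is the kind of effect a context module in the keypoint branch is meant to provide.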
Abstract
Making top-down human pose estimation both accurate and efficient is appealing. Mask R-CNN greatly improves efficiency by performing person detection and pose estimation in a single framework, since the two tasks share the features provided by the backbone. However, its performance falls short of traditional two-stage methods. In this paper, we aim to substantially advance the human pose estimation results of Mask R-CNN while preserving its efficiency. Specifically, we improve the whole pose estimation pipeline, which comprises feature extraction and keypoint detection. The feature extraction stage is designed to capture sufficient and valuable pose information. We then introduce a Global Context Module into the keypoint detection branch to enlarge the receptive field, which is crucial to successful human pose estimation. On the…
