Towards High Performance One-Stage Human Pose Estimation

Ling Li; Lin Zhao; Linhao Xu; Jie Xu

arXiv:2301.04842 · cs.CV · January 13, 2023

TL;DR

This paper enhances one-stage human pose estimation by improving feature extraction and introducing a global context module, achieving high accuracy with efficiency comparable to single-task models.

Contribution

The paper proposes targeted improvements to Mask R-CNN's pose estimation pipeline, covering feature extraction and a Global Context Module in the keypoint detection branch, to boost accuracy while maintaining efficiency.

Findings

01

Achieves 68.1 AP on COCO val2017 with ResNet-50 backbone.

02

Narrows the performance gap with two-stage methods while running faster.

03

Introduces a Global Context Module that enlarges the receptive field for better keypoint detection.
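
For readers unfamiliar with the AP metric quoted in the findings above: COCO keypoint AP is built on Object Keypoint Similarity (OKS), which plays the role that IoU plays in box detection, and AP is averaged over OKS thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below computes OKS for a single predicted pose. It is an illustration of the standard COCO definition, not code from the paper; the function name `oks`, the sample coordinates, and the uniform `sigmas` are hypothetical values chosen for the example.

```python
import math


def oks(pred, gt, visibility, area, sigmas):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred, gt   : lists of (x, y) keypoint coordinates
    visibility : ground-truth visibility flags (0 means unlabeled, skipped)
    area       : ground-truth object area, used as the squared scale s^2
    sigmas     : per-keypoint falloff constants (COCO uses k_i = 2 * sigma_i)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        count += 1
    return total / count if count else 0.0


# Illustrative values only (not taken from COCO or from the paper).
gt = [(100.0, 50.0), (120.0, 80.0), (90.0, 140.0)]
pred = [(102.0, 51.0), (120.0, 80.0), (60.0, 150.0)]
visibility = [2, 2, 0]          # third keypoint is unlabeled
sigmas = [0.05, 0.05, 0.05]     # hypothetical uniform constants
area = 4000.0

score = oks(pred, gt, visibility, area, sigmas)

# A prediction counts as a match at threshold t when OKS >= t.
thresholds = [t / 100 for t in range(50, 100, 5)]
matched = [score >= t for t in thresholds]
```

Because the distance is normalized by object area and a per-keypoint constant, the same pixel error costs more on a small person than on a large one.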

Abstract

A top-down human pose estimation method that offers both good performance and high efficiency is appealing. Mask R-CNN can greatly improve efficiency by performing person detection and pose estimation in a single framework, since the features provided by the backbone are shared by the two tasks. However, its performance is not as good as that of traditional two-stage methods. In this paper, we aim to substantially advance the human pose estimation results of Mask R-CNN while keeping its efficiency. Specifically, we improve the whole pose estimation process, which comprises feature extraction and keypoint detection. The feature extraction part is designed to capture sufficient and valuable pose information. Then, we introduce a Global Context Module into the keypoint detection branch to enlarge the receptive field, which is crucial to successful human pose estimation. On the…
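
To make the receptive-field argument concrete, the sketch below applies the standard receptive-field recurrence for stacked convolutions: each layer adds (effective kernel size − 1) × the accumulated stride. The three layer stacks compared here are hypothetical and are not the paper's architecture; the abstract does not describe how the Global Context Module is built, so dilated and large-kernel convolutions appear only as two common ways to widen context.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a conv stack.

    layers: list of (kernel, stride, dilation) tuples, applied in order.
    """
    rf, jump = 1, 1  # jump = spacing of adjacent outputs measured in input pixels
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical heads for illustration only.
plain_head = [(3, 1, 1)] * 4                                   # four 3x3 convs
dilated_head = [(3, 1, 1), (3, 1, 2), (3, 1, 4), (3, 1, 8)]    # growing dilation
large_kernel_head = [(3, 1, 1), (15, 1, 1)]                    # one large kernel

for name, head in [("plain", plain_head),
                   ("dilated", dilated_head),
                   ("large kernel", large_kernel_head)]:
    print(name, receptive_field(head))
```

With the same number of 3x3 layers, the dilated stack covers a much wider window than the plain one, which is the general effect a context module aims for when keypoints depend on distant body parts.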
