Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression

Zigang Geng; Ke Sun; Bin Xiao; Zhaoxiang Zhang; Jingdong Wang

arXiv:2104.02300 · cs.CV · April 7, 2021 · 34 cites



TL;DR

This paper introduces DEKR, a bottom-up human pose estimation method that regresses each keypoint in a separate branch with its own adaptive convolutions, and reports higher accuracy than keypoint detection and grouping methods on COCO and CrowdPose.

Contribution

The paper proposes a simple, effective disentangled keypoint regression framework that improves spatial accuracy in bottom-up pose estimation by focusing on keypoint regions with adaptive convolutions.
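To make "focusing on keypoint regions with adaptive convolutions" concrete, here is a minimal sketch of how an adaptive 3x3 kernel might choose where to sample. It assumes each pixel carries a 2x2 affine matrix and a translation vector, in the spirit of a pixel-wise spatial transformer, and that these warp the regular kernel grid. That parameterisation and the names `REGULAR_GRID` and `adaptive_sample_positions` are illustrative assumptions, not taken from the authors' code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Regular 3x3 kernel grid: (dx, dy) offsets relative to the kernel centre.
REGULAR_GRID: List[Point] = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def adaptive_sample_positions(
    center: Point,
    affine: Sequence[Sequence[float]],
    translation: Point,
) -> List[Point]:
    """Return the 9 positions an adaptive 3x3 kernel would sample.

    Assumption: each regular grid offset g is warped to A @ g + t, where
    A (`affine`) and t (`translation`) are predicted per pixel.
    """
    (a, b), (c, d) = affine
    tx, ty = translation
    cx, cy = center
    positions = []
    for gx, gy in REGULAR_GRID:
        # Warp the regular offset, then shift it to the current pixel.
        ox = a * gx + b * gy + tx
        oy = c * gx + d * gy + ty
        positions.append((cx + ox, cy + oy))
    return positions


if __name__ == "__main__":
    # Identity transform: behaves like an ordinary 3x3 convolution.
    print(adaptive_sample_positions((5, 5), [[1, 0], [0, 1]], (0, 0)))
    # Scaled and shifted: the kernel reaches a wider, displaced region.
    print(adaptive_sample_positions((5, 5), [[2, 0], [0, 2]], (1, 0)))
```

With the identity matrix and zero translation the kernel reduces to a standard 3x3 convolution. In a real network the warped positions are generally fractional, so features would be read by interpolation rather than direct indexing.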

Findings

01. Outperforms keypoint detection and grouping methods on the COCO and CrowdPose datasets.

02. Achieves state-of-the-art results in bottom-up human pose estimation.

03. Demonstrates the effectiveness of disentangled representations for keypoint regression.

Abstract

In this paper, we are interested in the bottom-up paradigm of estimating human poses from an image. We study the dense keypoint regression framework that is previously inferior to the keypoint detection and grouping framework. Our motivation is that regressing keypoint positions accurately needs to learn representations that focus on the keypoint regions. We present a simple yet effective approach, named disentangled keypoint regression (DEKR). We adopt adaptive convolutions through pixel-wise spatial transformer to activate the pixels in the keypoint regions and accordingly learn representations from them. We use a multi-branch structure for separate regression: each branch learns a representation with dedicated adaptive convolutions and regresses one keypoint. The resulting disentangled representations are able to attend to the keypoint regions, respectively, and thus the keypoint…
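The multi-branch structure described in the abstract can be sketched for a single pixel as follows. This is a simplified illustration, not the authors' implementation. It assumes the feature channels are split evenly across branches, that each branch is a plain linear map (the adaptive convolutions are omitted), and that each branch predicts an (x, y) offset from the pixel to its keypoint. The names `split_channels` and `regress_keypoints` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
BranchWeights = Tuple[Sequence[float], Sequence[float]]  # (weights for dx, weights for dy)


def split_channels(features: Sequence[float], num_branches: int) -> List[List[float]]:
    """Partition a feature vector into equal, non-overlapping chunks, one per branch."""
    size, remainder = divmod(len(features), num_branches)
    if remainder:
        raise ValueError("feature length must be divisible by the number of branches")
    return [list(features[i * size:(i + 1) * size]) for i in range(num_branches)]


def regress_keypoints(
    pixel: Point,
    features: Sequence[float],
    branch_weights: Sequence[BranchWeights],
) -> List[Point]:
    """Regress one keypoint per branch from that branch's own feature chunk.

    Assumption: each branch outputs an offset (dx, dy) relative to `pixel`.
    """
    chunks = split_channels(features, len(branch_weights))
    px, py = pixel
    keypoints = []
    for chunk, (wx, wy) in zip(chunks, branch_weights):
        # Each branch sees only its own chunk, so the representations stay disentangled.
        dx = sum(w * f for w, f in zip(wx, chunk))
        dy = sum(w * f for w, f in zip(wy, chunk))
        keypoints.append((px + dx, py + dy))
    return keypoints


if __name__ == "__main__":
    weights = [([1.0, 0.0], [0.0, 1.0]), ([0.5, 0.5], [1.0, -1.0])]
    print(regress_keypoints((10.0, 20.0), [1.0, 2.0, 3.0, 4.0], weights))
```

The point of the sketch is the data flow: no branch reads another branch's channels, so each keypoint is regressed from a representation dedicated to it.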


Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging

Methods: Spatial Transformer