Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation
William McNally, Kanav Vats, Alexander Wong, John McPhee

TL;DR
KAPAO replaces heatmap regression for multi-person human pose estimation with a unified single-stage detection framework that models keypoints and poses as objects, with the aim of being both faster and more accurate.
Contribution
The paper presents KAPAO, a new single-stage dense detection method that models keypoints and poses as objects, improving speed and accuracy over traditional heatmap-based methods.
Findings
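In the authors' experiments, KAPAO is both faster and more accurate than previous methods.
Modeling keypoints as objects sidesteps the quantization error inherent to heatmaps and the computation needed to generate and post-process them.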
KAPAO performs well without test-time augmentation.
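The quantization error mentioned above can be illustrated with a minimal sketch. It assumes a simplified heatmap pipeline in which a keypoint is snapped to one cell of a grid downsampled by a fixed stride and then decoded at the cell centre; the function names and the stride of 4 are illustrative assumptions, not taken from the paper. Real heatmap methods typically use Gaussian targets and sub-pixel refinement, which reduce this error but do not remove its cause.

```python
import math


def heatmap_roundtrip(x, y, stride=4):
    """Snap a keypoint to a heatmap cell, then decode the cell centre.

    Simplified model (an assumption for illustration): the heatmap is
    `stride` times smaller than the image and the peak is a single cell.
    """
    col = int(x // stride)  # heatmap column containing the keypoint
    row = int(y // stride)  # heatmap row containing the keypoint
    # Decode back to image coordinates at the centre of that cell.
    return (col + 0.5) * stride, (row + 0.5) * stride


def quantization_error(x, y, stride=4):
    """Euclidean distance between the true and the decoded keypoint."""
    x_hat, y_hat = heatmap_roundtrip(x, y, stride)
    return math.hypot(x - x_hat, y - y_hat)


if __name__ == "__main__":
    # A keypoint at a cell centre survives the round trip exactly;
    # one at a cell corner is off by half the cell diagonal.
    print(quantization_error(2.0, 2.0))
    print(quantization_error(0.0, 0.0))
```

In this simplified model the error is bounded by half the cell diagonal, so it grows linearly with the stride. A detector that regresses continuous coordinates directly does not pass through this grid.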
Abstract
In keypoint estimation tasks such as human pose estimation, heatmap-based regression is the dominant approach despite possessing notable drawbacks: heatmaps intrinsically suffer from quantization error and require excessive computation to generate and post-process. Motivated to find a more efficient solution, we propose to model individual keypoints and sets of spatially related keypoints (i.e., poses) as objects within a dense single-stage anchor-based detection framework. Hence, we call our method KAPAO (pronounced "Ka-Pow"), for Keypoints And Poses As Objects. KAPAO is applied to the problem of single-stage multi-person human pose estimation by simultaneously detecting human pose and keypoint objects and fusing the detections to exploit the strengths of both object representations. In experiments, we observe that KAPAO is faster and more accurate than previous methods, which suffer…
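The idea of treating keypoints and poses as objects and then fusing the two kinds of detection can be sketched as follows. This is only an illustration under stated assumptions: the class names `KeypointObject` and `PoseObject`, the `fuse` function, and the `max_dist` and `min_conf` thresholds are hypothetical, and the replacement rule shown is one plausible way to fuse detections, not necessarily the procedure used in the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class KeypointObject:
    """A single keypoint detected as a small box (hypothetical layout)."""
    cls: int      # keypoint type index, e.g. 0 = nose
    cx: float     # box centre x, taken as the keypoint x coordinate
    cy: float     # box centre y, taken as the keypoint y coordinate
    conf: float   # detection confidence


@dataclass
class PoseObject:
    """A person detected as a box plus one (x, y) per keypoint type."""
    box: Tuple[float, float, float, float]   # (cx, cy, w, h)
    keypoints: List[Tuple[float, float]]
    conf: float


def fuse(pose, kp_objects, max_dist=10.0, min_conf=0.5):
    """Refine a pose with nearby keypoint objects of the same type.

    Assumed rule: for each keypoint type, the most confident keypoint
    object within `max_dist` pixels of the pose's own estimate replaces
    that estimate; otherwise the pose's estimate is kept.
    """
    fused = list(pose.keypoints)
    best_conf = {}  # keypoint type -> confidence of the accepted object
    for k in kp_objects:
        if k.conf < min_conf or not 0 <= k.cls < len(fused):
            continue
        px, py = pose.keypoints[k.cls]  # compare against the original estimate
        close = math.hypot(k.cx - px, k.cy - py) <= max_dist
        if close and k.conf > best_conf.get(k.cls, 0.0):
            fused[k.cls] = (k.cx, k.cy)
            best_conf[k.cls] = k.conf
    return fused


if __name__ == "__main__":
    pose = PoseObject(box=(30, 30, 60, 60), keypoints=[(10, 10), (50, 50)], conf=0.8)
    detections = [
        KeypointObject(cls=0, cx=12, cy=11, conf=0.9),    # close: accepted
        KeypointObject(cls=1, cx=100, cy=100, conf=0.9),  # too far: ignored
    ]
    print(fuse(pose, detections))
```

The sketch reflects the motivation stated in the abstract: the pose object supplies a complete set of keypoints for each person, while the keypoint objects can supply more precise locations where one is detected nearby.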
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Human Motion and Animation
