YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss
Debapriya Maji, Soyeb Nagori, Manu Mathew, Deepak Poddar

TL;DR
YOLO-Pose introduces an end-to-end trainable framework that combines object detection and pose estimation in a single pass, achieving state-of-the-art accuracy without test-time augmentation.
Contribution
It proposes a novel heatmap-free, end-to-end trainable YOLO-based model for multi-person pose estimation that optimizes the Object Keypoint Similarity metric directly.
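Optimizing OKS directly means the keypoint loss is built from the same similarity score the COCO evaluator uses. The sketch below computes OKS for one predicted pose against one ground-truth pose and turns it into a loss as 1 − OKS. The per-keypoint sigmas are the standard COCO person-keypoint constants. Several details are assumptions, not the paper's exact implementation: the object scale s² is taken to be the ground-truth box area, and the function names `oks` and `oks_loss` are hypothetical. This NumPy version only illustrates the arithmetic. A training loop would need the same formula written as a differentiable tensor operation.

```python
import numpy as np

# Standard COCO per-keypoint sigmas for the 17 person keypoints
# (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
COCO_SIGMAS = np.array([
    0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72,
    0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89,
]) / 10.0


def oks(pred_xy, gt_xy, gt_vis, area, sigmas=COCO_SIGMAS):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred_xy, gt_xy: (K, 2) keypoint coordinates in pixels.
    gt_vis: (K,) visibility flags; only keypoints with v > 0 are counted.
    area: object scale s^2 (assumption: the ground-truth box area).
    """
    k = 2.0 * sigmas                               # per-keypoint constant k_i = 2 * sigma_i
    d2 = np.sum((pred_xy - gt_xy) ** 2, axis=1)    # squared distance d_i^2
    # exp(-d_i^2 / (2 s^2 k_i^2)); eps guards against a zero area
    sim = np.exp(-d2 / (2.0 * area * k ** 2 + np.finfo(float).eps))
    labeled = gt_vis > 0
    if not labeled.any():
        return 0.0                                 # hypothetical choice: no labeled keypoints
    return float(sim[labeled].sum() / labeled.sum())


def oks_loss(pred_xy, gt_xy, gt_vis, area):
    """Keypoint loss as 1 - OKS: minimizing it maximizes the evaluation metric."""
    return 1.0 - oks(pred_xy, gt_xy, gt_vis, area)
```

A perfect prediction gives OKS = 1 and zero loss. Unlabeled keypoints (v = 0) do not affect the score. Because each keypoint's error is divided by the object scale and by its own sigma, the loss is scale-invariant. This is the property an L1 loss on raw coordinates lacks.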
Findings
Achieves 90.2% AP50 on the COCO validation set
Surpasses all bottom-up approaches with a single forward pass
No test-time augmentation required for top performance
Abstract
We introduce YOLO-Pose, a novel heatmap-free approach for joint detection and 2D multi-person pose estimation in an image, based on the popular YOLO object detection framework. Existing heatmap-based two-stage approaches are sub-optimal because they are not end-to-end trainable, and their training relies on a surrogate L1 loss that is not equivalent to maximizing the evaluation metric, i.e., Object Keypoint Similarity (OKS). Our framework allows us to train the model end-to-end and optimize the OKS metric itself. The proposed model learns to jointly detect bounding boxes for multiple persons and their corresponding 2D poses in a single forward pass, thus bringing in the best of both top-down and bottom-up approaches. The proposed approach does not require the postprocessing step of bottom-up approaches that groups detected keypoints into a skeleton, because each bounding box has an associated pose, resulting in an…
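Each bounding box carries its own pose, so decoding is a per-anchor split of the output vector and no keypoint-grouping step is needed. The sketch below illustrates this under an assumed output layout of 6 box values (cx, cy, w, h, objectness, class score) followed by 17 keypoints × (x, y, confidence), giving 57 values per anchor. The layout, the score rule (objectness × class score), and the name `decode_detections` are illustrative assumptions, not the authors' code. The predictions are also assumed to be already mapped to image coordinates.

```python
import numpy as np

NUM_KPTS = 17                          # COCO person keypoints
BOX_DIMS = 6                           # assumed: cx, cy, w, h, objectness, class score
PRED_DIMS = BOX_DIMS + 3 * NUM_KPTS    # 57 values per anchor under this layout


def decode_detections(preds, conf_thresh=0.5):
    """Split raw per-anchor predictions into (box, score, keypoints) records.

    preds: (N, 57) array, one row per anchor, already in image coordinates.
    Returns a list of dicts; each detected person carries its own keypoints,
    so no bottom-up grouping of keypoints into skeletons is required.
    """
    preds = np.asarray(preds, dtype=float)
    if preds.ndim != 2 or preds.shape[1] != PRED_DIMS:
        raise ValueError(f"expected shape (N, {PRED_DIMS}), got {preds.shape}")
    detections = []
    for row in preds:
        score = row[4] * row[5]                     # objectness x class score
        if score < conf_thresh:
            continue
        cx, cy, w, h = row[:4]
        box = np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])  # x1, y1, x2, y2
        kpts = row[BOX_DIMS:].reshape(NUM_KPTS, 3)  # (x, y, keypoint confidence)
        detections.append({"box": box, "score": score, "keypoints": kpts})
    return detections
```

In a complete pipeline, non-maximum suppression would then run on the boxes, and each surviving box keeps the keypoints decoded from the same anchor. This is why the pose needs no separate association step.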
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Advanced Neural Network Applications
Methods: FLIP · You Only Look Once · Heatmap
