Neural Interactive Keypoint Detection
Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang

TL;DR
Click-Pose is an interactive neural framework that significantly reduces labeling costs for 2D keypoint annotation by combining user feedback with a pose correction model, achieving high accuracy with minimal user clicks.
Contribution
The paper introduces Click-Pose, a novel end-to-end neural interactive keypoint detection method that effectively integrates user feedback to enhance annotation efficiency and accuracy.
Findings
Reduces annotation effort by over 10 times compared to manual annotation.
Requires only 1.97 and 6.45 clicks to reach 95% precision (NoC@95; lower is better) on COCO and Human-Art, respectively, surpassing state-of-the-art models.
Improves AP scores on COCO and Human-Art without user clicks.
Abstract
This work proposes an end-to-end neural interactive keypoint detection framework named Click-Pose, which can reduce the labeling cost of 2D keypoint annotation by more than 10 times compared with manual-only annotation. Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct the predicted keypoints interactively for a faster and more effective annotation process. Specifically, we design a pose error modeling strategy that feeds the ground-truth pose combined with four typical pose errors into the decoder and trains the model to reconstruct the correct poses, which enhances the self-correction ability of the model. Then, we attach an interactive human-feedback loop that receives users' clicks to correct one or several predicted keypoints and iteratively uses the decoder to update all other keypoints with a minimum…
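The interactive loop described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: `precision`, `identity_refiner`, and `clicks_to_target` are hypothetical names, the pixel-tolerance precision is a simplified stand-in for the paper's evaluation metric, and the refiner is a placeholder for the Click-Pose decoder, which would update the unclicked keypoints after each click.

```python
import numpy as np

def precision(pred, gt, tol=5.0):
    """Fraction of keypoints within `tol` pixels of ground truth.
    Assumption: a simplified stand-in for the paper's precision metric."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=1) <= tol))

def identity_refiner(pose, clicked):
    """Hypothetical placeholder for the decoder: returns the pose unchanged."""
    return pose

def clicks_to_target(pred, gt, refiner=identity_refiner, target=0.95, tol=5.0):
    """Simulate a user who clicks the worst unclicked keypoint until
    `target` precision is reached. Returns (number of clicks, final pose)."""
    pose = np.asarray(pred, dtype=float).copy()
    clicked = np.zeros(len(pose), dtype=bool)
    clicks = 0
    while precision(pose, gt, tol) < target and clicks < len(pose):
        errors = np.linalg.norm(pose - gt, axis=1)
        errors[clicked] = -1.0              # never click the same keypoint twice
        worst = int(np.argmax(errors))
        pose[worst] = gt[worst]             # the click places the keypoint correctly
        clicked[worst] = True
        clicks += 1
        refined = np.asarray(refiner(pose, clicked), dtype=float).copy()
        refined[clicked] = pose[clicked]    # clicked keypoints stay fixed
        pose = refined
    return clicks, pose
```

With the identity refiner every wrong keypoint costs one click, which corresponds to purely manual correction. A refiner that also repairs neighbouring keypoints after a click would lower the count, which is the effect a metric such as NoC@95 is meant to capture.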
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Human Motion and Animation
