X-Pose: Detecting Any Keypoints

Jie Yang; Ailing Zeng; Ruimao Zhang; Lei Zhang

arXiv:2310.08530 · cs.CV · July 18, 2024 · 1 citation

TL;DR

X-Pose introduces an end-to-end multi-modal prompt-based framework for detecting any keypoints across diverse objects and scenarios, supported by a large unified dataset, UniKPT, achieving significant accuracy improvements.

Contribution

The paper presents X-Pose, a novel multi-modal prompt-based keypoint detection framework and the UniKPT dataset, enabling accurate detection of diverse keypoints in complex real-world images.

Findings

01

X-Pose improves on state-of-the-art non-promptable, visual prompt-based, and textual prompt-based methods by 27.7 AP, 6.44 PCK, and 7.0 AP, respectively.

02

The UniKPT dataset unifies 13 datasets with 338 keypoints across 1,237 categories.

03

X-Pose demonstrates strong generalization across styles, categories, and poses.
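The PCK figure in the findings refers to the Percentage of Correct Keypoints metric. As a minimal sketch, assuming the common convention that a predicted keypoint counts as correct when it lies within `alpha` times a reference length (here the longer side of the object's bounding box) of the ground truth, the metric could be computed as follows. The function name `pck`, the `alpha` value, and the choice of normalization are illustrative assumptions; the evaluation protocol used for X-Pose is not stated on this page.

```python
import math


def pck(pred, gt, bbox_size, alpha=0.2, visible=None):
    """Fraction of visible keypoints predicted within alpha * bbox_size of ground truth.

    pred, gt  : lists of (x, y) tuples, one per keypoint, in the same order
    bbox_size : reference length, assumed here to be max(box width, box height)
    alpha     : tolerance as a fraction of bbox_size (assumed value, not from the paper)
    visible   : optional list of booleans; keypoints marked False are skipped
    """
    if visible is None:
        visible = [True] * len(gt)
    threshold = alpha * bbox_size
    correct = 0
    total = 0
    for (px, py), (gx, gy), vis in zip(pred, gt, visible):
        if not vis:
            continue  # unlabeled or occluded keypoints do not count either way
        total += 1
        if math.hypot(px - gx, py - gy) <= threshold:
            correct += 1
    return correct / total if total else 0.0


# Toy example: three keypoints, tolerance of 10 pixels (0.1 * 100).
pred = [(10, 10), (50, 52), (90, 40)]
gt = [(12, 11), (50, 50), (60, 40)]
score = pck(pred, gt, bbox_size=100, alpha=0.1)
print(f"PCK@0.1 = {score:.3f}")
```

In the toy example the first two predictions fall inside the 10-pixel tolerance and the third misses by 30 pixels, so two of three keypoints are counted as correct.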

Abstract

This work aims to address an advanced keypoint detection problem: how to accurately detect any keypoints in complex real-world scenarios, which involve massive, messy, and open-ended objects as well as their associated keypoint definitions. Current high-performance keypoint detectors often fail to tackle this problem due to their two-stage schemes, under-explored prompt designs, and limited training data. To bridge the gap, we propose X-Pose, a novel end-to-end framework with multi-modal (i.e., visual, textual, or their combinations) prompts to detect multi-object keypoints for any articulated (e.g., human and animal), rigid, and soft objects within a given image. Moreover, we introduce a large-scale dataset called UniKPT, which unifies 13 keypoint detection datasets with 338 keypoints across 1,237 categories over 400K instances. Training with UniKPT, X-Pose effectively aligns…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: ALIGN · Contrastive Learning · Focus