OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection
Changsheng Lu, Zheyuan Liu, Piotr Koniusz

TL;DR
OpenKD introduces a multimodal prompt approach for zero- and few-shot keypoint detection, leveraging diverse text prompts and large language models to improve generalization and handle unseen keypoints effectively.
Contribution
The paper presents a novel OpenKD model that supports multimodal prompts and enhances zero-shot keypoint detection by interpolating auxiliary keypoints and texts, enabling better handling of unseen prompts.
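The interpolation idea mentioned above can be pictured with a minimal sketch. It assumes plain linear interpolation between two annotated keypoints and between their text embeddings. This summary does not say how OpenKD actually performs the interpolation, so the function names, the linear form, and the mixing weight t are all illustrative assumptions rather than the authors' method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def interpolate_keypoint(a: Point, b: Point, t: float) -> Point:
    """Return the point a fraction t of the way from a to b (hypothetical helper)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return ((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])


def interpolate_embedding(
    ea: Sequence[float], eb: Sequence[float], t: float
) -> List[float]:
    """Blend two text embeddings with the same weight t (assumed linear mixing)."""
    if len(ea) != len(eb):
        raise ValueError("embeddings must have the same dimension")
    return [(1 - t) * x + t * y for x, y in zip(ea, eb)]


# Toy values only: two keypoints in pixel coordinates and two 2-D "embeddings".
nose, ear = (0.0, 0.0), (10.0, 20.0)
aux_point = interpolate_keypoint(nose, ear, 0.5)
aux_text = interpolate_embedding([1.0, 0.0], [0.0, 1.0], 0.5)
```

Using the same t for the location and for the embedding keeps each auxiliary keypoint paired with a matching auxiliary text feature, which is the pairing the sketch is meant to illustrate.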
Findings
Achieves state-of-the-art results on Z-FSKD benchmarks.
Effectively handles diverse and unseen text prompts with LLM parsing.
Significantly improves spatial reasoning for novel keypoints.
Abstract
Exploiting foundation models (e.g., CLIP) to build a versatile keypoint detector has gained increasing attention. Most existing models accept either a text prompt (e.g., "the nose of a cat") or a visual prompt (e.g., a support image with keypoint annotations) to detect the corresponding keypoints in a query image, thereby exhibiting either zero-shot or few-shot detection ability. However, research on multimodal prompts remains underexplored, and prompt diversity in semantics and language is far from opened. For example, how should a model handle unseen text prompts for novel keypoint detection, or diverse text prompts like "Can you detect the nose and ears of a cat?" In this work, we open the prompt diversity from three aspects: modality, semantics (seen vs. unseen), and language, to enable more generalized zero- and few-shot keypoint detection (Z-FSKD). We propose…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Image and Object Detection Techniques
Methods: Sparse Evolutionary Training
