Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching
Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo, Yu Qiao, Kaipeng Zhang

TL;DR
This paper introduces a zero-shot animal keypoint detection framework that uses text prompts and semantic features to detect keypoints across species without prior annotations. It outperforms baseline methods and achieves results comparable to few-shot approaches.
Contribution
The paper proposes the KDSM framework for open-vocabulary keypoint detection, combining vision and language models with novel modules for improved generalization and zero-shot performance.
Findings
KDSM outperforms baseline methods in keypoint detection tasks.
The zero-shot approach achieves results comparable to few-shot methods.
The framework demonstrates strong generalization across animal species.
Abstract
Current image-based keypoint detection methods for animal (including human) bodies and faces are generally divided into fully supervised and few-shot class-agnostic approaches. The former typically rely on laborious and time-consuming manual annotation, which makes it hard to extend keypoint detection to a broader range of keypoint categories and animal species. The latter, though less dependent on extensive manual input, still require annotated support images for reference during testing. To realize zero-shot keypoint detection without any prior annotation, we introduce the Open-Vocabulary Keypoint Detection (OVKD) task, which is designed to use text prompts to identify arbitrary keypoints across any species. In pursuit of this goal, we have developed a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization · Multimodal Machine Learning Applications
