Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching

Hao Zhang; Lumin Xu; Shenqi Lai; Wenqi Shao; Nanning Zheng; Ping Luo; Yu Qiao; Kaipeng Zhang

arXiv:2310.05056 · cs.CV · October 3, 2024

TL;DR

This paper introduces a zero-shot framework for animal keypoint detection that uses text prompts and semantic features to detect keypoints across species without prior annotations. The authors report that it outperforms existing methods.

Contribution

The paper proposes the KDSM framework for open-vocabulary keypoint detection, combining vision and language models with novel modules for improved generalization and zero-shot performance.

Findings

01 KDSM outperforms baseline methods in keypoint detection tasks.

02 The zero-shot approach achieves results comparable to few-shot methods.

03 The framework demonstrates strong generalization across animal species.

Abstract

Current image-based keypoint detection methods for animal (including human) bodies and faces are generally divided into fully supervised and few-shot class-agnostic approaches. The former typically relies on laborious and time-consuming manual annotations, which makes it difficult to extend keypoint detection to a broader range of keypoint categories and animal species. The latter, though less dependent on extensive manual input, still requires annotated support images for reference during testing. To realize zero-shot keypoint detection without any prior annotation, we introduce the Open-Vocabulary Keypoint Detection (OVKD) task, which uses text prompts to identify arbitrary keypoints across any species. In pursuit of this goal, we have developed a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature…

Code & Models

Repositories

zhanghao5201/kdsm
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization · Multimodal Machine Learning Applications