AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT

Fangbo Qin; Taogang Hou; Shan Lin; Kaiyuan Wang; Michael C. Yip; Shan Yu

arXiv:2309.08134·cs.CV·September 18, 2023

Open Access

TL;DR

AnyOKP introduces a one-shot, instance-aware object keypoint extraction method leveraging pretrained ViT, capable of identifying keypoints across multiple object instances and categories with high robustness to domain shifts and viewpoint changes.

Contribution

The paper presents a novel one-shot keypoint extraction approach using pretrained ViT that is generalizable, transferable, and does not require training for new object instances.

Findings

1. Effective on real robot-captured images across various domains
2. Demonstrates high cross-category flexibility and instance awareness
3. Shows robustness to domain shift and viewpoint variation

Abstract

Towards flexible object-centric visual perception, we propose a one-shot instance-aware object keypoint (OKP) extraction approach, AnyOKP, which leverages the powerful representation ability of a pretrained vision transformer (ViT) and can obtain keypoints on multiple object instances of arbitrary category after learning from a support image. An off-the-shelf pretrained ViT is directly deployed for generalizable and transferable feature extraction, followed by training-free feature enhancement. Best-prototype pairs (BPPs) are searched for in the support and query images based on appearance similarity, yielding instance-unaware candidate keypoints. Then, the entire graph with all candidate keypoints as vertices is divided into sub-graphs according to the feature distributions on the graph edges. Finally, each sub-graph represents an object instance. AnyOKP is evaluated on real…
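The two-stage structure described in the abstract (appearance matching to get instance-unaware candidates, then graph partitioning to separate instances) can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes feature vectors are already extracted, substitutes a simple thresholded cosine-similarity match for the BPP search, and replaces the edge feature-distribution test with a caller-supplied `affinity` function. The names `candidate_keypoints`, `split_into_subgraphs`, `sim_threshold`, and `affinity` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def candidate_keypoints(support, query, sim_threshold):
    """Assumed stand-in for the BPP search.

    For every query feature, find the most similar support prototype and
    keep the pair if the similarity reaches sim_threshold. Returns a list
    of (query_index, prototype_index) tuples with no instance labels.
    """
    if not support:
        return []
    candidates = []
    for qi, q in enumerate(query):
        sims = [cosine(s, q) for s in support]
        best = max(range(len(support)), key=lambda i: sims[i])
        if sims[best] >= sim_threshold:
            candidates.append((qi, best))
    return candidates


def split_into_subgraphs(n, affinity, threshold):
    """Partition vertices 0..n-1 into connected components.

    An edge (a, b) is kept when affinity(a, b) >= threshold. The paper
    decides this from feature distributions along the edge; here the
    affinity function is a hypothetical placeholder supplied by the caller.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(n):
        for b in range(a + 1, n):
            if affinity(a, b) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


# Toy usage: two prototypes, four query features, three surviving candidates.
support = [[1.0, 0.0], [0.0, 1.0]]
query = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.05], [0.7, 0.7]]
candidates = candidate_keypoints(support, query, sim_threshold=0.95)

# Assumed pixel positions of the surviving candidates; nearby points are linked.
positions = [(0, 0), (1, 0), (10, 10)]
instances = split_into_subgraphs(
    len(positions),
    affinity=lambda a, b: -math.dist(positions[a], positions[b]),
    threshold=-2.0,
)
```

In this sketch each resulting group of vertex indices plays the role of one object instance. Using spatial distance as the affinity is purely illustrative; any edge score could be substituted without changing the partitioning step.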

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization

Methods: Attention Is All You Need · Softmax · Dense Connections · Linear Layer · Residual Connection · Multi-Head Attention · Layer Normalization · Vision Transformer