Meta-ZSDETR: Zero-shot DETR with Meta-learning
Lu Zhang, Chenbo Zhang, Jiajia Zhao, Jihong Guan, Shuigeng Zhou

TL;DR
Meta-ZSDETR introduces a novel zero-shot object detection method combining DETR and meta-learning, directly predicting class-specific boxes and utilizing meta-contrastive learning to improve detection of unseen classes.
Contribution
It is the first method to integrate DETR with meta-learning for zero-shot detection, addressing two problems of prior work: the low recall of RPN proposals on unseen classes and the confusion of unseen classes with background.
Findings
Significantly outperforms existing zero-shot detection (ZSD) methods on MS COCO and PASCAL VOC.
Effectively predicts class-specific boxes without relying on proposal generation.
Utilizes meta-contrastive learning to enhance class separation in visual space.
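The Findings credit meta-contrastive learning with better class separation in visual space. To illustrate only the contrastive part, the plain-Python sketch below computes a standard supervised contrastive loss: embeddings that share a class label are pulled together and all others are pushed apart. It is not the paper's meta-contrastive objective, which according to the abstract also involves regression and classification heads. The function name, the cosine similarity, and the default temperature of 0.1 are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def _l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings: Sequence[Sequence[float]],
                                labels: Sequence[int],
                                temperature: float = 0.1) -> float:
    """Average supervised contrastive loss over anchors that have >= 1 positive.

    Illustrative only (assumed form, not the paper's meta-contrastive loss):
    same-label embeddings act as positives, all other embeddings as negatives.
    """
    z = [_l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a positive contributes nothing
        # Temperature-scaled cosine similarity to every other embedding.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With the same four 2-D embeddings, labels that match the geometry (`[0, 0, 1, 1]`) yield a much smaller loss than labels that cut across it (`[0, 1, 0, 1]`). That gap is the signal a contrastive term provides for separating classes in visual space.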
Abstract
Zero-shot object detection aims to localize and recognize objects of unseen classes. Most existing works face two problems: the low recall of the RPN on unseen classes and the confusion of unseen classes with background. In this paper, we present the first method that combines DETR and meta-learning to perform zero-shot object detection, named Meta-ZSDETR, where model training is formalized as an individual episode-based meta-learning task. Unlike Faster R-CNN based methods, which first generate class-agnostic proposals and then classify them with a visual-semantic alignment module, Meta-ZSDETR directly predicts class-specific boxes with class-specific queries and further filters them with the predicted accuracy from the classification head. The model is optimized with meta-contrastive learning, which contains a regression head to generate the coordinates of class-specific boxes, a…
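The abstract describes three steps: training organized into per-image episodes, class-specific boxes predicted from class-specific queries, and filtering of those boxes by the classification head's predicted accuracy. The plain-Python sketch below shows one plausible way to wire these steps around a detector. It is a hypothetical illustration, not the authors' implementation, and it rests on several assumptions. Each episode conditions queries on the classes present in the image, padded with randomly sampled absent seen classes. A class-specific query is an object query plus that class's semantic embedding, already projected to the query dimension. The DETR decoder and heads are omitted, and each detection's `score` stands in for the predicted accuracy. All names (`Detection`, `sample_episode_classes`, `build_class_specific_queries`, `filter_detections`) and the 0.5 threshold are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


@dataclass
class Detection:
    class_name: str
    box: Tuple[float, float, float, float]  # (cx, cy, w, h), normalized (assumed format)
    score: float  # stand-in for the classification head's predicted accuracy


def sample_episode_classes(image_classes: Sequence[str], seen_classes: Sequence[str],
                           n_way: int, rng: random.Random) -> List[str]:
    """Hypothetical episode: classes present in the image, padded with absent seen classes."""
    present = sorted(set(image_classes))
    absent = [c for c in seen_classes if c not in present]
    n_extra = min(max(0, n_way - len(present)), len(absent))
    return present + rng.sample(absent, n_extra)


def build_class_specific_queries(object_queries: Sequence[Vector],
                                 class_embeddings: Dict[str, Vector],
                                 episode_classes: Sequence[str]) -> List[Tuple[str, Vector]]:
    """Pair every object query with every episode class by adding the class embedding."""
    queries = []
    for cls in episode_classes:
        emb = class_embeddings[cls]
        for q in object_queries:
            queries.append((cls, [a + b for a, b in zip(q, emb)]))
    return queries


def filter_detections(detections: Sequence[Detection], threshold: float = 0.5,
                      top_k: int = 100) -> List[Detection]:
    """Keep detections whose predicted accuracy clears the threshold, best first."""
    kept = [d for d in detections if d.score >= threshold]
    kept.sort(key=lambda d: d.score, reverse=True)
    return kept[:top_k]
```

Under these assumptions, inference would reuse `build_class_specific_queries` with the semantic embeddings of unseen classes. Because each query already carries a class identity, the detector emits class-specific boxes directly and needs no separate class-agnostic proposal stage.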
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Absolute Position Encodings · Residual Connection
