Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection
Min Jae Jung, Seung Dae Han, Joohee Kim

TL;DR
This paper introduces RISF, a few-shot object detection method that extends Faster R-CNN with CLIP-based re-scoring of classification scores and a Background Negative Re-scale Loss (BNRL) to improve detection accuracy when labeled data is limited.
Contribution
It proposes a framework that combines a Calibration Module using CLIP (CM-CLIP) for re-scoring with a Background Negative Re-scale Loss (BNRL) for few-shot object detection.
Findings
RISF outperforms state-of-the-art methods on MS-COCO and PASCAL VOC datasets.
The CLIP-based re-scoring effectively improves classification accuracy.
The modified loss reduces false positives from confusing categories.
Abstract
Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community. Recent studies show that adapting a pre-trained model or modifying the loss function can improve performance. In this paper, we explore leveraging the power of Contrastive Language-Image Pre-training (CLIP) and a hard negative classification loss in the low-data setting. Specifically, we propose Re-scoring using Image-language Similarity for Few-shot object detection (RISF), which extends Faster R-CNN by introducing a Calibration Module using CLIP (CM-CLIP) and a Background Negative Re-scale Loss (BNRL). The former adapts CLIP, which performs zero-shot classification, to re-score the classification scores of a detector using image-class similarities; the latter is a modified classification loss that accounts for the penalty on fake backgrounds as well as confusing categories on a…
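The re-scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' CM-CLIP implementation: the linear fusion rule, the `alpha` weight, the `temperature` value, and the toy embeddings below are all assumptions made for illustration, and the hypothetical `rescore` function only covers foreground classes. In practice the region and text embeddings would come from a CLIP image encoder and text encoder.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rescore(detector_probs, region_embedding, class_text_embeddings,
            alpha=0.5, temperature=0.01):
    """Blend detector class probabilities with image-text similarities.

    ASSUMPTION: a linear mix weighted by `alpha`; the paper's actual
    fusion rule is not given in the text above.
    """
    sims = [cosine(region_embedding, t) for t in class_text_embeddings]
    clip_probs = softmax(sims, temperature)
    return [(1.0 - alpha) * p + alpha * q
            for p, q in zip(detector_probs, clip_probs)]


# Toy example (hypothetical values): the detector slightly prefers class 0,
# but the region embedding is closest to the text embedding of class 1.
detector_probs = [0.50, 0.45, 0.05]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
region_embedding = [0.2, 0.9, 0.1]

fused = rescore(detector_probs, region_embedding, text_embeddings)
print(fused)
```

With `alpha=0` the function returns the detector's scores unchanged, so the weight controls how much the image-language similarity is allowed to override the detector.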
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications
Methods: Softmax · Convolution · Region Proposal Network · RoIPool · Faster R-CNN · Contrastive Language-Image Pre-training
