Detect Everything with Few Examples

Xinyu Zhang; Yuhan Liu; Yuting Wang; Abdeslam Boularias

arXiv:2309.12969 · cs.CV · October 4, 2024 · 5 cites

TL;DR

DE-ViT is a few-shot object detector that needs no finetuning. It combines a region-propagation mechanism for localization with prototype-based feature projection, and reports state-of-the-art results on Pascal VOC, COCO, and LVIS.

Contribution

Introduces DE-ViT, a finetuning-free few-shot object detector with a new region-propagation architecture and robust prototype projection, advancing the state-of-the-art in few-shot detection.

Findings

01 Sets new state-of-the-art on Pascal VOC, COCO, and LVIS benchmarks.

02 Surpasses previous few-shot methods by large margins, e.g., 15 mAP on COCO 10-shot.

03 Successfully applied to real robot pick-and-place tasks.

Abstract

Few-shot object detection aims at detecting novel categories given only a few example images. It is a basic skill for a robot to perform tasks in open environments. Recent methods focus on finetuning strategies, with complicated procedures that prohibit a wider application. In this paper, we introduce DE-ViT, a few-shot object detector without the need for finetuning. DE-ViT's novel architecture is based on a new region-propagation mechanism for localization. The propagated region masks are transformed into bounding boxes through a learnable spatial integral layer. Instead of training prototype classifiers, we propose to use prototypes to project ViT features into a subspace that is robust to overfitting on base classes. We evaluate DE-ViT on few-shot and one-shot object detection benchmarks with Pascal VOC, COCO, and LVIS. DE-ViT establishes new state-of-the-art results on all…
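
The prototype projection mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that a class prototype is the mean of L2-normalized support features and that "projecting" a feature means replacing it with its cosine similarities to the prototypes. The function names (`l2_normalize`, `build_prototypes`, `project`) and the toy 3-dimensional features are hypothetical; a real detector would work on ViT patch features.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def build_prototypes(support_feats, support_labels, num_classes):
    """Assumed prototype rule: average the unit-length support features of each class."""
    dim = len(support_feats[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, label in zip(support_feats, support_labels):
        for i, x in enumerate(l2_normalize(feat)):
            sums[label][i] += x
        counts[label] += 1
    # Re-normalize each class mean so that dot products are cosine similarities.
    return [l2_normalize([s / max(n, 1) for s in row])
            for row, n in zip(sums, counts)]

def project(feat, prototypes):
    """Map a feature to one cosine similarity per class prototype."""
    unit = l2_normalize(feat)
    return [sum(a * b for a, b in zip(unit, p)) for p in prototypes]

# Toy example: two classes, two support features each (values are made up).
support_feats = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                 [0.0, 1.0, 0.0], [0.0, 0.8, 0.2]]
support_labels = [0, 0, 1, 1]
prototypes = build_prototypes(support_feats, support_labels, num_classes=2)

query = [2.0, 0.1, 0.0]
projected = project(query, prototypes)  # one similarity score per class
```

Under these assumptions the projected vector has one entry per class and does not depend on the magnitude of the raw feature, and adding a novel class only requires computing one more prototype from its support examples, with no gradient updates.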

Peer Reviews

Decision · CoRL 2024

Reviewer 01 · Rating 3 · Confidence 4

Reviewer 02 · Rating 3 · Confidence 3

Reviewer 03 · Rating 4 · Confidence 3

Code & Models

Repositories

mlzxy/devit
PyTorch · Official

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques

Methods: Focus · Balanced Selection