UIFormer: A Unified Transformer-based Framework for Incremental Few-Shot Object Detection and Instance Segmentation

Chengyuan Zhang; Yilin Zhang; Lei Zhu; Deyin Liu; Lin Wu; Bo Li; Shichao Zhang; Mohammed Bennamoun; Farid Boussaid

arXiv:2411.08569 · cs.CV · November 14, 2024 · 2 cites

Open Access

TL;DR

UIFormer is a Transformer-based framework that effectively handles incremental few-shot object detection and segmentation, maintaining high performance on new and old classes without access to previous training data.

Contribution

It introduces a unified two-stage incremental learning framework with classifier selection and knowledge distillation, advancing the state-of-the-art in incremental few-shot detection and segmentation.

Findings

01 Outperforms existing methods on COCO and LVIS datasets

02 Effectively mitigates overfitting on novel classes

03 Prevents catastrophic forgetting of base classes

Abstract

This paper introduces a novel framework for unified incremental few-shot object detection (iFSOD) and instance segmentation (iFSIS) using the Transformer architecture. Our goal is to create an optimal solution for situations where only a few examples of novel object classes are available, with no access to training data for base or old classes, while maintaining high performance across both base and novel classes. To achieve this, we extend Mask-DINO into a two-stage incremental learning framework. Stage 1 focuses on optimizing the model using the base dataset, while Stage 2 involves fine-tuning the model on novel classes. In addition, we incorporate a classifier selection strategy that assigns appropriate classifiers to the encoder and decoder according to their distinct functions. Empirical evidence indicates that this approach effectively mitigates overfitting on novel classes…
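The contribution summary names knowledge distillation as one ingredient of the incremental stage, but this listing does not say which form the paper uses. As a rough illustration only, the sketch below shows the generic temperature-softened distillation loss, assuming a frozen Stage 1 model acts as the teacher and the Stage 2 model as the student. The function names, the temperature value, and the example logits are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened class distributions, scaled by T^2.

    Generic distillation term (an assumption here, not the paper's exact loss):
    it penalizes the student for drifting away from the teacher's predictions
    on the old classes.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical base-class logits for a single query.
teacher = [2.0, 0.5, -1.0]        # frozen Stage 1 model
student_same = [2.0, 0.5, -1.0]   # student that kept the old behavior
student_drift = [0.0, 0.5, 2.0]   # student that forgot the base classes

print(distillation_loss(teacher, student_same))
print(distillation_loss(teacher, student_drift))
```

A student that reproduces the teacher's logits incurs no penalty, while one whose base-class predictions have drifted incurs a positive one. A term like this is typically added to the novel-class training loss during fine-tuning.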

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Image and Object Detection Techniques

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Absolute Position Encodings · Label Smoothing · Layer Normalization · Adam · Multi-Head Attention · Position-Wise Feed-Forward Layer · Residual Connection