CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection

Mingyi Guo, Yuyang Liu, Zhiyuan Yan, Zongying Lin, Peixi Peng, and Yonghong Tian

arXiv:2410.05804·cs.CV·April 1, 2025

TL;DR

This paper introduces CASA, a novel approach for incremental object detection that mitigates catastrophic forgetting by learning shared, category-agnostic attributes, leveraging language models and attribute selection to improve performance on sequential tasks.

Contribution

CASA is the first method to incorporate shared, category-agnostic attributes generated by language models for incremental object detection, enhancing knowledge retention and adaptability.

Findings

01. Achieves state-of-the-art results on the COCO dataset.

02. Effectively mitigates catastrophic forgetting in incremental detection.

03. Uses an LLM to generate candidate attributes, then selects the most relevant ones based on the current training data.

Abstract

Incremental object detection is fundamentally challenged by catastrophic forgetting. A major factor contributing to this issue is background shift, where background categories in sequential tasks may overlap with either previously learned or future unseen classes. To address this, we propose a novel method called Class-Agnostic Shared Attribute Base (CASA) that encourages the model to learn category-agnostic attributes shared across incremental classes. Our approach leverages an LLM to generate candidate textual attributes, selects the most relevant ones based on the current training data, and records their importance in an assignment matrix. For subsequent tasks, the retained attributes are frozen, and new attributes are selected from the remaining candidates, ensuring both knowledge retention and adaptability. Extensive experiments on the COCO dataset demonstrate the state-of-the-art…
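The select, record, and freeze cycle described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class `AttributeBase`, the method `learn_task`, the top-k selection rule, and the precomputed relevance scores are all assumptions made for illustration. In practice the scores would presumably come from a vision-language model comparing attribute text with image regions; here they are simply passed in.

```python
from typing import Dict, List


class AttributeBase:
    """Hypothetical sketch of a shared attribute base (not the paper's code)."""

    def __init__(self, candidates: List[str]) -> None:
        # Candidate textual attributes, e.g. as proposed by an LLM.
        self.candidates = list(candidates)
        # Attributes retained so far; never removed once selected.
        self.frozen: List[str] = []
        # Assignment matrix stored as class -> {attribute: importance}.
        self.assignment: Dict[str, Dict[str, float]] = {}

    def learn_task(
        self, class_scores: Dict[str, Dict[str, float]], k: int
    ) -> List[str]:
        """Select up to k new attributes for one task and record importances.

        class_scores[cls][attr] is an assumed relevance score of `attr`
        for class `cls`, computed on the current task's training data.
        """
        # Aggregate relevance of each candidate over the new classes.
        totals = {
            attr: sum(scores.get(attr, 0.0) for scores in class_scores.values())
            for attr in self.candidates
        }
        # Only candidates that are not already retained may be selected.
        retained = set(self.frozen)
        remaining = [a for a in self.candidates if a not in retained]
        new_attrs = sorted(remaining, key=lambda a: totals[a], reverse=True)[:k]
        self.frozen.extend(new_attrs)
        # New classes get a row over all retained attributes (old and new);
        # rows of previously learned classes are left untouched.
        for cls, scores in class_scores.items():
            self.assignment[cls] = {a: scores.get(a, 0.0) for a in self.frozen}
        return new_attrs


base = AttributeBase(["has wheels", "has fur", "has wings", "is metallic"])
task1 = base.learn_task(
    {"car": {"has wheels": 0.9, "is metallic": 0.8, "has wings": 0.1}}, k=2
)
task2 = base.learn_task(
    {"bird": {"has wings": 0.9, "has fur": 0.3, "is metallic": 0.1}}, k=1
)
```

In this toy run the first task retains two attributes and the second adds one more from the remaining candidates, while the row recorded for `car` stays unchanged. That is the retention and adaptability trade-off the abstract describes, under the simplifying assumptions above.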

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training · Balanced Selection