Anticipating Future Object Compositions without Forgetting

Youssef Zahran; Gertjan Burghouts; Yke Bauke Eisma

arXiv:2407.10723 · cs.CV · September 4, 2024

TL;DR

This paper advances compositional zero-shot learning (CZSL) in object detection by combining Grounding DINO with Compositional Soft Prompting (CSP), Compositional Anticipation, and Contrastive Prompt Tuning, improving generalization to novel object-attribute combinations.

Contribution

The paper introduces a framework that combines compositional soft prompting, compositional anticipation, and contrastive prompt tuning to improve CZSL in object detection without forgetting previously learned knowledge.
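To make the idea of compositional soft prompting concrete, here is a minimal sketch of the general principle only: each attribute and each object gets its own learnable vector, and a prompt for any attribute-object pair is built by recombining those vectors. This is not the authors' implementation and does not touch Grounding DINO; the vocabulary, `EMBED_DIM`, `make_vocab`, and `compose_prompt` are all hypothetical names chosen for illustration.

```python
import random

EMBED_DIM = 4  # hypothetical embedding size, chosen only for illustration


def make_vocab(words, dim, rng):
    """Create one randomly initialised vector per word (stand-in for learnable parameters)."""
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in words}


rng = random.Random(0)
# Hypothetical toy vocabulary in the spirit of CLEVR-style attributes and shapes.
attribute_vectors = make_vocab(["red", "blue"], EMBED_DIM, rng)
object_vectors = make_vocab(["cube", "sphere"], EMBED_DIM, rng)


def compose_prompt(attribute, obj):
    """Build a prompt as the sequence [attribute vector, object vector]."""
    return [attribute_vectors[attribute], object_vectors[obj]]


# Compositions assumed to be available during training (hypothetical split).
seen_pairs = {("red", "cube"), ("blue", "sphere")}
all_pairs = {(a, o) for a in attribute_vectors for o in object_vectors}
unseen_pairs = all_pairs - seen_pairs

# An unseen pair reuses vectors that were each trained in some seen pair.
unseen_prompt = compose_prompt("red", "sphere")
```

Because the vectors are shared across pairs, a composition that never appeared in training, such as ("red", "sphere") here, can still be prompted from parts that were each tuned on seen compositions.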

Findings

01. 70.5% improvement over CSP on the harmonic mean (HM) between seen and unseen compositions on CLEVR

02. 14.5% increase in HM across the pretrain, increment, and unseen sets

03. Effective learning of compositions with limited data

Abstract

Despite significant advances in computer vision models, their ability to generalize to novel object-attribute compositions remains limited. Existing methods for Compositional Zero-Shot Learning (CZSL) mainly focus on image classification. This paper aims to enhance CZSL in object detection without forgetting previously learned knowledge. We incorporate Compositional Soft Prompting (CSP) into Grounding DINO and extend it with Compositional Anticipation. We achieve a 70.5% improvement over CSP on the harmonic mean (HM) between seen and unseen compositions on the CLEVR dataset. Furthermore, we introduce Contrastive Prompt Tuning to incrementally address model confusion between similar compositions. We demonstrate the effectiveness of this method and achieve an increase of 14.5% in HM across the pretrain, increment, and unseen sets. Collectively, these methods provide a…
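The harmonic mean (HM) reported above rewards a model only when it does well on every split at once. A minimal sketch of that computation follows; the function name and the example scores are hypothetical and are not results from the paper.

```python
def harmonic_mean(scores):
    """Harmonic mean of non-negative per-split scores (e.g. seen and unseen accuracy)."""
    scores = list(scores)
    if not scores:
        raise ValueError("at least one score is required")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    if any(s == 0 for s in scores):
        return 0.0  # a single failed split drives the harmonic mean to zero
    return len(scores) / sum(1.0 / s for s in scores)


# Hypothetical scores, for illustration only.
two_way = harmonic_mean([0.8, 0.2])          # seen vs. unseen
three_way = harmonic_mean([0.5, 0.5, 0.5])   # e.g. pretrain, increment, unseen
```

With the hypothetical scores 0.8 (seen) and 0.2 (unseen), the arithmetic mean would be 0.5 but the harmonic mean is 0.32, which is why the metric penalizes a model that performs well on seen compositions while failing on unseen ones.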

Taxonomy

Topics: Space Science and Extraterrestrial Life

Methods: Attention Is All You Need · Softmax · Residual Connection · Layer Normalization · Focus · Linear Layer · Multi-Head Attention · Dense Connections · Vision Transformer · self-DIstillation with NO labels