Separated Inter/Intra-Modal Fusion Prompts for Compositional Zero-Shot Learning

Sua Jung

arXiv:2501.17171·cs.CV·January 30, 2025

TL;DR

This paper introduces a prompt learning approach with inter/intra-modality fusion for compositional zero-shot learning (CZSL), aimed at recognizing subtle semantic differences and state-object combinations.

Contribution

The paper proposes a method that combines inter- and intra-modality fusion prompts to improve attribute recognition in CZSL, targeting the difficulty earlier prompt-based techniques have with subtle semantic differences and state-object combinations.
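As background on how prompt-based CZSL methods generally score compositions, the sketch below rates one image against every state-object pair, including pairs never seen together during training. It is a minimal NumPy illustration under stated assumptions, not the paper's method: the function names, the additive composition encoder (a stand-in for a text encoder fed a prompt such as "a photo of [state] [object]"), and the temperature value are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def composition_scores(image_feat, state_emb, object_emb, temperature=0.01):
    """Return an (S, O) probability matrix over all state-object pairs.

    image_feat: (d,)   image embedding from a (hypothetical) image encoder.
    state_emb:  (S, d) hypothetical state/attribute embeddings.
    object_emb: (O, d) hypothetical object embeddings.
    """
    # Hypothetical composition encoder: add state and object vectors.
    # Prompt-based methods would instead encode text such as
    # "a photo of [state] [object]" with a pre-trained text encoder.
    comp = state_emb[:, None, :] + object_emb[None, :, :]  # (S, O, d)
    comp = l2_normalize(comp)
    img = l2_normalize(image_feat)
    # Cosine similarity for every pair, sharpened by the temperature.
    logits = comp @ img / temperature  # (S, O)
    # Softmax over all S * O compositions, seen and unseen alike.
    logits = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()
```

To read off a prediction, `np.unravel_index(p.argmax(), p.shape)` gives the (state, object) index pair. Because every state is paired with every object, unseen compositions are scored exactly like seen ones, which is what makes the zero-shot setting possible.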

Findings

1. Improved attribute recognition accuracy in CZSL tasks.
2. Effective handling of subtle semantic differences.
3. Enhanced recognition of multiple object compositions.

Abstract

Compositional Zero-Shot Learning (CZSL) aims to recognize combinations of states and objects, including ones whose meanings differ only subtly, by learning from the compositions seen during training and generalizing to compositions that were unseen during training. Existing methods focus either on prompt configuration or on using prompts to tune a pre-trained vision-language model. However, these methods struggle to identify subtle differences in meaning and to correctly combine states with objects. To address both issues and build an efficient and effective CZSL technique, we propose a method that improves attribute recognition by combining diverse prompt learning with an Inter/Intra-Modality Fusion Synthesizer, targeting scene understanding that involves subtle semantic differences and multiple objects.
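The abstract does not specify the internals of the Inter/Intra-Modality Fusion Synthesizer. As a rough illustration only, the sketch below assumes intra-modal fusion is self-attention among text prompt tokens and inter-modal fusion is cross-attention from those prompt tokens to image patch tokens, both built from the multi-head attention, softmax, and linear layers listed in the taxonomy. All names, the ordering of the two steps, the residual connections, and the random weights are hypothetical, and the code does not attempt to reproduce the Synthesizer attention variant named in the taxonomy.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(q_in, kv_in, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product multi-head attention.

    q_in: (Nq, d) queries; kv_in: (Nk, d) keys/values; each weight is (d, d).
    """
    nq, d = q_in.shape
    nk = kv_in.shape[0]
    dh = d // num_heads
    # Linear projections, then split into heads: (heads, N, dh).
    q = (q_in @ w_q).reshape(nq, num_heads, dh).transpose(1, 0, 2)
    k = (kv_in @ w_k).reshape(nk, num_heads, dh).transpose(1, 0, 2)
    v = (kv_in @ w_v).reshape(nk, num_heads, dh).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, Nq, Nk)
    # Merge heads back to (Nq, d) and apply the output projection.
    out = (attn @ v).transpose(1, 0, 2).reshape(nq, d)
    return out @ w_o


def init_weights(rng, d):
    # Hypothetical random weights: [w_q, w_k, w_v, w_o].
    return [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4)]


def fuse_prompts(text_tokens, image_tokens, intra_w, inter_w, num_heads=4):
    """Hypothetical fusion block (not the paper's exact design).

    1. Intra-modal: prompt tokens attend to each other (self-attention).
    2. Inter-modal: prompt tokens attend to image patches (cross-attention).
    Residual additions keep the original prompt information.
    """
    x = text_tokens + multi_head_attention(text_tokens, text_tokens, *intra_w, num_heads)
    x = x + multi_head_attention(x, image_tokens, *inter_w, num_heads)
    return x
```

For example, `fuse_prompts(text, image, init_weights(rng, d), init_weights(rng, d))` returns prompt tokens with the same shape as `text`, now conditioned on the image, which a downstream text encoder or scoring head could consume. Running self-attention first lets prompt tokens share context before querying the image; the reverse order is an equally plausible assumption.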

Taxonomy

Topics: Geophysical Methods and Applications · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention · Synthesizer