Modeling Multi-modal Cross-interaction for Multi-label Few-shot Image Classification Based on Local Feature Selection

Kun Yan; Zied Bouraoui; Fangyun Wei; Chang Xu; Ping Wang; Shoaib Jameel; Steven Schockaert

arXiv:2412.13732·cs.CV·February 25, 2025

TL;DR

This paper introduces a novel multi-modal cross-interaction approach with local feature selection for multi-label few-shot image classification, leveraging word embeddings and a prototype refinement process to improve label prediction accuracy.

Contribution

It proposes a new strategy combining word embedding initialization, local feature selection via Loss Change Measurement, and multi-modal cross-interaction for enhanced multi-label few-shot classification.

Findings

01. Significant performance improvements over state-of-the-art methods.

02. Effective local feature selection enhances label prototype accuracy.

03. Robustness demonstrated across multiple datasets.

Abstract

The aim of multi-label few-shot image classification (ML-FSIC) is to assign semantic labels to images, in settings where only a small number of training examples are available for each label. A key feature of the multi-label setting is that an image often has several labels, which typically refer to objects appearing in different regions of the image. When estimating label prototypes, in a metric-based setting, it is thus important to determine which regions are relevant for which labels, but the limited amount of training data and the noisy nature of local features make this highly challenging. As a solution, we propose a strategy in which label prototypes are gradually refined. First, we initialize the prototypes using word embeddings, which allows us to leverage prior knowledge about the meaning of the labels. Second, taking advantage of these initial prototypes, we then use a Loss…
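To make the metric-based setup described in the abstract more concrete, here is a minimal Python sketch under stated assumptions. It treats each label's word embedding as its initial prototype, scores a label by its best-matching local region, and then moves the prototype toward the most similar regions. The function names, the `top_k` and `alpha` parameters, the toy vectors, and the assumption that word embeddings and local features already share one vector space are all hypothetical. The similarity-based region selection is a simple stand-in and is not the paper's Loss Change Measurement procedure, which the truncated abstract does not describe.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def init_prototypes(word_vecs):
    """Assumption: each label's word embedding is used directly as its initial prototype."""
    return {label: list(vec) for label, vec in word_vecs.items()}


def label_scores(local_feats, prototypes):
    """Score each label by the local region that is most similar to its prototype."""
    return {
        label: max(cosine(feat, proto) for feat in local_feats)
        for label, proto in prototypes.items()
    }


def select_regions(proto, local_feats, top_k):
    """Stand-in selection (not Loss Change Measurement): indices of the top_k most similar regions."""
    ranked = sorted(
        range(len(local_feats)),
        key=lambda i: cosine(local_feats[i], proto),
        reverse=True,
    )
    return ranked[:top_k]


def refine_prototype(proto, local_feats, top_k=1, alpha=0.5):
    """Blend the prototype with the mean of its selected local features."""
    chosen = [local_feats[i] for i in select_regions(proto, local_feats, top_k)]
    if not chosen:
        return list(proto)
    mean = [sum(col) / len(chosen) for col in zip(*chosen)]
    return [alpha * p + (1 - alpha) * m for p, m in zip(proto, mean)]


# Toy data: two labels and three local region features from one support image.
word_vecs = {"dog": [1.0, 0.0], "car": [0.0, 1.0]}
local_feats = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]

prototypes = init_prototypes(word_vecs)
scores = label_scores(local_feats, prototypes)
refined_dog = refine_prototype(prototypes["dog"], local_feats, top_k=1)
```

Taking the maximum over regions reflects the observation in the abstract that different labels usually correspond to different regions of the same image, so a label can score highly even when most regions are irrelevant to it.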

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems