Mind the Gap Between Prototypes and Images in Cross-domain Finetuning

Hongduan Tian; Feng Liu; Zhanke Zhou; Tongliang Liu; Chengqi Zhang; Bo Han

arXiv:2410.12474 · cs.CV · October 22, 2024

TL;DR

This paper identifies a gap, resembling the modality gap, between prototype and image-instance embeddings in cross-domain few-shot classification. It proposes CoPA, which adapts a separate transformation for each, improving both classification performance and representation clustering.

Contribution

The paper introduces CoPA, a novel contrastive adaptation method that independently transforms prototypes and images, addressing the modality gap in cross-domain few-shot learning.
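The idea of giving prototypes and images their own transformations can be illustrated with a small NumPy sketch. This is an illustration under assumptions, not the paper's objective: the summary above says only that the method is contrastive and transforms the two sets independently, so the symmetric cross-entropy form, the linear heads, and the names `W_img`, `W_proto`, `temperature`, and `two_head_contrastive_loss` are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(z, axis):
    """Numerically stable log-softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def two_head_contrastive_loss(images, prototypes, labels,
                              W_img, W_proto, temperature=0.1):
    """Symmetric contrastive loss with separate heads (assumed form).

    images:     (n, d) frozen-backbone embeddings of image instances
    prototypes: (c, d) class prototypes
    labels:     (n,)   class index of each image
    """
    z_img = l2_normalize(images @ W_img)        # image-side head
    z_pro = l2_normalize(prototypes @ W_proto)  # prototype-side head
    logits = z_img @ z_pro.T / temperature      # (n, c) scaled similarities
    idx = np.arange(len(labels))
    # Image -> prototype: each image should select its own class prototype.
    loss_i2p = -log_softmax(logits, axis=1)[idx, labels].mean()
    # Prototype -> image: each prototype should favor images of its class.
    loss_p2i = -log_softmax(logits, axis=0)[idx, labels].mean()
    return 0.5 * (loss_i2p + loss_p2i)

# Toy data: three orthogonal class directions, two images per class.
centers = np.eye(3, 4)
labels = np.array([0, 0, 1, 1, 2, 2])
images = centers[labels]
W_img, W_proto = np.eye(4), np.eye(4)  # untrained identity heads

loss_matched = two_head_contrastive_loss(images, centers, labels, W_img, W_proto)
loss_mismatched = two_head_contrastive_loss(images, centers, (labels + 1) % 3,
                                            W_img, W_proto)
```

In an actual adaptation loop the two matrices would be optimized by gradient descent on the support set; the sketch evaluates the loss only, to show that the two heads enter the objective separately.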

Findings

01

CoPA achieves state-of-the-art results on Meta-Dataset.

02

It learns better representation clusters and enlarges the prototype-image gap.

03

CoPA reaches its minimum validation loss at the enlarged prototype-image gap.
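To make the "gap" in these findings concrete, the sketch below measures it as the distance between the centroids of the normalized image embeddings and the normalized prototype embeddings. This centroid-distance measure is an assumption chosen for illustration; the summary does not state how the paper quantifies the gap, and the function name `centroid_gap` is hypothetical.

```python
import numpy as np

def centroid_gap(image_emb, proto_emb, eps=1e-12):
    """Euclidean distance between the mean unit-normalized image embedding
    and the mean unit-normalized prototype embedding (assumed gap measure)."""
    img = image_emb / (np.linalg.norm(image_emb, axis=1, keepdims=True) + eps)
    pro = proto_emb / (np.linalg.norm(proto_emb, axis=1, keepdims=True) + eps)
    return float(np.linalg.norm(img.mean(axis=0) - pro.mean(axis=0)))

# Toy check: identical sets have no gap; mirrored sets have a clear one.
a = np.eye(2)
gap_same = centroid_gap(a, a)
gap_mirrored = centroid_gap(a, -a)
```

Under this measure, an "enlarged gap" means the two centroids move further apart after adaptation.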

Abstract

In cross-domain few-shot classification (CFC), recent works mainly focus on adapting a simple transformation head on top of a frozen pre-trained backbone with few labeled data to project embeddings into a task-specific metric space where classification can be performed by measuring similarities between image instance and prototype representations. Technically, an assumption implicitly adopted in such a framework is that the prototype and image instance embeddings share the same representation transformation. However, in this paper, we find that there naturally exists a gap, which resembles the modality gap, between the prototype and image instance embeddings extracted from the frozen pre-trained backbone, and simply applying the same transformation during the adaptation phase constrains exploring the optimal representations and shrinks the gap between prototype and image…
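The shared-transformation framework described in the abstract can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions: the head is taken to be a single linear map `W`, similarity is taken to be cosine similarity, prototypes are class means of the support embeddings, and random vectors stand in for frozen-backbone features. All function and variable names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def class_prototypes(support, support_labels, num_classes):
    """One prototype per class: the mean of that class's support embeddings."""
    return np.stack([support[support_labels == c].mean(axis=0)
                     for c in range(num_classes)])

def predict_shared_head(query, support, support_labels, num_classes, W):
    """Nearest-prototype prediction where images and prototypes share W."""
    protos = class_prototypes(support, support_labels, num_classes)
    q = l2_normalize(query @ W)   # image instances through the head
    p = l2_normalize(protos @ W)  # prototypes through the same head
    return (q @ p.T).argmax(axis=1)

# Toy stand-in for frozen-backbone features: 3 classes, 5 support shots each.
rng = np.random.default_rng(0)
centers = 5.0 * np.eye(3, 8)
support_labels = np.repeat(np.arange(3), 5)
support = centers[support_labels] + 0.1 * rng.normal(size=(15, 8))
query_labels = np.array([0, 1, 2, 2, 1, 0])
query = centers[query_labels] + 0.1 * rng.normal(size=(6, 8))

W = np.eye(8)  # untrained head; adaptation would fit W on the support set
preds = predict_shared_head(query, support, support_labels, 3, W)
```

The point of the sketch is the single `W` applied to both `query` and `protos`: that is the implicit shared-transformation assumption the abstract goes on to question.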

Peer Reviews

Decision · NeurIPS 2024 poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. This paper is well organized, easy to follow and free of typos.
2. Extensive empirical and theoretical analyses are provided, so the proposed contrastive learning method seems to be technically sound.
3. The implementation details are clearly stated, the algorithm and source code are provided, ensuring the reproducibility of the method.
4. Experiments and ablation studies are adequate, the results under different experimental settings are convincing and promising.

Weaknesses

1. For the empirical analysis in Section 3.2, there is an interesting phenomenon that “appropriately enlarging the gap between the prototypes and image instances contributes to achieving better generalization performance”. It may be better to provide some discussion of the reasons for these observed results (Figure 3(a)) to explain the influence of “enlarging the gap” on the “generalization ability”.
2. The competing methods used in experiments appear to be somewhat outdated. The main bas

Code & Models

Repositories

tmlr-group/CoPA
tf · Official

Videos

Mind the Gap Between Prototypes and Images in Cross-domain Finetuning · slideslive

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Numerical Analysis Techniques

Methods: Focus · Contrastive Language-Image Pre-training