Mind the Gap Between Prototypes and Images in Cross-domain Finetuning
Hongduan Tian, Feng Liu, Zhanke Zhou, Tongliang Liu, Chengqi Zhang, Bo Han

TL;DR
This paper identifies a gap, resembling the modality gap, between prototype and image-instance embeddings in cross-domain few-shot classification and proposes CoPA, a method that adapts a separate transformation for each, improving performance and representation clustering.
Contribution
The paper introduces CoPA, a novel contrastive adaptation method that independently transforms prototypes and images, addressing the modality gap in cross-domain few-shot learning.
Findings
CoPA achieves state-of-the-art results on Meta-Dataset.
It learns better representation clusters and enlarges the prototype-image gap.
CoPA reaches its minimum validation loss at an enlarged gap.
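One way to make the "prototype-image gap" in these findings concrete is sketched below. It assumes the gap is measured as the Euclidean distance between the centroid of the L2-normalized image embeddings and the centroid of the L2-normalized class prototypes, a convention common in modality-gap analyses. This summary does not state the paper's exact definition, and all function names here are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def centroid(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def class_prototypes(embeddings, labels):
    """Average the embeddings of each class to form its prototype."""
    by_class = {}
    for emb, lab in zip(embeddings, labels):
        by_class.setdefault(lab, []).append(emb)
    return {lab: centroid(vs) for lab, vs in by_class.items()}


def prototype_image_gap(embeddings, labels):
    """Assumed gap measure: distance between the two normalized centroids."""
    images = [l2_normalize(e) for e in embeddings]
    protos = [l2_normalize(p)
              for p in class_prototypes(embeddings, labels).values()]
    return math.dist(centroid(images), centroid(protos))
```

Under this measure the gap is zero when every class has exactly one image, since each prototype then coincides with its image; it becomes non-zero once prototypes average over several differing images.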
Abstract
In cross-domain few-shot classification (CFC), recent works mainly focus on adapting a simple transformation head on top of a frozen pre-trained backbone with few labeled data to project embeddings into a task-specific metric space where classification can be performed by measuring similarities between image instance and prototype representations. Technically, an assumption implicitly adopted in such a framework is that the prototype and image instance embeddings share the same representation transformation. However, in this paper, we find that there naturally exists a gap, which resembles the modality gap, between the prototype and image instance embeddings extracted from the frozen pre-trained backbone, and simply applying the same transformation during the adaptation phase constrains exploring the optimal representations and shrinks the gap between prototype and image…
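The framework described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes linear transformation heads, cosine similarity, and prototypes computed as class means of frozen-backbone embeddings, and the names `W_img`, `W_proto`, `build_prototypes`, and `classify` are hypothetical. Training of the heads is omitted; only the forward pass is shown.

```python
import math


def matvec(W, v):
    """Apply a linear head W (list of rows) to a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_prototypes(support, labels):
    """Class prototype = mean of that class's support embeddings."""
    by_class = {}
    for emb, lab in zip(support, labels):
        by_class.setdefault(lab, []).append(emb)
    return {lab: [sum(col) / len(vs) for col in zip(*vs)]
            for lab, vs in by_class.items()}


def classify(query, prototypes, W_img, W_proto):
    """Label a query embedding by its most similar transformed prototype.

    The query goes through the image head W_img and each prototype through
    the prototype head W_proto. Passing the same matrix for both heads
    corresponds to the shared-transformation assumption.
    """
    q = matvec(W_img, query)
    scores = {lab: cosine(q, matvec(W_proto, p))
              for lab, p in prototypes.items()}
    return max(scores, key=scores.get)
```

With identity matrices for both heads, `classify` reduces to nearest-prototype classification under cosine similarity. Supplying different matrices for `W_img` and `W_proto` lets the two kinds of embedding be projected independently, which is the design choice the paper argues for.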
Peer Reviews
Decision: NeurIPS 2024 poster
1. This paper is well organized, easy to follow, and free of typos.
2. Extensive empirical and theoretical analyses are provided, so the proposed contrastive learning method seems to be technically sound.
3. The implementation details are clearly stated, and the algorithm and source code are provided, ensuring the reproducibility of the method.
4. Experiments and ablation studies are adequate, and the results under different experimental settings are convincing and promising.
1. For the empirical analysis in Section 3.2, there is an interesting phenomenon that “appropriately enlarging the gap between the prototypes and image instances contributes to achieving better generalization performance”. It may be better to provide some discussion of the reasons for these observed results (Figure 3(a)) to explain the influence of “enlarging the gap” on the “generalization ability”.
2. The competing methods used in experiments appear to be somewhat outdated. The main bas
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Numerical Analysis Techniques
Methods: Focus · Contrastive Language-Image Pre-training
