Reclaiming Lost Text Layers for Source-Free Cross-Domain Few-Shot Learning
Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Yuhua Li, Ruixuan Li

TL;DR
This paper investigates the role of middle layers in CLIP's text encoder for source-free cross-domain few-shot learning, revealing that re-utilizing these layers enhances performance under domain shifts.
Contribution
It introduces a novel method to re-utilize information in the lost middle layers of the text encoder, improving SF-CDFSL performance beyond simple layer removal.
Findings
Re-utilizing lost layers improves accuracy across datasets.
The method is effective with various backbones and tasks.
Extensive experiments validate the approach.
Abstract
Source-Free Cross-Domain Few-Shot Learning (SF-CDFSL) focuses on fine-tuning with limited training data from target domains (e.g., medical or satellite images), where CLIP has recently shown promising results due to its generalizability to downstream tasks. Current works indicate that CLIP's text encoder is more suitable for cross-domain tasks. However, we find that removing certain middle layers of the text encoder, which we call the Lost Layers, can effectively improve performance in SF-CDFSL. In this paper, we delve into this phenomenon for a deeper understanding. We discover that instead of being harmful for the SF-CDFSL task, the information in these layers is actually beneficial, but visual gaps prevent this useful information from being fully utilized, making these layers seem redundant. Based on this understanding, unlike current works that simply remove these layers, we…
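To make the layer-removal observation concrete, the sketch below shows what skipping middle layers of a sequential encoder looks like. It is a toy illustration only: the layers are simple stand-in functions rather than CLIP Transformer blocks, the names `make_toy_layer` and `encode` are hypothetical, and the skipped indices are arbitrary. It illustrates the removal baseline the abstract refers to, not the re-utilization method the paper proposes.

```python
from typing import Callable, Iterable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_toy_layer(scale: float) -> Layer:
    """Build a toy residual layer computing x + scale * x.

    This is a stand-in for a Transformer block, not a real CLIP layer.
    """
    def layer(x: Vector) -> Vector:
        return [v + scale * v for v in x]
    return layer


def encode(layers: Sequence[Layer], x: Vector, skip: Iterable[int] = ()) -> Vector:
    """Apply the layers in order, bypassing any whose index is in `skip`."""
    skipped = set(skip)
    for index, layer in enumerate(layers):
        if index in skipped:
            continue  # the "removed" layer is bypassed entirely
        x = layer(x)
    return x


# Six toy layers; each one doubles its input.
layers = [make_toy_layer(1.0) for _ in range(6)]

full = encode(layers, [1.0])                 # all six layers applied
pruned = encode(layers, [1.0], skip=[2, 3])  # two middle layers bypassed
```

With a real text encoder, the same idea would amount to running the forward pass over a subset of its block list. Which blocks to skip is an empirical choice that this sketch does not address.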
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Face recognition and analysis · Advanced Neural Network Applications
