Revisiting Continuity of Image Tokens for Cross-domain Few-shot Learning
Shuai Yi, Yixiong Zou, Yuhua Li, Ruixuan Li

TL;DR
This paper investigates how disrupting the continuity of image tokens in Vision Transformers (ViT) affects cross-domain few-shot learning, revealing that reduced continuity encourages the model to rely on smaller spatial patterns, which improves generalization to distant domains.
Contribution
It uncovers the role of token continuity in ViT's domain generalization and proposes a simple method to disrupt continuity, enhancing performance in cross-domain few-shot tasks.
Findings
Disrupting token continuity reduces domain gaps.
Smaller patterns are more transferable across domains.
The proposed method outperforms state-of-the-art approaches.
Abstract
Vision Transformer (ViT) has achieved remarkable success due to its large-scale pretraining on general domains, but it still faces challenges when applied to distant downstream domains that have only scarce training data, which gives rise to the Cross-Domain Few-Shot Learning (CDFSL) task. Inspired by Self-Attention's insensitivity to token order, we find an interesting phenomenon neglected in current works: disrupting the continuity of image tokens (i.e., making pixels no longer transition smoothly across patches) in ViT leads to a noticeable performance decline in the general (source) domain but only a marginal decrease in downstream target domains. This questions the role of image tokens' continuity in ViT's generalization under large domain gaps. In this paper, we investigate this phenomenon to interpret it. We find continuity aids ViT in learning larger spatial patterns, which…
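The abstract does not spell out how continuity is disrupted. One simple way to break pixel continuity across patch boundaries, assumed here for illustration and not confirmed as the authors' procedure, is to cut an image into non-overlapping ViT-sized patches and randomly permute them in pixel space: each patch keeps its own content, but neighboring patches no longer join smoothly. The sketch uses NumPy; `shuffle_patches` and `boundary_discontinuity` are hypothetical helper names, and the boundary metric (mean absolute pixel jump across patch edges) is an illustrative proxy for continuity, not a measure taken from the paper.

```python
import numpy as np


def shuffle_patches(image: np.ndarray, patch: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: randomly permute non-overlapping patch x patch blocks
    of an H x W x C image. Patch contents are preserved; only their positions change,
    so pixels no longer transition smoothly across patch boundaries."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    # (H, W, C) -> (gh, patch, gw, patch, C) -> (gh*gw, patch, patch, C): one entry per patch
    blocks = (image.reshape(gh, patch, gw, patch, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(gh * gw, patch, patch, c))
    blocks = blocks[rng.permutation(gh * gw)]
    # Reassemble the permuted patches back into an H x W x C image grid
    return (blocks.reshape(gh, gw, patch, patch, c)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(h, w, c))


def boundary_discontinuity(image: np.ndarray, patch: int) -> float:
    """Hypothetical continuity proxy: mean absolute pixel difference between the
    two pixels straddling every vertical and horizontal patch boundary."""
    img = image.astype(np.float64)
    cols = np.arange(patch, img.shape[1], patch)  # first column of each patch after the first
    rows = np.arange(patch, img.shape[0], patch)  # first row of each patch after the first
    dv = np.abs(img[:, cols] - img[:, cols - 1])  # jumps across vertical boundaries
    dh = np.abs(img[rows, :] - img[rows - 1, :])  # jumps across horizontal boundaries
    return float((dv.sum() + dh.sum()) / (dv.size + dh.size))


if __name__ == "__main__":
    # Smooth synthetic image: intensity x + y, so every adjacent-pixel step equals 1
    grad = np.add.outer(np.arange(224), np.arange(224)).astype(np.float64)
    image = np.repeat(grad[:, :, None], 3, axis=2)
    shuffled = shuffle_patches(image, patch=16, rng=np.random.default_rng(0))
    print(boundary_discontinuity(image, 16))  # → 1.0
    print(boundary_discontinuity(shuffled, 16))
```

On the smooth gradient the original image scores exactly 1.0, while the shuffled image scores higher because most patch edges now join unrelated regions; the set of pixel values inside each patch is unchanged, so only cross-patch continuity is affected.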
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Image Processing Techniques · Seismic Imaging and Inversion Techniques
