Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning
Weihao Jiang, Chang Liu, Kun He

TL;DR
This paper introduces an intra-task mutual attention mechanism in Vision Transformers for few-shot learning, enhancing feature focus and representation sharing between support and query images, leading to improved classification performance.
Contribution
It proposes a novel intra-task mutual attention method with patch token swapping in pre-trained ViT models, combined with self-supervised pre-training and meta-learning fine-tuning, for more effective few-shot classification.
Findings
Achieves superior performance on five few-shot benchmarks.
Reduces the number of parameters needing fine-tuning.
Demonstrates effectiveness and efficiency of the proposed method.
Abstract
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples. This ability stems from their capacity to identify common features shared between new and previously seen images while disregarding distractions such as background variations. For artificial neural network models, however, determining the most relevant features for distinguishing between two images from limited samples remains a challenge. In this paper, we propose an intra-task mutual attention method for few-shot learning that splits the support and query samples into patches and encodes them using the pre-trained Vision Transformer (ViT) architecture. Specifically, we swap the class (CLS) token and patch tokens between the support and query sets to obtain mutual attention, which enables each set to focus on the most useful information. This…
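The token swap described in the abstract can be illustrated with a small sketch. It is a toy under stated assumptions, not the authors' implementation: the summary does not say at which ViT layer the swap happens or how the attention projections are set up, so the code uses a single attention head with identity projections, one support image, and one query image. The names swap_patch_tokens and self_attention are hypothetical helpers introduced only for this example.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real ViT uses learned projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def swap_patch_tokens(support, query):
    """Exchange patch tokens between two sequences.

    Index 0 of each sequence is its CLS token; the rest are patch tokens.
    Each CLS token is paired with the patch tokens of the other image.
    """
    support_mixed = [support[0]] + query[1:]
    query_mixed = [query[0]] + support[1:]
    return support_mixed, query_mixed


random.seed(0)
num_patches, dim = 4, 8


def make_tokens():
    """Random stand-in for encoder outputs: 1 CLS token + num_patches patch tokens."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(1 + num_patches)]


support = make_tokens()
query = make_tokens()

support_mixed, query_mixed = swap_patch_tokens(support, query)

# After attention, each CLS output is a weighted mix of its own CLS token
# and the other image's patch tokens.
support_cls = self_attention(support_mixed)[0]
query_cls = self_attention(query_mixed)[0]
```

In this sketch the support CLS token attends over the query's patches and vice versa, which is one plausible reading of "mutual attention". How the resulting CLS representations are compared for classification is not covered by the truncated abstract and is left out here.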
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Brain Tumor Detection and Classification
Methods: Attention Is All You Need · Sparse Evolutionary Training · Dense Connections · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings
