CrossTransformers: spatially-aware few-shot transfer

Carl Doersch; Ankush Gupta; Andrew Zisserman

arXiv:2007.11498 · cs.CV · February 18, 2021 · 58 cites

TL;DR

This paper introduces CrossTransformers, a Transformer-based architecture that, combined with self-supervised learning, improves few-shot transfer in vision systems by mitigating supervision collapse and increasing robustness to task and domain shifts.

Contribution

The paper proposes CrossTransformers and a self-supervised training approach to mitigate supervision collapse, enabling more effective few-shot transfer in vision models.

Findings

1. Achieved state-of-the-art results on Meta-Dataset for transfer learning.
2. Demonstrated robustness to task and domain shifts.
3. Showed improved generalization with spatially-aware features.

Abstract

Given new tasks with very little data (such as new classes in a classification problem or a domain shift in the input), performance of modern vision systems degrades remarkably quickly. In this work, we illustrate how the neural network representations which underpin modern vision systems are subject to supervision collapse, whereby they lose any information that is not necessary for performing the training task, including information that may be necessary for transfer to new tasks or domains. We then propose two methods to mitigate this problem. First, we employ self-supervised learning to encourage general-purpose features that transfer better. Second, we propose a novel Transformer-based neural network architecture called CrossTransformers, which can take a small number of labeled images and an unlabeled query, find coarse spatial correspondence between the query and the labeled…
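
As a rough illustration of the "coarse spatial correspondence" idea in the abstract, the sketch below uses dot-product attention to align each spatial location of a query feature map with the spatial locations of the labeled (support) images of one class. The abstract is truncated before it says how the correspondence is used, so the distance-based class scoring here is an assumption made for illustration, as are the function names (`class_distance`, `classify`), the use of raw features with no learned projections, and the toy feature vectors. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def class_distance(query_feats, support_feats):
    """Mean squared distance between each query location and the
    attention-weighted average of support locations aligned to it.

    query_feats:   list of feature vectors, one per query spatial location.
    support_feats: list of feature vectors, one per spatial location across
                   all labeled images of a single class.
    Assumption: features are compared directly (identity projections).
    """
    dim = len(query_feats[0])
    total = 0.0
    for q in query_feats:
        # Attention of this query location over every support location.
        weights = softmax([dot(q, s) / math.sqrt(dim) for s in support_feats])
        # Query-aligned prototype: weighted sum of support features.
        prototype = [
            sum(w * s[d] for w, s in zip(weights, support_feats))
            for d in range(dim)
        ]
        total += sum((a - b) ** 2 for a, b in zip(q, prototype))
    return total / len(query_feats)


def classify(query_feats, support_by_class):
    """Return the label with the smallest distance, plus all distances."""
    distances = {
        label: class_distance(query_feats, feats)
        for label, feats in support_by_class.items()
    }
    return min(distances, key=distances.get), distances


# Toy example: two spatial locations, two-dimensional features.
query = [[1.0, 0.0], [0.0, 1.0]]
support = {
    "A": [[1.0, 0.0], [0.0, 1.0]],
    "B": [[-1.0, 0.0], [0.0, -1.0]],
}
label, distances = classify(query, support)
print(label, distances)
```

Because the prototype is recomputed for every query location, two images of the same class can match even when the shared object appears at different positions, which is the intuition behind "spatially-aware" in the title.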

Code & Models

Videos

CrossTransformers: spatially-aware few-shot transfer · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Remote-Sensing Image Classification

Methods: Linear Layer · CrossTransformers · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Softmax · Label Smoothing · Byte Pair Encoding · Attention Is All You Need · Dense Connections