Efficient Training of Visual Transformers with Small Datasets

Yahui Liu; Enver Sangineto; Wei Bi; Nicu Sebe; Bruno Lepri; Marco De Nadai

arXiv:2106.03746 · cs.CV · November 16, 2021 · 84 citations

TL;DR

This paper introduces a self-supervised training task for Visual Transformers that enhances their performance on small datasets by encouraging learning of spatial relations, making training more robust and improving accuracy.

Contribution

The paper proposes a novel self-supervised task that can be integrated with existing Visual Transformers to improve their data efficiency and robustness in small dataset regimes.

Findings

01 Self-supervised task improves VT accuracy on small datasets
02 Method is architecture-agnostic and easy to implement
03 Significant accuracy gains demonstrated across multiple datasets

Abstract

Visual Transformers (VTs) are emerging as an architectural paradigm alternative to Convolutional networks (CNNs). Differently from CNNs, VTs can capture global relations between image elements and they potentially have a larger representation capacity. However, the lack of the typical convolutional inductive bias makes these models more data-hungry than common CNNs. In fact, some local properties of the visual domain which are embedded in the CNN architectural design, in VTs should be learned from samples. In this paper, we empirically analyse different VTs, comparing their robustness in a small training-set regime, and we show that, despite having a comparable accuracy when trained on ImageNet, their performance on smaller datasets can be largely different. Moreover, we propose a self-supervised task which can extract additional information from images with only a negligible…
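
The abstract says only that the auxiliary task extracts additional information from images; the summary above adds that it encourages learning spatial relations. As a rough illustration of what such a task could look like, the sketch below assumes one possible formulation: sample pairs of token positions from a k x k grid and use their normalized 2D offset as a regression target, scored with an L1 loss. The function names (`sample_relative_offsets`, `l1_loss`) and the grid size are hypothetical, and this is not taken from the authors' repository.

```python
import random


def sample_relative_offsets(grid_size, num_pairs, seed=0):
    """Sample token-position pairs on a grid_size x grid_size grid.

    Returns a list of (pos_a, pos_b, target), where target is the
    (row, col) offset between the two positions divided by grid_size.
    Assumption: this target definition is illustrative only.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(num_pairs):
        a = (rng.randrange(grid_size), rng.randrange(grid_size))
        b = (rng.randrange(grid_size), rng.randrange(grid_size))
        target = ((a[0] - b[0]) / grid_size, (a[1] - b[1]) / grid_size)
        samples.append((a, b, target))
    return samples


def l1_loss(predictions, targets):
    """Mean L1 distance between predicted and target 2D offsets."""
    total = 0.0
    for (pred_row, pred_col), (tgt_row, tgt_col) in zip(predictions, targets):
        total += abs(pred_row - tgt_row) + abs(pred_col - tgt_col)
    return total / len(targets)


# Hypothetical usage: a 7 x 7 token grid and four sampled pairs.
pairs = sample_relative_offsets(grid_size=7, num_pairs=4)
targets = [target for _, _, target in pairs]
```

In a real model, the predictions would come from a small head applied to the two sampled token embeddings, and this loss would be added to the supervised classification loss with some weight; both of those details are assumptions here.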

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

yhlleo/VTs-Drloc (PyTorch, official)

Videos

Efficient Training of Visual Transformers with Small Datasets · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques