A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark

Xiaohua Zhai; Joan Puigcerver; Alexander Kolesnikov; Pierre Ruyssen; Carlos Riquelme; Mario Lucic; Josip Djolonga; Andre Susano Pinto; Maxim Neumann; Alexey Dosovitskiy; Lucas Beyer; Olivier Bachem; Michael Tschannen; Marcin Michalski; Olivier Bousquet; Sylvain Gelly; Neil Houlsby

arXiv:1910.04867 · cs.CV · February 24, 2020 · 160 cites

TL;DR

This paper introduces the Visual Task Adaptation Benchmark (VTAB) to evaluate the generalization of visual representations across diverse tasks, providing insights into the effectiveness of various learning algorithms and supervision methods.

Contribution

The paper presents VTAB, a comprehensive benchmark for assessing visual representations on diverse tasks, and conducts a large-scale study comparing different learning algorithms and supervision techniques.

Findings

01. ImageNet representations perform well beyond natural datasets

02. Generative and discriminative models show comparable effectiveness

03. Self-supervision can often replace labels effectively

Abstract

Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations hinders progress. Popular protocols are often too constrained (linear classification), limited in diversity (ImageNet, CIFAR, Pascal-VOC), or only weakly related to representation quality (ELBO, reconstruction error). We present the Visual Task Adaptation Benchmark (VTAB), which defines good representations as those that adapt to diverse, unseen tasks with few examples. With VTAB, we conduct a large-scale study of many popular publicly-available representation learning algorithms. We carefully control confounders such as architecture and tuning budget. We address questions like: How effective are ImageNet representations beyond standard natural datasets? How do representations trained…

Code & Models

Videos

The Visual Task Adaptation Benchmark · YouTube

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Average Pooling · Residual Connection · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling