Exploring the Limits of Large Scale Pre-training
Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi

TL;DR
This paper systematically investigates the limits of large-scale pre-training in vision models. It finds that downstream performance saturates as upstream accuracy increases, so upstream gains do not always translate into better downstream results, and it links this saturation to how representations evolve within the models.
Contribution
The study provides a comprehensive analysis of the saturation phenomenon in large-scale pre-training and introduces a model capturing the nonlinear relationship between upstream and downstream performance.
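To make the idea of a saturating upstream/downstream relationship concrete, here is a minimal sketch of fitting such a curve. It assumes a power law with an irreducible-error offset, e_ds ≈ k · e_us^alpha + e_ir, where e_us and e_ds are upstream and downstream error (1 − accuracy). This summary does not state the functional form, so treat the form, the function name `fit_saturating_power_law`, and the synthetic data points as illustrative assumptions rather than the authors' implementation or results.

```python
def fit_saturating_power_law(e_us, e_ds, alphas=None):
    """Fit e_ds ~ k * e_us**alpha + e_ir (an assumed form, for illustration).

    For a fixed alpha the model is linear in (k, e_ir), so we scan a grid of
    alpha values and solve the remaining least-squares problem in closed form.
    """
    if alphas is None:
        alphas = [0.05 * i for i in range(1, 101)]  # grid from 0.05 to 5.0
    n = len(e_us)
    mean_y = sum(e_ds) / n
    best = None
    for alpha in alphas:
        x = [u ** alpha for u in e_us]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue  # degenerate: all transformed inputs are identical
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, e_ds))
        k = sxy / sxx
        e_ir = mean_y - k * mean_x
        sse = sum((k * xi + e_ir - yi) ** 2 for xi, yi in zip(x, e_ds))
        if best is None or sse < best[0]:
            best = (sse, k, alpha, e_ir)
    if best is None:
        raise ValueError("need at least two distinct upstream error values")
    _, k, alpha, e_ir = best
    return k, alpha, e_ir


# Synthetic, noise-free data (NOT from the paper): k=0.5, alpha=2.0, e_ir=0.2.
e_us = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
e_ds = [0.5 * u ** 2.0 + 0.2 for u in e_us]

k_hat, alpha_hat, e_ir_hat = fit_saturating_power_law(e_us, e_ds)
print(f"k={k_hat:.3f}, alpha={alpha_hat:.2f}, e_ir={e_ir_hat:.3f}")
```

Under this assumed form, a positive `e_ir` is what expresses saturation: even if upstream error were driven to zero, the predicted downstream error would level off at `e_ir` instead of reaching zero.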
Findings
Downstream performance saturates with increased upstream accuracy.
The way representations evolve within the models explains the saturation.
Better downstream results may require sacrificing upstream accuracy.
Abstract
Recent developments in large-scale machine learning suggest that by scaling up data, model size and training time properly, one might observe that improvements in pre-training would transfer favorably to most downstream tasks. In this work, we systematically study this phenomenon and establish that, as we increase the upstream accuracy, the performance of downstream tasks saturates. In particular, we investigate more than 4800 experiments on Vision Transformers, MLP-Mixers and ResNets with number of parameters ranging from ten million to ten billion, trained on the largest scale of available image data (JFT, ImageNet21K) and evaluated on more than 20 downstream image recognition tasks. We propose a model for downstream performance that reflects the saturation phenomena and captures the nonlinear relationship in performance of upstream and downstream tasks. Delving deeper to understand…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms
