Simple Control Baselines for Evaluating Transfer Learning
Andrei Atanov, Shijian Xu, Onur Beker, Andrei Filatov, Amir Zamir

TL;DR
This paper proposes a standardized evaluation framework with simple control baselines for transfer learning, enhancing the interpretability and comparability of transfer performance across different models and tasks.
Contribution
It introduces three control baselines (blind-guess, scratch-model, and maximal-supervision) that make transfer learning evaluation more meaningful.
Findings
Existing self-supervised methods are more effective for image classification than for dense prediction tasks.
The proposed evaluation standard separates the contribution of the architecture (scratch-model baseline) from that of dataset bias (blind-guess baseline).
Using control baselines leads to a more nuanced understanding of transfer learning performance.
Abstract
Transfer learning has witnessed remarkable progress in recent years, for example, with the introduction of augmentation-based contrastive self-supervised learning methods. While a number of large-scale empirical studies on the transfer performance of such models have been conducted, there is not yet an agreed-upon set of control baselines, evaluation practices, and metrics to report, which often hinders a nuanced and calibrated understanding of the real efficacy of the methods. We share an evaluation standard that aims to quantify and communicate transfer learning performance in an informative and accessible setup. This is done by baking a number of simple yet critical control baselines in the evaluation method, particularly the blind-guess (quantifying the dataset bias), scratch-model (quantifying the architectural contribution), and maximal-supervision (quantifying the upper-bound).…
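The three controls named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names and the report layout are hypothetical, and the blind guess is assumed here to be the single best input-independent prediction (most frequent training label for classification, mean training target for regression).

```python
from collections import Counter
from statistics import mean


def blind_guess_classification(train_labels):
    """Input-independent prediction: the most frequent training label (assumed form)."""
    return Counter(train_labels).most_common(1)[0][0]


def blind_guess_regression(train_targets):
    """Input-independent prediction: the mean training target (assumed form)."""
    return mean(train_targets)


def accuracy(preds, labels):
    """Fraction of predictions that match the labels (higher is better)."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def l1_error(preds, targets):
    """Mean absolute error (lower is better)."""
    return mean(abs(p - t) for p, t in zip(preds, targets))


def control_report(transfer, blind_guess, scratch_model, maximal_supervision):
    """Place a transfer score next to its controls (hypothetical layout).

    Assumes a higher-is-better metric such as accuracy.
    """
    return {
        "blind_guess": blind_guess,                  # dataset bias
        "scratch_model": scratch_model,              # architectural contribution
        "transfer": transfer,                        # method under evaluation
        "maximal_supervision": maximal_supervision,  # upper bound
        "gain_over_scratch": transfer - scratch_model,
        "gap_to_maximal": maximal_supervision - transfer,
    }
```

Reporting the transfer score together with these controls, rather than alone, shows how much of it is already explained by dataset bias or by the architecture, and how far it sits from the upper bound.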
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
