An Empirical Study of Scaling Laws for Transfer

Matthew Barnett

arXiv:2408.16947 · cs.LG · September 2, 2024


TL;DR

This paper empirically investigates how the effectiveness of transfer learning in transformer models depends on the transfer gap between the pre-training and downstream distributions, showing how data scarcity and distribution differences affect transfer performance and cost-effectiveness.

Contribution

It fits a scaling law with a transfer gap term to experiments on diverse datasets, offering insight into transfer learning efficiency and data allocation strategies.
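To make the role of the transfer gap concrete, here is one plausible functional form, assumed for illustration only: the page does not state the law, so the symbols A, α, β, E and this exact parameterization may differ from the paper's. Here p is the amount of pre-training data, f the amount of fine-tuning data, and G the transfer gap.

```latex
L(p, f) = \left( \frac{A}{p^{\alpha}} + G \right) \frac{1}{f^{\beta}} + E,
\qquad
\lim_{p \to \infty} L(p, f) = \frac{G}{f^{\beta}} + E .
```

Under this assumed form, unlimited pre-training still leaves the term G/f^β, which only more fine-tuning data can shrink. When G is close to 0, pre-training alone can push the loss toward the irreducible term E; when G is large, scarce downstream data becomes the bottleneck.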

Findings

01 Transfer gap varies significantly across datasets.

02 A low transfer gap favors pre-training; a high gap favors collecting fine-tuning data.

03 The scaling law can guide data allocation and training strategies.

Abstract

We present a limited empirical study of scaling laws for transfer learning in transformer models. More specifically, we examine a scaling law that incorporates a "transfer gap" term, indicating the effectiveness of pre-training on one distribution when optimizing for downstream performance on another distribution. When the transfer gap is low, pre-training is a cost-effective strategy for improving downstream performance. Conversely, when the gap is high, collecting high-quality fine-tuning data becomes relatively more cost-effective. Fitting the scaling law to experiments from diverse datasets reveals significant variations in the transfer gap across distributions. In theory, the scaling law can inform optimal data allocation strategies and highlight how the scarcity of downstream data can bottleneck performance. Our findings contribute to a principled way to measure transfer learning…
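The cost argument in the abstract can be illustrated with a small budget-allocation sketch. It assumes a hypothetical law L(p, f) = (A / p^α + G) / f^β + E, a fixed budget, and hypothetical per-token prices for pre-training and fine-tuning data (`cost_p`, `cost_f`). The class `TransferLaw`, the function `best_split`, and every parameter value are illustrative inventions, not fitted results or code from the paper. A simple grid search picks the share of the budget spent on pre-training data.

```python
# Hypothetical sketch: the law's form and every number below are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferLaw:
    """Assumed form L(p, f) = (A / p**alpha + G) / f**beta + E (illustrative only)."""

    A: float      # pre-training coefficient
    alpha: float  # pre-training data exponent
    G: float      # transfer gap
    beta: float   # fine-tuning data exponent
    E: float      # irreducible loss

    def loss(self, p: float, f: float) -> float:
        """Predicted downstream loss from p pre-training and f fine-tuning tokens (both > 0)."""
        return (self.A / p**self.alpha + self.G) / f**self.beta + self.E


def best_split(law: TransferLaw, budget: float, cost_p: float, cost_f: float,
               steps: int = 99) -> tuple[float, float]:
    """Grid-search the share of `budget` spent on pre-training data.

    cost_p and cost_f are hypothetical prices per pre-training / fine-tuning token.
    Returns (share_on_pretraining, predicted_loss).
    """
    best_share, best_loss = 0.0, float("inf")
    for i in range(1, steps + 1):
        share = i / (steps + 1)             # strictly between 0 and 1
        p = share * budget / cost_p         # pre-training tokens bought
        f = (1 - share) * budget / cost_f   # fine-tuning tokens bought
        current = law.loss(p, f)
        if current < best_loss:
            best_share, best_loss = share, current
    return best_share, best_loss


if __name__ == "__main__":
    # Fine-tuning tokens are assumed to be 100x pricier than pre-training tokens.
    for gap in (0.01, 1.0):
        law = TransferLaw(A=10.0, alpha=0.3, G=gap, beta=0.2, E=1.0)
        share, loss = best_split(law, budget=1e6, cost_p=1.0, cost_f=100.0)
        print(f"G={gap}: {share:.0%} of budget on pre-training, predicted loss {loss:.3f}")
```

With these made-up numbers, raising G shifts the best split away from pre-training and toward fine-tuning data, matching the qualitative claim above. A grid search keeps the sketch dependency-free; real use would first require fitting the parameters to experimental losses.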


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks · Imbalanced Data Classification Techniques