Cliff-Learning

Tony T. Wang; Igor Zablotchi; Nir Shavit; Jonathan S. Rosenfeld

arXiv:2302.07348 · cs.LG · June 8, 2023

TL;DR

This paper investigates cliff-learning in transfer learning from foundation models: in the low-downstream-data regime, performance can improve faster than a power law, and the degree of this effect reflects how compatible the learning algorithm's priors are with the task.

Contribution

The paper introduces the concept of cliff-learning, analyzes its behavior through toy models, and links the degree of cliff-learning to the compatibility between a learning algorithm's priors and the task being learned.

Findings

01. Cliff-learning regions show faster-than-power-law performance gains.

02. Degree of cliff-learning correlates with prior-task compatibility.

03. Toy models help explain the underlying mechanisms.

Abstract

We study the data-scaling of transfer learning from foundation models in the low-downstream-data regime. We observe an intriguing phenomenon which we call cliff-learning. Cliff-learning refers to regions of data-scaling laws where performance improves at a faster than power law rate (i.e. regions of concavity on a log-log scaling plot). We conduct an in-depth investigation of foundation-model cliff-learning and study toy models of the phenomenon. We observe that the degree of cliff-learning reflects the degree of compatibility between the priors of a learning algorithm and the task being learned.
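
The abstract's definition (regions where the scaling curve is concave on a log-log plot) can be illustrated with a minimal sketch. This is not the authors' method: the function names `local_loglog_slopes` and `cliff_regions`, the tolerance `tol`, and both synthetic error curves are assumptions chosen only for illustration. A pure power law has a constant slope in log-log space, so a region counts as cliff-like here when the slope of log(error) against log(n) becomes steeper from one segment to the next.

```python
import numpy as np

def local_loglog_slopes(n, err):
    """Slopes between consecutive points of log(err) versus log(n)."""
    log_n = np.log(np.asarray(n, dtype=float))
    log_err = np.log(np.asarray(err, dtype=float))
    return np.diff(log_err) / np.diff(log_n)

def cliff_regions(n, err, tol=1e-6):
    """Boolean mask over interior points.

    True where the log-log slope steepens (becomes more negative),
    i.e. where the curve is locally concave on a log-log plot.
    `tol` is a hypothetical threshold that absorbs floating-point noise.
    """
    slopes = local_loglog_slopes(n, err)
    return np.diff(slopes) < -tol

# Synthetic data only; neither curve comes from the paper.
n = np.array([1, 2, 4, 8, 16, 32, 64], dtype=float)
power_law = 0.5 * n ** -0.5        # straight line in log-log space
cliff = 0.5 * np.exp(-n / 8.0)     # exponential decay, faster than any power law

print(cliff_regions(n, power_law))  # no concave regions expected
print(cliff_regions(n, cliff))      # concave everywhere
```

On real, noisy scaling curves the second difference is sensitive to measurement noise, so averaging over seeds or smoothing before applying a check like this would likely be needed.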

Taxonomy

Topics: Neural Networks and Applications · Neural Networks and Reservoir Computing · Machine Learning and ELM