Cliff-Learning
Tony T. Wang, Igor Zablotchi, Nir Shavit, Jonathan S. Rosenfeld

TL;DR
This paper investigates cliff-learning in transfer learning from foundation models: in low-downstream-data regimes, performance can improve faster than a power law, and the strength of this effect depends on how compatible the learning algorithm's priors are with the task.
Contribution
It introduces the concept of cliff-learning, analyzes its behavior through toy models, and links it to the compatibility between priors and tasks in low-data transfer learning.
Findings
Cliff-learning regions show faster-than-power-law performance gains.
The degree of cliff-learning reflects how compatible the learning algorithm's priors are with the task.
Toy models help explain the underlying mechanisms.
Abstract
We study the data-scaling of transfer learning from foundation models in the low-downstream-data regime. We observe an intriguing phenomenon which we call cliff-learning. Cliff-learning refers to regions of data-scaling laws where performance improves at a faster-than-power-law rate (i.e., regions of concavity on a log-log scaling plot). We conduct an in-depth investigation of foundation-model cliff-learning and study toy models of the phenomenon. We observe that the degree of cliff-learning reflects the degree of compatibility between the priors of a learning algorithm and the task being learned.
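To make the "concavity on a log-log plot" criterion concrete, here is a minimal sketch, assuming performance is measured as an error that decreases with the number of downstream examples n. A pure power law, error = c * n^(-a), is a straight line of slope -a on a log-log plot, so consecutive log-log slopes are equal; where the slope becomes more negative, error is falling faster than any single power law fits locally. The helper names `loglog_slopes` and `cliff_points`, the finite-difference slope estimate, and the tolerance are hypothetical illustrations, not the authors' method for measuring cliff-learning.

```python
import math
from typing import List, Sequence


def loglog_slopes(ns: Sequence[float], errors: Sequence[float]) -> List[float]:
    """Slope of log(error) vs. log(n) between each pair of consecutive points."""
    if len(ns) != len(errors) or len(ns) < 2:
        raise ValueError("need at least two (n, error) pairs of equal length")
    slopes = []
    for i in range(len(ns) - 1):
        dx = math.log(ns[i + 1]) - math.log(ns[i])
        dy = math.log(errors[i + 1]) - math.log(errors[i])
        slopes.append(dy / dx)
    return slopes


def cliff_points(ns: Sequence[float], errors: Sequence[float], tol: float = 1e-9) -> List[int]:
    """Indices of interior points where the log-log curve bends downward (is concave).

    Hypothetical criterion: point i is flagged when the slope after it is more
    negative than the slope before it, i.e. error drops faster than the local
    power law. A pure power law yields equal slopes and no flagged points.
    """
    slopes = loglog_slopes(ns, errors)
    return [i + 1 for i in range(len(slopes) - 1) if slopes[i + 1] < slopes[i] - tol]


if __name__ == "__main__":
    ns = [1, 2, 4, 8, 16, 32]
    power_law = [0.5 * n ** -0.5 for n in ns]  # straight line on a log-log plot
    made_up_curve = [0.5, 0.45, 0.3, 0.1, 0.08, 0.07]  # illustrative numbers only
    print(cliff_points(ns, power_law))  # []
    print(cliff_points(ns, made_up_curve))  # [1, 2], i.e. at n = 2 and n = 4
```

The error values in `made_up_curve` are invented for illustration and are not results from the paper. Finite differences on noisy measurements will flag spurious points, so in practice one would smooth or fit the curve before testing for concavity.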
Taxonomy
Topics: Neural Networks and Applications · Neural Networks and Reservoir Computing · Machine Learning and ELM
