Provable Target Sample Complexity Improvements as Pre-Trained Models Scale
Kazuto Fukuchi, Ryuichiro Hataya, Kota Matsui

TL;DR
This paper introduces a theoretical framework called caulking that explains how larger pre-trained models reduce the sample complexity needed for downstream tasks, aligning with empirical scaling laws.
Contribution
The paper provides the first theoretical justification for the observed reduction in sample complexity as pre-trained models scale, using a novel framework inspired by parameter-efficient fine-tuning (PEFT) methods.
Findings
Improved pre-trained models provably decrease downstream sample complexity.
Theoretical analysis aligns with empirical scaling laws.
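The framework, modeled on PEFT methods such as adapter-based fine-tuning, low-rank adaptation, and partial fine-tuning, offers insight into how such methods affect downstream learning.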
Abstract
Pre-trained models have become indispensable for efficiently building models across a broad spectrum of downstream tasks. The advantages of pre-trained models have been highlighted by empirical studies on scaling laws, which demonstrate that larger pre-trained models can significantly reduce the sample complexity of downstream learning. However, existing theoretical investigations of pre-trained models lack the capability to explain this phenomenon. In this paper, we provide a theoretical investigation by introducing a novel framework, caulking, inspired by parameter-efficient fine-tuning (PEFT) methods such as adapter-based fine-tuning, low-rank adaptation, and partial fine-tuning. Our analysis establishes that improved pre-trained models provably decrease the sample complexity of downstream tasks, thereby offering theoretical justification for the empirically observed scaling laws…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
