Improved Scaling Laws via Weak-to-Strong Generalization in Random Feature Ridge Regression
Diyuan Wu, Lehan Chen, Theodor Misiakiewicz, Marco Mondelli

TL;DR
This paper demonstrates that, in random feature ridge regression, a strong student model trained on labels from a weak teacher can follow a better test-error scaling law than the teacher, in some regimes even achieving the minimax-optimal rate.
Contribution
The paper derives a deterministic equivalent for the excess test error of a student trained on teacher-generated labels, and uses it to identify regimes where the student's scaling law surpasses the teacher's.
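For reference, a generic version of the two-stage RFRR setup and of the excess test error can be written as below. This is a sketch in our own notation: the feature maps $\phi_T, \phi_S$, the ridge penalties $\lambda_T, \lambda_S$, and the assumption that the student is fit on fresh inputs $\tilde x_j$ are illustrative choices, and the paper's normalizations and exact setup may differ.

```latex
% Teacher: p_T random features, n_T noisy samples y_i = f_*(x_i) + \varepsilon_i
\hat a_T = \arg\min_{a \in \mathbb{R}^{p_T}} \sum_{i=1}^{n_T} \big(y_i - a^\top \phi_T(x_i)\big)^2 + \lambda_T \|a\|_2^2,
\qquad \hat f_T(x) = \hat a_T^\top \phi_T(x)

% Student: p_S random features, n_S inputs labelled by the teacher
\hat a_S = \arg\min_{a \in \mathbb{R}^{p_S}} \sum_{j=1}^{n_S} \big(\hat f_T(\tilde x_j) - a^\top \phi_S(\tilde x_j)\big)^2 + \lambda_S \|a\|_2^2,
\qquad \hat f_S(x) = \hat a_S^\top \phi_S(x)

% Excess test error of the student, measured against the true target f_*
\mathcal{E}(\hat f_S) = \mathbb{E}_x\big[(f_*(x) - \hat f_S(x))^2\big]
```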
Findings
The student can outperform the teacher in both the bias and the variance regimes.
The student can attain the minimax-optimal rate regardless of the teacher's scaling law.
The improvement occurs even when the teacher's test error does not decay with the sample size.
Abstract
It is increasingly common in machine learning to use learned models to label data and then employ such data to train more capable models. The phenomenon of weak-to-strong generalization exemplifies the advantage of this two-stage procedure: a strong student is trained on imperfect labels obtained from a weak teacher, and yet the strong student outperforms the weak teacher. In this paper, we show that the potential improvement is substantial, in the sense that it affects the scaling law followed by the test error. Specifically, we consider students and teachers trained via random feature ridge regression (RFRR). Our main technical contribution is to derive a deterministic equivalent for the excess test error of the student trained on labels obtained via the teacher. Via this deterministic equivalent, we then identify regimes in which the scaling law of the student improves upon that of…
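The two-stage procedure described in the abstract can be sketched numerically with NumPy. This is a toy illustration under our own assumptions, not the paper's experimental setup: the ReLU feature map, the target function `target`, the Gaussian inputs, the fresh inputs given to the student, and all sizes and the ridge penalty `lam` are hypothetical choices. The student never sees the true labels; it is fit only to the teacher's predictions, and both models are scored against the noiseless target.

```python
import numpy as np


def relu_features(X, W):
    """Random ReLU features phi(x) = max(Wx, 0) / sqrt(p) for fixed weights W."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])


def fit_rfrr(X, y, W, lam):
    """Ridge regression on fixed random features; returns the coefficients a."""
    Phi = relu_features(X, W)
    p = Phi.shape[1]
    # Solve the normal equations (Phi^T Phi + lam I) a = Phi^T y.
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)


def predict(X, W, a):
    return relu_features(X, W) @ a


def target(X):
    # Hypothetical target function, chosen only for illustration.
    d = X.shape[1]
    return np.tanh(X @ np.ones(d) / np.sqrt(d)) + 0.5 * X[:, 0] ** 2


def weak_to_strong_demo(d=10, n_teacher=200, p_teacher=20, n_student=2000,
                        p_student=500, lam=1e-3, noise=0.1, n_test=5000, seed=0):
    """Returns (teacher excess test error, student excess test error)."""
    rng = np.random.default_rng(seed)

    # Stage 1: the weak teacher (few features) is fit on noisy true labels.
    X_T = rng.standard_normal((n_teacher, d))
    y_T = target(X_T) + noise * rng.standard_normal(n_teacher)
    W_T = rng.standard_normal((p_teacher, d)) / np.sqrt(d)
    a_T = fit_rfrr(X_T, y_T, W_T, lam)

    # Stage 2: the strong student (many features) is fit on fresh inputs
    # whose labels are the teacher's predictions, not the true target.
    X_S = rng.standard_normal((n_student, d))
    y_S = predict(X_S, W_T, a_T)
    W_S = rng.standard_normal((p_student, d)) / np.sqrt(d)
    a_S = fit_rfrr(X_S, y_S, W_S, lam)

    # Excess test error: mean squared distance to the noiseless target.
    X_test = rng.standard_normal((n_test, d))
    f_star = target(X_test)
    err_T = np.mean((predict(X_test, W_T, a_T) - f_star) ** 2)
    err_S = np.mean((predict(X_test, W_S, a_S) - f_star) ** 2)
    return err_T, err_S
```

Calling `err_T, err_S = weak_to_strong_demo()` returns the teacher's and the student's excess test errors for one random draw. The toy makes no claim about which one is smaller; sweeping `n_student`, `p_student`, and `lam` is the kind of comparison that the paper's deterministic equivalent characterizes analytically.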
