On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective
Behrad Moniri, Hamed Hassani

TL;DR
This paper provides a theoretical analysis of weak-to-strong generalization, revealing three core mechanisms that enable student models to outperform their teachers by leveraging regularization, parameterization, and feature learning.
Contribution
By analyzing ridge regression, weighted ridge regression, and a nonlinear multi-index model, it identifies three mechanisms that explain how a student model can surpass its teacher in weak-to-strong generalization.
Findings
A student can compensate for its teacher's under-regularization and achieve lower test error.
A student whose regularization structure is better aligned with the target can outperform its teacher.
In a nonlinear multi-index setting, a student can learn features beyond its teacher's capabilities.
Abstract
Weak-to-strong generalization, where a student model trained on imperfect labels generated by a weaker teacher nonetheless surpasses that teacher, has been widely observed, but the mechanisms that enable it have remained poorly understood. In this paper, through a theoretical analysis of simple models, we uncover three core mechanisms that can drive this phenomenon. First, by analyzing ridge regression, we study the interplay between the teacher and student regularization and prove that a student can compensate for a teacher's under-regularization and achieve lower test error. We also analyze the role of the parameterization regime of the models. Second, by analyzing weighted ridge regression, we show that a student model with a regularization structure more closely aligned with the target can outperform its teacher. Third, in a nonlinear multi-index setting, we demonstrate that a student can…
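The first mechanism can be illustrated with a small simulation. The sketch below is a toy setup chosen for illustration, not the paper's exact model: the dimensions, noise level, regularization strengths, isotropic Gaussian covariates, and the helper name `ridge` are all assumptions. A nearly unregularized teacher overfits a small noisy dataset, and a student fit only to the teacher's labels on fresh inputs, with a much larger ridge penalty, shrinks the teacher's solution back toward the target.

```python
import numpy as np


def ridge(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


# All constants below are illustrative assumptions, not values from the paper.
rng = np.random.default_rng(0)
d, n_teacher, n_student = 50, 60, 2000
noise_std = 2.0

# Ground-truth linear target with unit norm.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)

# Teacher: few noisy labels and almost no regularization, so it overfits.
X_t = rng.standard_normal((n_teacher, d))
y_t = X_t @ w_star + noise_std * rng.standard_normal(n_teacher)
w_teacher = ridge(X_t, y_t, lam=1e-3)

# Student: sees only the teacher's predictions on fresh inputs,
# but applies a much stronger ridge penalty.
X_s = rng.standard_normal((n_student, d))
y_s = X_s @ w_teacher
w_student = ridge(X_s, y_s, lam=float(n_student))

# With isotropic Gaussian inputs, the excess test risk equals ||w - w_star||^2.
teacher_err = float(np.sum((w_teacher - w_star) ** 2))
student_err = float(np.sum((w_student - w_star) ** 2))
print(f"teacher excess risk: {teacher_err:.3f}")
print(f"student excess risk: {student_err:.3f}")
```

In this toy configuration the student's penalty roughly halves the teacher's weights, which removes much of the variance the teacher picked up from label noise at the cost of some bias. The effect depends on the chosen constants: if the teacher were already well regularized, the same student penalty could hurt instead.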
