Weak-to-Strong Generalization is Nearly Inevitable (in Linear Models)
Scott Geng, Dutch Hansen, Jerry Li

TL;DR
This paper shows that weak-to-strong generalization, where a student model finetuned solely on feedback from a weaker teacher ends up surpassing that teacher, is nearly inevitable even in linear logistic regression under mild distributional assumptions on the data.
Contribution
The paper demonstrates that weak-to-strong generalization occurs in standard linear logistic regression, challenging the belief that a capacity mismatch between student and teacher is necessary.
Findings
Weak-to-strong generalization occurs in linear logistic regression.
Most student-teacher pairs exhibit this phenomenon.
Weak-to-strong generalization is nearly inevitable even without a capacity mismatch between student and teacher.
Abstract
Weak-to-strong generalization is a phenomenon in post-training whereby a strong student model, when finetuned solely with feedback from a weaker teacher, can not only surpass the teacher, but can improve upon its own capabilities. Recent work of Burns et al. (2023) demonstrated that this can occur in the setting of frontier language models, and subsequently there has been a flurry of both empirical work trying to exploit this phenomenon, as well as theoretical work attempting to understand it. In this work, we demonstrate that weak-to-strong generalization occurs in standard linear logistic regression, under mild distributional assumptions on the data. In fact, we show that this happens for most student-teacher pairs, suggesting that weak-to-strong generalization is in fact almost inevitable, even in this basic setting. Notably, our setting does not require the student to be more…
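To make the training pipeline described in the abstract concrete, here is a minimal sketch of how one might measure teacher and student accuracy when a linear logistic student is fit solely to a teacher's labels. Everything in it is an illustrative assumption and not the paper's construction: the Gaussian data, the hidden direction `w_true`, the sample sizes, the hard (0/1) teacher labels, and the helper names `fit_logistic`, `accuracy`, and `run_experiment` are all hypothetical. The sketch makes no claim about which model scores higher; in a setup this naive the student may simply imitate the teacher, and the paper's distributional assumptions are not reproduced here.

```python
import numpy as np


def fit_logistic(X, y, steps=500, lr=0.5):
    """Gradient descent on the mean logistic loss (labels in {0, 1}, no bias term)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)   # gradient of the mean logistic loss
    return w


def accuracy(w, X, y):
    """Fraction of points where sign(X @ w) matches the 0/1 label."""
    return float(np.mean((X @ w > 0) == (y == 1)))


def run_experiment(seed=0, d=20, n_teacher=40, n_student=2000, n_test=5000):
    """Hypothetical harness: all sizes and the data model are assumptions."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)              # assumed ground-truth direction

    def sample(n):
        X = rng.normal(size=(n, d))
        return X, (X @ w_true > 0).astype(float)

    # Weak teacher: a linear model fit on only a few ground-truth labels.
    X_t, y_t = sample(n_teacher)
    w_teacher = fit_logistic(X_t, y_t)

    # Student: sees fresh inputs, but only the teacher's labels for them.
    X_s, _ = sample(n_student)               # ground-truth labels are discarded
    y_weak = (X_s @ w_teacher > 0).astype(float)
    w_student = fit_logistic(X_s, y_weak)

    # Both models are evaluated against ground truth on held-out data.
    X_test, y_test = sample(n_test)
    teacher_pred = (X_test @ w_teacher > 0).astype(float)
    return {
        "teacher_acc": accuracy(w_teacher, X_test, y_test),
        "student_acc": accuracy(w_student, X_test, y_test),
        "agreement": accuracy(w_student, X_test, teacher_pred),
    }


if __name__ == "__main__":
    print(run_experiment())
```

Comparing `student_acc` with `teacher_acc` across many seeds and teacher choices is one way to estimate how often a student-teacher pair shows a gain under a given data model; the `agreement` entry shows how closely the student merely copies the teacher.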
