On the Blessing of Pre-training in Weak-to-Strong Generalization

Wei Yao; Wang Zhaoyang; Gengze Xu; Chen Qian; Dongrui Liu; Ziqiao Wang; Yong Liu; Yunbei Xu

arXiv:2605.05710·cs.LG·May 8, 2026

TL;DR

This paper investigates the critical role of pre-training in enabling Weak-to-Strong Generalization (W2SG), providing theoretical proofs and empirical evidence that pre-training acts as a geometric warm start necessary for W2SG to occur.

Contribution

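The paper formalizes the W2SG problem within a high-dimensional single-index model on spiked Gaussian data and proves that W2SG is achievable when pre-training supplies a geometric warm start. It also reports that W2SG emerges through a phase transition during pre-training, observed in large language models.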

Findings

01 Pre-training provides a geometric warm start crucial for W2SG.

02 W2SG emerges via a phase transition during pre-training.

03 A generalization bound captures the optimization dynamics of W2SG.

Abstract

The paradigm of Weak-to-Strong Generalization (W2SG) suggests that a pre-trained strong model can surpass its weak supervisor, yet the decisive role of pre-training remains theoretically and empirically under-explored. In this work, we identify pre-training as the essential prerequisite for the emergence of W2SG. Theoretically, we formalize the W2SG problem within a high-dimensional single-index model framework using spiked Gaussian data, modeling pre-training as a spectral initialization step. Building upon prior impossibility results regarding the failure of learning under random initialization, we prove that W2SG is achievable when pre-training provides a geometric warm start that places the model within an "effective region" characterized by a perturbed strong-convexity geometry. Within this region, we derive a rigorous generalization bound that naturally captures the optimization…
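To make the "spectral initialization as warm start" idea concrete, here is a minimal NumPy sketch. It is not the paper's construction: it assumes data drawn from N(0, I + strength · spike spikeᵀ), assumes the informative direction coincides with the covariance spike, and uses hypothetical names and parameters (`spiked_gaussian`, `spectral_init`, `random_init`, `d`, `n`, `strength`). It compares how well a spectral initialization and a random initialization align with the spike direction.

```python
import numpy as np


def spiked_gaussian(n, d, spike, strength, rng):
    """Draw n samples from N(0, I + strength * spike spike^T) (assumed data model)."""
    z = rng.standard_normal((n, d))   # isotropic noise part
    g = rng.standard_normal((n, 1))   # scalar coefficient along the spike
    return z + np.sqrt(strength) * g * spike


def spectral_init(x):
    """Unit-norm top eigenvector of the sample covariance (stand-in for pre-training)."""
    cov = x.T @ x / x.shape[0]
    _, vecs = np.linalg.eigh(cov)     # eigh returns eigenvalues in ascending order
    return vecs[:, -1]


def random_init(d, rng):
    """Unit-norm direction drawn uniformly at random (no pre-training)."""
    w = rng.standard_normal(d)
    return w / np.linalg.norm(w)


rng = np.random.default_rng(0)
d, n, strength = 200, 4000, 5.0       # hypothetical sizes, chosen for illustration only
spike = np.zeros(d)
spike[0] = 1.0                        # hypothetical informative direction

x = spiked_gaussian(n, d, spike, strength, rng)

# Absolute overlap with the spike: near 1 means a good warm start.
overlap_spectral = abs(spectral_init(x) @ spike)
overlap_random = abs(random_init(d, rng) @ spike)

print(f"spectral init overlap: {overlap_spectral:.3f}")
print(f"random init overlap:   {overlap_random:.3f}")
```

In this toy setting the spectral direction should land close to the spike, while a random direction in high dimension has overlap on the order of 1/sqrt(d). That gap is the intuition behind starting inside a favorable region rather than from random initialization; the paper's precise "effective region" and bound are not reproduced here.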
