Representations Shape Weak-to-Strong Generalization: Theoretical Insights and Empirical Predictions

Yihao Xue; Jiping Li; Baharan Mirzasoleiman

arXiv:2502.00620 · cs.LG · June 19, 2025

Open Access

TL;DR

This paper gives a theoretical framework, backed by empirical evidence, for how supervision from a weak model can train a stronger model that surpasses it. Kernels derived from the principal components of the two models' internal representations predict W2SG performance trends across various tasks.

Contribution

The paper introduces a kernel-based theoretical characterization of weak-to-strong generalization (W2SG) that ties the weak and strong models' internal representations to W2SG performance, so that performance trends can be predicted without labels.

Findings

01. Kernel-based metrics predict W2SG performance trends.

02. Strong models can surpass weak supervisors even with imperfect supervision.

03. Representation analysis explains error correction in weak supervision.

Abstract

Weak-to-Strong Generalization (W2SG), where a weak model supervises a stronger one, serves as an important analogy for understanding how humans might guide superhuman intelligence in the future. Promising empirical results revealed that a strong model can surpass its weak supervisor. While recent work has offered theoretical insights into this phenomenon, a clear understanding of the interactions between weak and strong models that drive W2SG remains elusive. We investigate W2SG through a theoretical lens and show that it can be characterized using kernels derived from the principal components of weak and strong models' internal representations. These kernels can be used to define a space that, at a high level, captures what the weak model is unable to learn but is learnable by the strong model. The projection of labels onto this space quantifies how much the strong model falls short of…
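The kernel construction described in the abstract can be illustrated with a small NumPy sketch. This is a hedged illustration, not the authors' implementation: the function names `top_pc_projector` and `unlearned_label_energy`, the cutoffs `k_weak` and `k_strong`, and the specific operator `P_strong (I - P_weak)` are assumptions chosen to mirror the phrase "what the weak model is unable to learn but is learnable by the strong model". The abstract is truncated, so the exact quantity used in the paper may differ.

```python
import numpy as np


def top_pc_projector(reps: np.ndarray, k: int) -> np.ndarray:
    """Return an (n x n) orthogonal projector onto the sample-space span of
    the top-k principal components of a representation matrix (n x d).

    Hypothetical helper: the paper's exact kernel definition is not given here.
    """
    centered = reps - reps.mean(axis=0, keepdims=True)
    # Left singular vectors give the principal directions in sample space.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    u_k = u[:, :k]
    return u_k @ u_k.T


def unlearned_label_energy(
    weak_reps: np.ndarray,
    strong_reps: np.ndarray,
    labels: np.ndarray,
    k_weak: int,
    k_strong: int,
) -> float:
    """Mean squared size of the label component that lies outside the weak
    model's principal subspace but inside the strong model's.

    Assumed stand-in for the label projection mentioned in the abstract.
    """
    n = labels.shape[0]
    p_weak = top_pc_projector(weak_reps, k_weak)
    p_strong = top_pc_projector(strong_reps, k_strong)
    # Part of the labels the weak subspace misses...
    residual = (np.eye(n) - p_weak) @ labels
    # ...restricted to what the strong subspace can still express.
    component = p_strong @ residual
    return float(component @ component) / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weak = rng.normal(size=(50, 8))     # synthetic weak-model representations
    strong = rng.normal(size=(50, 16))  # synthetic strong-model representations
    y = rng.normal(size=50)             # synthetic labels
    print(unlearned_label_energy(weak, strong, y, k_weak=3, k_strong=5))
```

When both models share the same representations and cutoff, the two projectors coincide and the score is zero up to floating-point error, which matches the intuition that there is then nothing the strong model can express that the weak one cannot.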

Videos

Representations Shape Weak-to-Strong Generalization: Theoretical Insights and Empirical Predictions · SlidesLive

Taxonomy

Topics: Neural Networks and Applications