TL;DR
This paper analyzes weak-to-strong finetuning, revealing how low-dimensional intrinsic spaces and variance reduction explain its surprising effectiveness across vision and NLP tasks.
Contribution
It provides an exact variance characterization of weak-to-strong finetuning in low-dimensional spaces, explaining its success and sample complexity.
Findings
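A strong student trained by weak-to-strong finetuning often outperforms its weak teacher.
The student inherits the teacher's variance in the subspace the two models share, while in the subspace where they differ the variance is reduced by a factor that scales with the student's subspace dimension.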
Experiments confirm theoretical insights on synthetic, vision, and NLP tasks.
Abstract
Weak-to-strong (W2S) generalization is a type of finetuning (FT) where a strong (large) student model is trained on pseudo-labels generated by a weak teacher. Surprisingly, W2S FT often outperforms the weak teacher. We seek to understand this phenomenon through the observation that FT often occurs in intrinsically low-dimensional spaces. Leveraging the low intrinsic dimensionality of FT, we analyze W2S in the ridgeless regression setting from a variance reduction perspective. For a strong student-weak teacher pair with sufficiently expressive low-dimensional feature subspaces $\mathcal{V}_s, \mathcal{V}_w$, we provide an exact characterization of the variance that dominates the generalization error of W2S. This unveils a virtue of discrepancy between the strong and weak models in W2S: the variance of the weak teacher is inherited by the strong student in $\mathcal{V}_s \cap…
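The ridgeless regression setting described in the abstract can be illustrated with a small simulation. This is a toy sketch under assumptions that are not taken from the paper: isotropic Gaussian inputs, feature subspaces chosen as coordinate blocks (`IDX_S`, `IDX_W`), a ground truth supported on their intersection, and arbitrary sample sizes. All names and constants are hypothetical.

```python
import numpy as np

# Hypothetical setup: the subspaces are coordinate blocks of a 30-dim input.
D = 30
IDX_S = np.arange(0, 10)   # strong student's feature subspace (dim 10)
IDX_W = np.arange(5, 25)   # weak teacher's feature subspace (dim 20)
SHARED = np.arange(5, 10)  # intersection of the two subspaces (dim 5)


def ridgeless_fit(features, targets):
    """Minimum-norm least-squares solution via the pseudoinverse."""
    return np.linalg.pinv(features) @ targets


def w2s_trial(rng, n=50, N=2000, sigma=1.0):
    """Return (teacher_error, student_error) for one random draw."""
    # Ground truth lives in the shared subspace, so both models can express it.
    theta = np.zeros(D)
    theta[SHARED] = 1.0

    # Weak teacher: fit on n noisy labeled samples using its own features.
    x_lab = rng.standard_normal((n, D))
    y_lab = x_lab @ theta + sigma * rng.standard_normal(n)
    w_hat = ridgeless_fit(x_lab[:, IDX_W], y_lab)

    # Strong student: fit on N unlabeled samples with teacher pseudo-labels.
    x_unl = rng.standard_normal((N, D))
    pseudo = x_unl[:, IDX_W] @ w_hat
    s_hat = ridgeless_fit(x_unl[:, IDX_S], pseudo)

    # Embed both estimates in the full space; for isotropic Gaussian inputs
    # the excess risk equals the squared parameter error.
    theta_w = np.zeros(D)
    theta_w[IDX_W] = w_hat
    theta_s = np.zeros(D)
    theta_s[IDX_S] = s_hat
    return np.sum((theta_w - theta) ** 2), np.sum((theta_s - theta) ** 2)


def average_errors(trials=20):
    """Average teacher and student errors over independent seeded trials."""
    errs = np.array([w2s_trial(np.random.default_rng(seed)) for seed in range(trials)])
    return errs.mean(axis=0)


if __name__ == "__main__":
    teacher_err, student_err = average_errors()
    print(f"teacher excess risk: {teacher_err:.4f}")
    print(f"student excess risk: {student_err:.4f}")
```

In this toy setup the teacher's noise in the shared coordinates passes straight through to the student, while its noise in the coordinates the student cannot see is averaged down by the large number of pseudo-labels, so the student's average error should come out below the teacher's.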