Weak-to-Strong Generalization Through the Data-Centric Lens
Changho Shin, John Cooper, Frederic Sala

TL;DR
This paper introduces the concept of overlap density as a data-centric measure to understand and improve weak-to-strong generalization in machine learning, supported by theoretical analysis and empirical validation.
Contribution
It proposes a simple, practical overlap detection algorithm and demonstrates how maximizing overlap density enhances weak-to-strong generalization.
Findings
Overlap density correlates with generalization performance.
The proposed algorithm effectively identifies key data points for learning.
Empirical results validate the theoretical benefits of maximizing overlap density.
Abstract
The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While decades of research have resulted in numerous algorithms that produce strong empirical performance, understanding what aspects of data enable weak-to-strong generalization has been understudied. We propose a simple data-centric mechanism that characterizes weak-to-strong generalization: the overlap density. Intuitively, generalization tracks the number of points that contain overlaps, i.e., both easy patterns (learnable by a weak model) and challenging patterns (only learnable by a stronger model), as with such points, weak predictions can be used to learn challenging patterns by stronger models. We provide a practical overlap detection algorithm to find such points in datasets and leverage…
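As a rough illustration of the overlap density idea described in the abstract, the sketch below computes the fraction of points that contain both an easy and a hard pattern. It assumes that per-point boolean indicators for each pattern type are already available; the names `overlap_density`, `partition_points`, `has_easy`, and `has_hard` are hypothetical and chosen for illustration only. This is not the paper's overlap detection algorithm, which has to estimate these indicators from data.

```python
import numpy as np


def overlap_density(has_easy, has_hard):
    """Fraction of points that contain BOTH an easy and a hard pattern.

    has_easy, has_hard: 1-D boolean-like sequences of equal length
    (hypothetical indicators, assumed to be given for this sketch).
    """
    easy = np.asarray(has_easy, dtype=bool)
    hard = np.asarray(has_hard, dtype=bool)
    if easy.shape != hard.shape:
        raise ValueError("indicator arrays must have the same shape")
    if easy.size == 0:
        return 0.0
    return float(np.mean(easy & hard))


def partition_points(has_easy, has_hard):
    """Split point indices into easy-only, hard-only, overlap, and neither."""
    easy = np.asarray(has_easy, dtype=bool)
    hard = np.asarray(has_hard, dtype=bool)
    return {
        "easy_only": np.flatnonzero(easy & ~hard).tolist(),
        "hard_only": np.flatnonzero(~easy & hard).tolist(),
        "overlap": np.flatnonzero(easy & hard).tolist(),
        "neither": np.flatnonzero(~easy & ~hard).tolist(),
    }


# Toy data: four points, exactly one of which (index 1) has both patterns.
toy_easy = [True, True, False, False]
toy_hard = [False, True, True, False]
density = overlap_density(toy_easy, toy_hard)
groups = partition_points(toy_easy, toy_hard)
```

In this toy setting, the overlap points are the ones where a weak model's prediction (driven by the easy pattern) can act as a label for learning the hard pattern, which is the intuition the abstract gives for why generalization tracks their number.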
Peer Reviews
Decision·ICLR 2025 Poster
The paper touches upon an important problem and raises an original, interesting and intuitive hypothesis. The writing conveys the contributions of the paper clearly, and the findings seem significant, although I am not an expert on this topic and so am not entirely sure what the appropriate baselines are. Finally, I also like that the authors reasoned formally about the empirical phenomenon and method.
I think that the paper covers a lot of material at the expense of some details that would've made the argument more precise and convincing. 1) The result in Theorem 4.1 makes sense in terms of being an upper bound on the error we are concerned with, but I could not see immediately whether the bound is very loose or not. For instance, it looks like if $f_{\text{weak}}$ performs poorly on $S_i \cap D_{\text{hard only}}$, then an accurate $f_{\text{w2s}}$ means the term on the RHS should be very large (since the di
The theoretical results seem correct to me (though I did not go through the appendix), and the experiments are comprehensive, illustrating that the results described in the paper also hold in practice.
My main concern is the clarity of the writing. I found it hard to understand the paper when reading it, due to the use of some terms without proper definitions (see questions).
- The paper is well written. As a non-expert in W2S generalisation, I found the goals clearly stated and the notion of overlap, while hand-wavy, well explained.
- The empirical study is very complete and shows both the relevance and the limitations of the current approach, which considers overlap between only two distinguishable patterns.
- The theoretical results are varied and justify why overlap and the proposed algorithms should be considered.
See Questions
Taxonomy
Topics: Reservoir Engineering and Simulation Methods
