Relating Misfit to Gain in Weak-to-Strong Generalization Beyond the Squared Loss
Abhijeet Mulgund, Chirag Pabbaraju

TL;DR
This paper extends the understanding of performance gains in weak-to-strong generalization from the squared loss to arbitrary Bregman divergences and to convex combinations of models, with theoretical results supported by experiments.
Contribution
It generalizes the misfit-based characterization of performance gain to broader loss functions and model classes, including non-convex strong model classes, which are handled via convex combinations of models.
Findings
Misfit characterizes performance gain for Bregman divergences.
Convex combinations of models approach the ideal gain as the number of combined models increases.
Experiments on synthetic and real datasets support the theoretical results.
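The first finding can be illustrated with the generalized Pythagorean inequality for Bregman projections onto a convex set C: if the strong model is f_sw = argmin over f in C of D_phi(f, f_w) and the target f* lies in C, then D_phi(f*, f_sw) <= D_phi(f*, f_w) - D_phi(f_sw, f_w), so the gain is at least the misfit. What follows is a minimal sketch under stated assumptions, not the authors' code: the convex class is a toy box [lo, hi]^n, the target is assumed to lie inside it, the projection uses the standard first-argument convention (the paper's theorem may order the arguments differently), and the helper names `bregman`, `project_box` and `misfit_and_errors` are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Vec = List[float]


def bregman(phi: Callable[[Vec], float], grad_phi: Callable[[Vec], Vec],
            x: Vec, y: Vec) -> float:
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    g = grad_phi(y)
    return phi(x) - phi(y) - sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))


# phi(x) = ||x||^2 recovers the squared loss: D_phi(x, y) = ||x - y||^2.
def sq_phi(x: Vec) -> float:
    return sum(v * v for v in x)


def sq_grad(x: Vec) -> Vec:
    return [2.0 * v for v in x]


# phi(x) = sum_i x_i log x_i (positive entries) gives the generalized KL divergence.
def negent_phi(x: Vec) -> float:
    return sum(v * math.log(v) for v in x)


def negent_grad(x: Vec) -> Vec:
    return [math.log(v) + 1.0 for v in x]


def project_box(y: Vec, lo: float, hi: float) -> Vec:
    """Bregman projection argmin_{x in [lo, hi]^n} D_phi(x, y) for a separable phi.

    Each coordinate term is convex in x_i and minimized at x_i = y_i,
    so the constrained minimizer is y clipped to [lo, hi].
    """
    return [min(max(v, lo), hi) for v in y]


def misfit_and_errors(phi, grad_phi, f_star: Vec, f_w: Vec,
                      lo: float, hi: float) -> Tuple[float, float, float]:
    """Return (weak error, strong error, misfit) for the hypothetical box class."""
    f_sw = project_box(f_w, lo, hi)            # strong model fit to weak labels
    err_w = bregman(phi, grad_phi, f_star, f_w)
    err_sw = bregman(phi, grad_phi, f_star, f_sw)
    misfit = bregman(phi, grad_phi, f_sw, f_w)
    return err_w, err_sw, misfit


if __name__ == "__main__":
    random.seed(0)
    lo, hi = 0.5, 2.0
    f_star = [random.uniform(lo, hi) for _ in range(5)]   # target lies in the class
    f_w = [random.uniform(0.05, 3.0) for _ in range(5)]   # weak model need not
    for name, phi, grad in [("squared", sq_phi, sq_grad),
                            ("gen. KL", negent_phi, negent_grad)]:
        err_w, err_sw, misfit = misfit_and_errors(phi, grad, f_star, f_w, lo, hi)
        # Generalized Pythagorean inequality: err_w - err_sw >= misfit.
        print(f"{name}: gain={err_w - err_sw:.4f} >= misfit={misfit:.4f}")
```

Clipping is used only because, for a coordinate-separable phi, each coordinate of D_phi(x, y) is convex in x_i with its minimum at y_i; for a general convex model class the projection would require a numerical solver.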
Abstract
The paradigm of weak-to-strong generalization consists of training a strong AI model on data labeled by a weak AI model, with the goal that the strong model nevertheless outperforms its weak supervisor on the target task of interest. For real-valued regression with the squared loss, recent work quantitatively characterizes the gain in performance of the strong model over the weak model in terms of the misfit between the strong and weak models. We generalize this characterization to learning tasks whose loss functions correspond to arbitrary Bregman divergences when the strong class is convex. This extends the misfit-based characterization of performance gain in weak-to-strong generalization to classification tasks, since the cross-entropy loss can be expressed in terms of a Bregman divergence. In most practical scenarios, however, the strong model class may not be…
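The link between cross-entropy and Bregman divergences can be checked numerically. With phi(p) = sum_i p_i log p_i (negative entropy), the Bregman divergence D_phi(p, q) between probability vectors equals KL(p || q), and cross-entropy decomposes as H(p, q) = H(p) + KL(p || q). Since H(p) does not depend on q, minimizing cross-entropy over the predictions q is the same as minimizing D_phi(p, q). The sketch below assumes the predicted distribution q has strictly positive entries; the function names and the example vectors are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List

Dist = List[float]


def neg_entropy(p: Dist) -> float:
    # phi(p) = sum_i p_i log p_i, using the convention 0 log 0 = 0
    return sum(pi * math.log(pi) for pi in p if pi > 0)


def bregman_neg_entropy(p: Dist, q: Dist) -> float:
    # D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>, grad phi(q)_i = log q_i + 1
    return neg_entropy(p) - neg_entropy(q) - sum(
        (math.log(qi) + 1.0) * (pi - qi) for pi, qi in zip(p, q))


def kl(p: Dist, q: Dist) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p: Dist, q: Dist) -> float:
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]   # illustrative soft labels
    q = [0.5, 0.3, 0.2]   # illustrative model predictions
    print(bregman_neg_entropy(p, q), kl(p, q))              # equal up to rounding
    print(cross_entropy(p, q), -neg_entropy(p) + kl(p, q))  # H(p, q) = H(p) + KL(p || q)
```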
Taxonomy
Topics: Image and Signal Denoising Methods · Sparse and Compressive Sensing Techniques
