Revisiting Weak-to-Strong Generalization in Theory and Practice: Reverse KL vs. Forward KL

Wei Yao; Wenkai Yang; Ziqiao Wang; Yankai Lin; Yong Liu

arXiv:2502.11107 · cs.LG · May 29, 2025

TL;DR

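This paper studies reverse KL divergence as the training objective for weak-to-strong generalization in language models. It shows that reverse KL matches or improves on forward KL in theory and outperforms it in practice, because its zero-forcing behavior focuses the strong model on high-confidence predictions.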

Contribution

The paper introduces a theoretically grounded approach that replaces forward KL divergence with reverse KL divergence, derives tighter lower bounds for both divergences, and reports practical improvements in model training.

Findings

01

Under reverse KL, the fine-tuned strong model is guaranteed to outperform its weak supervisor

02

Reverse KL and reverse cross-entropy outperform forward KL in experiments

03

Theoretically, reverse KL offers guarantees comparable to or better than those of forward KL

Abstract

As large language models advance toward superhuman performance, ensuring their alignment with human values and abilities grows increasingly complex. Weak-to-strong generalization offers a promising approach by leveraging predictions from weaker models to guide stronger systems, but its effectiveness could be constrained by the inherent noise and inaccuracies in these weak predictions. To address this, we propose a theoretically grounded approach that replaces forward KL divergence, whose mass-covering behavior risks overfitting to imperfect weak signals, with reverse KL divergence. Reverse KL divergence's zero-forcing effect prioritizes high-confidence predictions, effectively mitigating the influence of unreliable weak supervision. Theoretically, we extend existing bounds and derive tighter lower bounds for both forward and reverse KL divergence, establishing that reverse KL achieves at…
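
To make the mass-covering versus zero-forcing contrast concrete, here is a minimal sketch that evaluates both objectives on a single binary example. It assumes a classification setting in which the weak supervisor outputs a distribution p_weak and the strong model outputs p_strong. It takes forward KL to mean KL(p_weak || p_strong) and reverse KL to mean KL(p_strong || p_weak), and it adds a hypothetical EPS constant to avoid log(0). These conventions and names are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import Sequence

# Hypothetical smoothing constant to avoid log(0); not taken from the paper.
EPS = 1e-12


def forward_kl(p_weak: Sequence[float], p_strong: Sequence[float]) -> float:
    """KL(p_weak || p_strong): expectation under the weak labels (mass-covering)."""
    return sum(
        pw * math.log((pw + EPS) / (ps + EPS))
        for pw, ps in zip(p_weak, p_strong)
        if pw > 0
    )


def reverse_kl(p_weak: Sequence[float], p_strong: Sequence[float]) -> float:
    """KL(p_strong || p_weak): expectation under the strong model (zero-forcing)."""
    return sum(
        ps * math.log((ps + EPS) / (pw + EPS))
        for pw, ps in zip(p_weak, p_strong)
        if ps > 0
    )


def reverse_ce(p_weak: Sequence[float], p_strong: Sequence[float]) -> float:
    """Reverse cross-entropy: -sum_y p_strong(y) * log p_weak(y)."""
    return -sum(ps * math.log(pw + EPS) for pw, ps in zip(p_weak, p_strong))


if __name__ == "__main__":
    weak = [0.7, 0.3]          # soft label from a noisy weak supervisor
    matched = [0.7, 0.3]       # strong model copies the weak label exactly
    confident = [0.99, 0.01]   # strong model commits to the weak model's top class

    for name, q in [("matched", matched), ("confident", confident)]:
        print(f"{name:>9}: forward KL={forward_kl(weak, q):.3f}  "
              f"reverse KL={reverse_kl(weak, q):.3f}  "
              f"reverse CE={reverse_ce(weak, q):.3f}")
```

With weak labels (0.7, 0.3) and a strong model that is confident in the weak model's top class, forward KL is roughly 0.78 while reverse KL is roughly 0.31. Forward KL weights the log-ratio by the weak labels, so it heavily penalizes the student for assigning near-zero probability where the weak model places mass. Reverse KL weights the log-ratio by the student's own probabilities, so committing to a high-confidence class costs much less.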

Taxonomy

Topics: Neural Networks and Applications