Selective Weak-to-Strong Generalization
Hao Lang, Fei Huang, Yongbin Li

TL;DR
This paper introduces a selective weak-to-strong generalization (W2SG) framework that uses weak supervision only when it is needed. A binary classifier, P(IK), identifies questions the strong model can already answer, and the model's self-generated labels are used for those questions; the weak labels are further refined with graph smoothing. The authors report better robustness and performance.
Contribution
It proposes a novel selective W2SG approach that avoids unnecessary weak supervision, enhancing robustness and generalization in superhuman model alignment.
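As a rough illustration of the selective idea, the sketch below routes each question to either the strong model's self-generated label or the weak supervisor's label, based on a P(IK) score. The function name `select_labels`, the 0.5 threshold, and the toy inputs are assumptions made for this example; the source does not specify how the classifier's output is thresholded or combined with the weak labels.

```python
import numpy as np

def select_labels(p_ik, strong_labels, weak_labels, threshold=0.5):
    """Pick a training label per question (hypothetical helper).

    p_ik:          estimated probability that the strong model knows the answer
    strong_labels: labels generated by the strong model itself
    weak_labels:   labels from the weak supervisor
    threshold:     assumed cut-off, not taken from the paper
    """
    p_ik = np.asarray(p_ik, dtype=float)
    use_self = p_ik >= threshold  # True where weak supervision is skipped
    labels = np.where(use_self, np.asarray(strong_labels), np.asarray(weak_labels))
    return labels, use_self

# Toy data: four binary-labelled questions.
p_ik = [0.9, 0.2, 0.7, 0.4]
strong = [1, 0, 0, 1]
weak = [0, 0, 1, 0]

labels, use_self = select_labels(p_ik, strong, weak)
print(labels.tolist())    # [1, 0, 0, 0]
print(use_self.tolist())  # [True, False, True, False]
```

Questions 0 and 2 clear the threshold, so they keep the strong model's own labels; the other two fall back to weak labels.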
Findings
Outperforms baseline methods on three benchmarks
Classifier P(IK) generalizes across tasks and difficulties
Refined labels improve model robustness
Abstract
Future superhuman models will surpass the ability of humans, and humans will only be able to weakly supervise superhuman models. To alleviate the issue of lacking high-quality data for model alignment, some works on weak-to-strong generalization (W2SG) finetune a strong pretrained model with a weak supervisor so that it can generalize beyond weak supervision. However, the invariable use of weak supervision in existing methods exposes issues in robustness, with a proportion of weak labels proving harmful to models. In this paper, we propose a selective W2SG framework to avoid using weak supervision when unnecessary. We train a binary classifier P(IK) to identify questions that a strong model can answer and use its self-generated labels for alignment. We further refine weak labels with a graph smoothing method. Extensive experiments on three benchmarks show that our method…
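The abstract does not say how the graph smoothing is carried out. One common way to smooth noisy labels over a graph is iterative label propagation on a k-nearest-neighbour graph, sketched below purely as an assumption: `smooth_labels`, the cosine-similarity graph, and the parameters `k`, `alpha`, and `n_iters` are illustrative choices, not the authors' method.

```python
import numpy as np

def smooth_labels(features, weak_probs, k=2, alpha=0.5, n_iters=10):
    """Generic label propagation over a kNN graph (illustrative only).

    features:   (n, d) array of question embeddings, rows assumed non-zero
    weak_probs: (n,) weak-label probabilities for the positive class
    Requires n > k so every node has k neighbours.
    """
    X = np.asarray(features, dtype=float)
    y0 = np.asarray(weak_probs, dtype=float)
    n = len(X)

    # Cosine similarity between all pairs; exclude self-similarity.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)

    # Connect each node to its k most similar nodes, then symmetrize.
    nbrs = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)

    # Row-normalize so each step averages over a node's neighbours.
    trans = adj / adj.sum(axis=1, keepdims=True)

    # Blend the neighbourhood average with the original weak labels.
    y = y0.copy()
    for _ in range(n_iters):
        y = alpha * (trans @ y) + (1 - alpha) * y0
    return y

# Toy data: two clusters; the weak label at index 2 disagrees with its cluster.
features = [[1, 0], [0.9, 0.1], [0.95, 0.05], [0, 1], [0.1, 0.9], [0.05, 0.95]]
weak_probs = [0.9, 0.8, 0.2, 0.1, 0.2, 0.1]

smoothed = smooth_labels(features, weak_probs)
print(np.round(smoothed, 2))
```

In this toy setup, the outlying label at index 2 is pulled toward the labels of its neighbours, while the original weak labels still anchor each node through the `(1 - alpha)` term.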
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Topic Modeling
