EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles

Aakriti Agrawal; Mucong Ding; Zora Che; Chenghao Deng; Anirudh Satheesh; Bang An; Bayan Bruss; John Langford; Furong Huang

arXiv:2505.21959 · cs.LG · June 6, 2025

Open Access

TL;DR

This paper introduces EnsemW2S, an ensemble method that improves weak language models' ability to generalize to complex tasks and to supervise stronger models, especially under distribution shift. Reported gains are up to 6% on out-of-distribution datasets and 4% for weak experts on in-distribution datasets.

Contribution

EnsemW2S employs a token-level ensemble strategy to iteratively enhance weak experts, enabling better supervision of strong models on both in-distribution and out-of-distribution data.
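
To make the idea of a token-level ensemble concrete, here is a minimal sketch of one generic way to combine several weak experts at a single decoding step: take a weighted average of their next-token probability distributions and pick the most likely token. This is an illustration under stated assumptions, not the paper's algorithm. The function names, the toy three-token vocabulary, and the expert weights are all hypothetical; this summary does not specify how EnsemW2S computes or iteratively updates its weights.

```python
def combine_token_distributions(expert_probs, expert_weights):
    """Weighted average of per-expert next-token distributions.

    expert_probs:   list of probability lists, one per expert, all over
                    the same vocabulary (hypothetical toy input).
    expert_weights: non-negative weight per expert (assumed given; how
                    such weights would be learned is not covered here).
    """
    total = sum(expert_weights)
    if total <= 0:
        raise ValueError("expert weights must sum to a positive value")
    vocab_size = len(expert_probs[0])
    mixed = [0.0] * vocab_size
    for probs, weight in zip(expert_probs, expert_weights):
        if len(probs) != vocab_size:
            raise ValueError("all experts must share one vocabulary")
        for token_id, p in enumerate(probs):
            mixed[token_id] += (weight / total) * p
    return mixed


def pick_token(expert_probs, expert_weights):
    """Greedy choice: index of the highest-probability token in the mixture."""
    mixed = combine_token_distributions(expert_probs, expert_weights)
    return max(range(len(mixed)), key=mixed.__getitem__)


# Three hypothetical weak experts voting over a three-token vocabulary.
experts = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
]
equal_vote = pick_token(experts, [1.0, 1.0, 1.0])
trusted_first = pick_token(experts, [8.0, 1.0, 1.0])
```

With equal weights the two experts that prefer the second token outvote the first expert; giving the first expert a much larger weight flips the choice. An iterative scheme would repeat this step for every generated token and adjust the weights between rounds, but that update rule is outside what this page describes.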

Findings

01

Achieved up to 6% improvement on OOD datasets.

02

Enhanced weak experts' performance by 4% on ID datasets.

03

Demonstrated effective supervision of strong models with ensemble weak experts.

Abstract

With Large Language Models (LLMs) rapidly approaching and potentially surpassing human-level performance, it has become imperative to develop approaches capable of effectively supervising and enhancing these powerful models using smaller, human-level models exposed to only human-level data. We address this critical weak-to-strong (W2S) generalization challenge by proposing a novel method that improves weak experts by training them on the same limited human-level data, enabling them to generalize to complex, super-human-level tasks. Our approach, called EnsemW2S, employs a token-level ensemble strategy that iteratively combines multiple weak experts, systematically addressing the shortcomings identified in preceding iterations. By continuously refining these weak models, we significantly enhance their collective ability to supervise stronger student models. We extensively…

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications