WST: Weak-to-Strong Knowledge Transfer via Reinforcement Learning
Haosen Ge, Shuo Li, Lianghuan Huang

TL;DR
WST is a reinforcement learning framework in which a small "Teacher" model generates instructions that improve the performance of a much larger "Student" model on reasoning and alignment tasks, without requiring the Student to be open-source or fine-tuned.
Contribution
The paper proposes WST, a weak-to-strong knowledge transfer method that uses reinforcement learning to generate prompts automatically, so that a small model can improve a larger model's performance.
Findings
Reports a 98% gain on the MATH-500 benchmark
Reports a 134% gain on the HH-RLHF benchmark
Outperforms baselines such as GPT-4o-mini and Llama-70B
Abstract
Effective prompt engineering remains a challenging task for many applications. We introduce Weak-to-Strong Transfer (WST), an automatic prompt engineering framework where a small "Teacher" model generates instructions that enhance the performance of a much larger "Student" model. Unlike prior work, WST requires only a weak teacher, making it efficient and broadly applicable in settings where large models are closed-source or difficult to fine-tune. Using reinforcement learning, the Teacher Model's instructions are iteratively improved based on the Student Model's outcomes, yielding substantial gains across reasoning (MATH-500, GSM8K) and alignment (HH-RLHF) benchmarks (98% on MATH-500 and 134% on HH-RLHF) and surpassing baselines such as GPT-4o-mini and Llama-70B. These results demonstrate that small models can reliably scaffold larger ones, unlocking latent capabilities while…
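The loop described in the abstract (the Teacher proposes an instruction, the frozen Student is run with it, and the Student's outcome is fed back as a reward) can be sketched with a toy example. This is a minimal illustration under stated assumptions, not the authors' implementation: the Teacher is reduced to a softmax policy over three hypothetical instructions, the Student is simulated by made-up success rates, and a plain REINFORCE update with a running baseline stands in for whatever reinforcement learning algorithm the paper actually uses. All names and numbers below are hypothetical.

```python
import math
import random

# Hypothetical instructions the toy Teacher can choose from.
INSTRUCTIONS = [
    "Answer directly.",
    "Think step by step, then state the final answer.",
    "Check your arithmetic before answering.",
]

# Made-up success rates of the frozen, black-box Student under each
# instruction. These are illustrative values, not results from the paper.
STUDENT_SUCCESS_RATE = [0.30, 0.80, 0.55]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def student_reward(instruction_index, rng):
    """Simulated Student outcome: 1.0 if the task is solved, else 0.0."""
    solved = rng.random() < STUDENT_SUCCESS_RATE[instruction_index]
    return 1.0 if solved else 0.0


def train_teacher(steps=5000, lr=0.1, seed=0):
    """Update only the Teacher's logits; the Student is never modified."""
    rng = random.Random(seed)
    logits = [0.0] * len(INSTRUCTIONS)
    baseline = 0.0  # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(INSTRUCTIONS)), weights=probs)[0]
        reward = student_reward(action, rng)
        advantage = reward - baseline
        # REINFORCE: d/d logit_k of log pi(action) = 1[k == action] - probs[k]
        for k in range(len(logits)):
            grad = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


if __name__ == "__main__":
    final_probs = train_teacher()
    for text, p in zip(INSTRUCTIONS, final_probs):
        print(f"{p:.3f}  {text}")
```

The point of the sketch is that the Student is only queried for outcomes, never given gradients or weight updates, which is why the setup remains applicable when the large model is closed-source.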
