Incentivizing Strong Reasoning from Weak Supervision

Yige Yuan; Teng Xiao; Shuchang Tao; Xue Wang; Jinyang Gao; Bolin Ding; Bingbing Xu

arXiv:2505.20072 · cs.CL · March 17, 2026

TL;DR

This paper proposes a cost-effective method to enhance large language models' reasoning abilities using supervision from weaker models, approaching the performance of expensive reinforcement learning techniques.

Contribution

It introduces a novel weak supervision approach to incentivize reasoning in LLMs, reducing reliance on costly demonstrations and reinforcement learning.

Findings

1. Weak supervision from weaker models significantly improves reasoning performance.
2. The approach recovers up to 94% of the gains of reinforcement learning.
3. The approach is effective across diverse benchmarks and model architectures.
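The phrase "recovers up to 94% of the gains of reinforcement learning" can be read as a ratio of two improvements over the same base student: the gain from weak supervision divided by the gain from RL. The sketch below assumes that definition; the paper's exact metric may differ, the function name `gain_recovered` is hypothetical, and the accuracies are made-up numbers chosen only to show the arithmetic.

```python
def gain_recovered(base: float, weak_supervised: float, rl: float) -> float:
    """Share of the RL improvement over the base student that weak supervision recovers.

    Assumed (hypothetical) definition: (weak_supervised - base) / (rl - base).
    """
    rl_gain = rl - base
    if rl_gain == 0:
        raise ValueError("RL brings no gain over the base model; the ratio is undefined")
    return (weak_supervised - base) / rl_gain


# Made-up benchmark accuracies (%), for illustration only.
ratio = gain_recovered(base=40.0, weak_supervised=56.0, rl=57.0)
print(f"{ratio:.2%}")  # prints 94.12%
```

Under this reading, a 16-point gain from weak supervision against a 17-point gain from RL corresponds to roughly 94% of the RL gain recovered.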

Abstract

Large language models (LLMs) have demonstrated impressive performance on reasoning-intensive tasks, but enhancing their reasoning abilities typically relies on either reinforcement learning (RL) with verifiable signals or supervised fine-tuning (SFT) with high-quality long chain-of-thought (CoT) demonstrations, both of which are expensive. In this paper, we study a novel problem of incentivizing the reasoning capacity of LLMs without expensive high-quality demonstrations and reinforcement learning. We investigate whether the reasoning capabilities of LLMs can be effectively incentivized via supervision from significantly weaker models. We further analyze when and why such weak supervision succeeds in eliciting reasoning abilities in stronger models. Our findings show that supervision from significantly weaker reasoners can substantially improve student reasoning performance, recovering…
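The abstract describes the setup only at a high level: a stronger student model is trained on reasoning traces produced by a significantly weaker model instead of on expensive long-CoT demonstrations or RL. A minimal sketch of the data-construction step follows, under stated assumptions. The weak model is replaced by a toy callable, and `SFTExample`, `build_weak_sft_data`, `make_toy_teacher`, `extract_answer`, and the optional `keep_only_correct` filter are all hypothetical names, not the authors' code or necessarily what the paper does.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SFTExample:
    prompt: str
    target: str  # weak teacher's chain-of-thought, ending in a final answer


def build_weak_sft_data(
    problems: Iterable[tuple[str, str]],
    weak_teacher: Callable[[str], str],
    extract_answer: Callable[[str], str],
    samples_per_problem: int = 4,
    keep_only_correct: bool = False,
) -> list[SFTExample]:
    """Collect weak-teacher reasoning traces as SFT targets for a stronger student.

    keep_only_correct is a hypothetical option: it drops traces whose final
    answer does not match the reference answer.
    """
    data: list[SFTExample] = []
    for question, reference in problems:
        for _ in range(samples_per_problem):
            trace = weak_teacher(question)
            if keep_only_correct and extract_answer(trace) != reference:
                continue
            data.append(SFTExample(prompt=question, target=trace))
    return data


def make_toy_teacher() -> Callable[[str], str]:
    """Stand-in for a weak model: alternates between a right and a wrong answer."""
    answers = itertools.cycle(["4", "5"])

    def teacher(question: str) -> str:
        return f"Let me add the numbers step by step. Answer: {next(answers)}"

    return teacher


def extract_answer(trace: str) -> str:
    # Take whatever follows the last "Answer:" marker.
    return trace.rsplit("Answer:", 1)[-1].strip()


problems = [("What is 2 + 2?", "4")]
filtered = build_weak_sft_data(problems, make_toy_teacher(), extract_answer, keep_only_correct=True)
unfiltered = build_weak_sft_data(problems, make_toy_teacher(), extract_answer)
print(len(filtered), len(unfiltered))  # prints 2 4
```

The resulting prompt/target pairs would then feed an ordinary supervised fine-tuning loop on the student model; that step, and whether to filter weak traces by correctness at all, are left out of this sketch.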

Code & Models

Repositories

w2sr-neurips/code (official)

Videos

Incentivizing Strong Reasoning from Weak Supervision · Underline

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning