Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

Shobhita Sundaram; John Quan; Ariel Kwiatkowski; Kartik Ahuja; Yann Ollivier; Julia Kempe

arXiv:2601.18778·cs.LG·February 9, 2026

Open Access

TL;DR

This paper introduces SOAR, a self-improvement framework enabling large language models to generate curricula for themselves, thereby overcoming learning plateaus on difficult reasoning tasks by leveraging latent knowledge and grounded rewards.

Contribution

It demonstrates that bi-level meta-RL with grounded rewards can unlock learning in large models on hard problems without curated data, advancing autonomous model self-improvement.

Findings

01 Grounded rewards outperform intrinsic reward schemes.

02 Models can generate useful stepping stones without solving hard problems.

03 Structural quality of questions is more important than correctness.

Abstract

Can a model learn to escape its own learning plateau? Reinforcement learning methods for finetuning large reasoning models stall on datasets with low initial success rates, and thus little training signal. We investigate a fundamental question: Can a pretrained LLM leverage latent knowledge to generate an automated curriculum for problems it cannot solve? To explore this, we design SOAR: a self-improvement framework designed to surface these pedagogical signals through meta-RL. A teacher copy of the model proposes synthetic problems for a student copy, and is rewarded with the student's improvement on a small subset of hard problems. Critically, SOAR grounds the curriculum in measured student progress rather than intrinsic proxy rewards. Our study on the hardest subsets of mathematical benchmarks (0/128 success) reveals three core findings. First, we show that it is possible to realize bi-level…
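The grounded reward described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: `ToyStudent`, its integer `skill`, the integer problem difficulties, and the rule that a problem is learnable only when it sits one step above the current skill are all hypothetical stand-ins, chosen only to show how a teacher's curriculum could be scored by measured student improvement on a held-out set of hard problems.

```python
from typing import List, Sequence


class ToyStudent:
    """Hypothetical student: solves any problem at or below its skill level."""

    def __init__(self, skill: int = 0) -> None:
        self.skill = skill

    def solves(self, difficulty: int) -> bool:
        return difficulty <= self.skill

    def train_on(self, curriculum: Sequence[int]) -> "ToyStudent":
        # Toy assumption: a problem teaches something only if it is exactly
        # one step above the current skill (the "edge of learnability").
        student = ToyStudent(self.skill)
        for difficulty in sorted(curriculum):
            if difficulty == student.skill + 1:
                student.skill = difficulty
        return student


def success_rate(student: ToyStudent, problems: Sequence[int]) -> float:
    """Fraction of the given problems the student solves."""
    if not problems:
        return 0.0
    return sum(student.solves(p) for p in problems) / len(problems)


def grounded_teacher_reward(
    student: ToyStudent, curriculum: Sequence[int], hard_problems: Sequence[int]
) -> float:
    """Score a curriculum by the student's measured gain on the hard problems."""
    before = success_rate(student, hard_problems)
    after = success_rate(student.train_on(curriculum), hard_problems)
    return after - before


# Hard problems the initial student cannot solve at all.
hard_problems: List[int] = [3, 3, 4]
student = ToyStudent(skill=0)

# A curriculum of stepping stones versus one that only restates hard problems.
stepping_stones = [1, 2, 3]
too_hard = [4, 4, 4]

reward_stepping = grounded_teacher_reward(student, stepping_stones, hard_problems)
reward_too_hard = grounded_teacher_reward(student, too_hard, hard_problems)
```

In this toy setting the stepping-stone curriculum earns a positive reward while the too-hard curriculum earns none, even though the teacher never solves a hard problem itself. A real system would replace the toy student with RL fine-tuning of a model copy and would update the teacher from these rewards.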

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)