Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Shobhita Sundaram, John Quan, Ariel Kwiatkowski, Kartik Ahuja, Yann Ollivier, Julia Kempe

TL;DR
This paper introduces SOAR, a self-improvement framework enabling large language models to generate curricula for themselves, thereby overcoming learning plateaus on difficult reasoning tasks by leveraging latent knowledge and grounded rewards.
Contribution
The paper demonstrates that bi-level meta-RL with grounded rewards can unlock learning in large models on hard problems without curated data, advancing autonomous model self-improvement.
Findings
Grounded rewards outperform intrinsic reward schemes.
Models can generate useful stepping stones without solving hard problems.
The structural quality of generated questions matters more than the correctness of their solutions.
Abstract
Can a model learn to escape its own learning plateau? Reinforcement learning methods for finetuning large reasoning models stall on datasets with low initial success rates, and thus little training signal. We investigate a fundamental question: Can a pretrained LLM leverage latent knowledge to generate an automated curriculum for problems it cannot solve? To explore this, we design SOAR, a self-improvement framework that surfaces these pedagogical signals through meta-RL. A teacher copy of the model proposes synthetic problems for a student copy, and is rewarded with the student's improvement on a small subset of hard problems. Critically, SOAR grounds the curriculum in measured student progress rather than intrinsic proxy rewards. Our study on the hardest subsets of mathematical benchmarks (0/128 success) reveals three core findings. First, we show that it is possible to realize bi-level…
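The grounded reward described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: `ToyStudent`, `grounded_teacher_reward`, the integer "skill" level, and the rule that a problem only helps when it sits exactly one step above the student's current skill are all hypothetical simplifications chosen for illustration. The only idea taken from the abstract is that the teacher is rewarded by the measured change in the student's success rate on a small set of hard problems.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ToyStudent:
    """Hypothetical stand-in for the student model: a single integer skill level."""
    skill: int = 0

    def solves(self, difficulty: int) -> bool:
        # Assumption: the student solves any problem at or below its skill level.
        return difficulty <= self.skill

    def train_on(self, curriculum: Sequence[int]) -> "ToyStudent":
        # Assumption: a problem is learnable only if it is exactly one step
        # above the current skill; anything harder gives no training signal.
        skill = self.skill
        for difficulty in curriculum:
            if difficulty == skill + 1:
                skill = difficulty
        return ToyStudent(skill)


def success_rate(student: ToyStudent, hard_problems: Sequence[int]) -> float:
    """Fraction of the hard problems the student currently solves."""
    return sum(student.solves(d) for d in hard_problems) / len(hard_problems)


def grounded_teacher_reward(
    student: ToyStudent, curriculum: Sequence[int], hard_problems: Sequence[int]
) -> float:
    """Teacher reward = measured student improvement on the hard subset."""
    before = success_rate(student, hard_problems)
    after = success_rate(student.train_on(curriculum), hard_problems)
    return after - before


hard_problems = [3, 3, 4, 5]      # difficulties the student cannot solve at first
student = ToyStudent(skill=0)

stepping_stones = [1, 2, 3]       # a curriculum that climbs gradually
too_hard = [5, 5, 5]              # a curriculum far beyond the student's reach

reward_stones = grounded_teacher_reward(student, stepping_stones, hard_problems)
reward_hard = grounded_teacher_reward(student, too_hard, hard_problems)
print(reward_stones, reward_hard)
```

In this toy setting the gradual curriculum earns a positive reward while the out-of-reach curriculum earns none, which mirrors the abstract's contrast between rewards grounded in measured student progress and intrinsic proxy rewards that never check whether the student actually improved.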
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)
