Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Ian Wu; Yuxiao Qu; Amrith Setlur; Aviral Kumar

arXiv:2602.03773 · cs.LG · March 24, 2026

TL;DR

This paper introduces RC, an iterative decoding method that lets LLMs keep improving their reasoning over horizons far longer than those seen in training, so that performance on hard tasks continues to rise beyond the training budget.

Contribution

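The paper presents RC, an iterative decoding algorithm that replaces standard autoregressive decoding during both training and inference. Models trained to use RC can extrapolate and keep improving over reasoning horizons more than an order of magnitude longer than those seen during training.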

Findings

01. Models trained with RC outperform baseline models on reasoning tasks.

02. RC enables models to extrapolate their reasoning beyond the horizons seen in training.

03. Training with RC helps models use scaffolds more effectively, which improves performance.

Abstract

Large Language Models (LLMs) that can continually improve beyond their training budgets are able to solve increasingly difficult problems by adapting at test time, a property we refer to as extrapolation. However, standard reinforcement learning (RL) operates over fixed problem distributions and training budgets, which limits extrapolation amidst distribution shift at test time. To address this, we introduce RC, an iterative decoding algorithm that replaces standard autoregressive decoding during both training and inference. RC exploits an asymmetry between the response generation and summarization capabilities of LLMs to construct reasoning chains that consistently improve across iterations. Models trained to use RC can extrapolate and continually improve over reasoning horizons more than an order of magnitude longer than those seen during training. Empirically, training a 4B model…
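The abstract describes RC as alternating between response generation and summarization across iterations. A minimal sketch of such a loop is given below, under stated assumptions: the function names (`rc_decode`, `generate`, `summarize`), the prompt layout, and the fixed iteration count are all hypothetical and are not taken from the paper. The two callables stand in for LLM calls, and the sketch shows only the control flow, not the authors' implementation or training procedure.

```python
from typing import Callable, List, Tuple


def rc_decode(
    problem: str,
    generate: Callable[[str], str],
    summarize: Callable[[str, str, str], str],
    num_iterations: int,
) -> Tuple[str, List[str]]:
    """Illustrative generate-then-summarize loop (hypothetical, not the paper's code).

    generate(prompt) returns a reasoning attempt for the prompt.
    summarize(problem, previous_summary, reasoning) returns an updated summary.
    Returns the final summary and the list of summaries from every iteration.
    """
    summary = ""            # nothing has been summarized before the first iteration
    trace: List[str] = []   # summaries produced so far, one per iteration

    for _ in range(num_iterations):
        # Assumed prompt layout: the problem alone on the first pass,
        # then the problem followed by the running summary on later passes.
        if summary:
            prompt = f"{problem}\n\nSummary of progress so far:\n{summary}"
        else:
            prompt = problem

        reasoning = generate(prompt)

        # Only the summary is carried forward; the raw reasoning is discarded.
        summary = summarize(problem, summary, reasoning)
        trace.append(summary)

    return summary, trace


if __name__ == "__main__":
    # Toy stand-ins for model calls, used only to exercise the loop.
    def toy_generate(prompt: str) -> str:
        return "attempt"

    def toy_summarize(problem: str, previous: str, reasoning: str) -> str:
        return (previous + " | " + reasoning) if previous else reasoning

    final, history = rc_decode("What is 2 + 2?", toy_generate, toy_summarize, 3)
    print(final)
    print(len(history))
```

In this sketch the context passed to each generation step contains only the problem and the latest summary, so the prompt does not grow with the raw reasoning from earlier iterations. Whether RC bounds context in exactly this way is an assumption here; the abstract does not give that level of detail.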

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques