Reinforced Context Order Recovery for Adaptive Reasoning and Planning

Long Ma; Fangwei Zhong; Yizhou Wang

arXiv:2508.13070 · cs.CL · August 19, 2025

Open Access

TL;DR

This paper introduces ReCOR, a reinforcement learning framework that adaptively determines the token generation order in language models, improving performance on reasoning and planning tasks over fixed-order approaches.

Contribution

ReCOR is a novel reinforcement learning method that extracts data-dependent token orders without annotations, enhancing model performance on complex reasoning tasks.
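
To make "data-dependent token order" concrete, here is a minimal Python sketch. It is not ReCOR and does not use reinforcement learning; it only illustrates the general idea of choosing the next position to fill from the model's own prediction statistics. The names `decode_easiest_first` and `toy_predict`, and the use of entropy as a hardness proxy, are assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical predictor interface: given a partially filled sequence
# (None marks an unfilled slot), return a token distribution per unfilled slot.
Predictor = Callable[[Sequence[Optional[str]]], Dict[int, Dict[str, float]]]


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (nats) of a token distribution; assumed hardness proxy."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def decode_easiest_first(length: int, predict: Predictor) -> Tuple[List[Optional[str]], List[int]]:
    """Fill the slot with the lowest-entropy prediction first, then repeat."""
    seq: List[Optional[str]] = [None] * length
    order: List[int] = []
    while any(tok is None for tok in seq):
        dists = predict(seq)
        # Pick the unfilled position the predictor is most certain about.
        pos = min(dists, key=lambda i: (entropy(dists[i]), i))
        seq[pos] = max(dists[pos], key=dists[pos].get)
        order.append(pos)
    return seq, order


def toy_predict(seq: Sequence[Optional[str]]) -> Dict[int, Dict[str, float]]:
    """Toy task: a slot is easy only once the slot to its right is filled."""
    target = ["a", "b", "c"]
    dists: Dict[int, Dict[str, float]] = {}
    for i, tok in enumerate(seq):
        if tok is not None:
            continue
        is_last = i == len(seq) - 1
        if is_last or seq[i + 1] is not None:
            dists[i] = {target[i]: 1.0}            # confident prediction
        else:
            dists[i] = {target[i]: 0.5, "?": 0.5}  # uncertain prediction
    return dists


seq, order = decode_easiest_first(3, toy_predict)
print(order)  # [2, 1, 0]
print(seq)    # ['a', 'b', 'c']
```

In this toy task a left-to-right decoder would have to guess at the first slot, while the easiest-first rule fills the slots right to left without any order annotations.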

Findings

01

ReCOR outperforms baseline models on reasoning and planning datasets.

02

ReCOR sometimes surpasses oracle models that are given the ground-truth order.

03

Adaptive token ordering makes complex tasks more tractable for the model.

Abstract

Modern causal language models, followed by rapid developments in discrete diffusion models, can now produce a wide variety of interesting and useful content. However, these families of models are predominantly trained to output tokens with a fixed (left-to-right) or random order, which may deviate from the logical order in which tokens are generated originally. In this paper, we observe that current causal and diffusion models encounter difficulties in problems that require adaptive token generation orders to solve tractably, which we characterize with the $V$-information framework. Motivated by this, we propose Reinforced Context Order Recovery (ReCOR), a reinforcement-learning-based framework to extract adaptive, data-dependent token generation orders from text data without annotations. Self-supervised by token prediction statistics, ReCOR estimates the hardness of…
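
For readers unfamiliar with the term, the $V$-information framework mentioned in the abstract is usually stated as follows. This is the standard formulation from the literature on usable information; the notation is assumed here and is not taken from this page. For a family $V$ of predictive models, where $f[x]$ denotes the distribution a model $f$ predicts given side information $x$ and $\varnothing$ denotes no side information:

$$
H_V(Y \mid X) = \inf_{f \in V} \mathbb{E}_{x, y \sim X, Y}\left[-\log f[x](y)\right],
\qquad
I_V(X \to Y) = H_V(Y \mid \varnothing) - H_V(Y \mid X).
$$

Under this reading, $I_V(X \to Y)$ measures how much a model restricted to $V$ can actually reduce its uncertainty about $Y$ by seeing $X$. A context can determine the next token in principle and still offer little usable information to a computationally bounded model, which is one plausible way to interpret the abstract's point about generation order and tractability.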

Taxonomy

Topics: Context-Aware Activity Recognition Systems · AI-based Problem Solving and Planning · Reinforcement Learning in Robotics