Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Tianle Wang; Zhaoyang Wang; Guangchen Lan; Xinpeng Wei; Sipeng Zhang; Guanwen Qiu; Abulhair Saparov

arXiv:2605.06638·cs.AI·May 19, 2026

TL;DR

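This paper introduces ScaleLogic, a synthetic logical reasoning framework for studying how RL training scales with reasoning depth and logical expressiveness. It shows that training compute grows as a power law in reasoning depth, with an exponent that increases with expressiveness, and that training on more expressive logics improves downstream performance and transfer.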

Contribution

The paper presents ScaleLogic, a scalable logical reasoning environment with independent control over reasoning depth and expressiveness, and uses it to study systematically how both affect LLM training and transfer.

Findings

01. Training compute follows a power law with reasoning depth, with the exponent increasing with logic expressiveness.

02. More expressive training improves downstream performance and transfer efficiency.

03. The power-law relationship holds across multiple RL methods and benefits from curriculum training.
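To make the first finding concrete, here is a minimal sketch of how a power-law exponent could be estimated from (depth, compute) pairs by least squares in log-log space. The function name `fit_power_law` and the data are illustrative assumptions: the numbers are generated synthetically from a known exponent and are not results from the paper, and this is not necessarily the fitting procedure the authors used.

```python
import math

def fit_power_law(depths, compute):
    """Fit compute ~ a * depth**b by ordinary least squares on logs.

    Hypothetical helper for illustration; not taken from the ScaleLogic code.
    """
    xs = [math.log(d) for d in depths]
    ys = [math.log(c) for c in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept in log-log space = log(a)
    return a, b

# Synthetic illustration only: these values are NOT from the paper.
depths = [2, 4, 8, 16, 32]
synthetic_compute = [3.0 * d ** 1.5 for d in depths]

a, b = fit_power_law(depths, synthetic_compute)
print(f"a={a:.3f}, b={b:.3f}")
```

Because the synthetic data is constructed with exponent 1.5, the fit recovers that value up to floating-point error. Under the first finding, repeating such a fit for each logic would be expected to give a larger `b` for more expressive logics.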

Abstract

Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. Observed LLM shortcomings in long-horizon reasoning have raised the prospect that they are fundamental to the autoregressive transformer architecture. To address this, we introduce ScaleLogic, a synthetic logical reasoning framework that offers independent control over two axes of difficulty: the depth of the required proof planning (i.e., the horizon) and the expressiveness of the underlying logic. Our proposed framework supports a wide range of logics, from simple implication-only logic ("if-then") to more expressive first-order reasoning with conjunction ("and"), disjunction ("or"), negation ("not"), and universal quantification ("for all"). Using this…
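As a rough illustration of what "controlling proof depth" can mean in the simplest, implication-only setting, the sketch below builds a toy problem whose shortest proof has a chosen number of steps and then checks that depth by forward chaining. Everything here (`make_chain_problem`, `proof_depth`, the atom names, the distractor scheme) is a hypothetical construction for this page and should not be read as ScaleLogic's actual generator.

```python
import random

def make_chain_problem(depth, n_distractors=3, seed=0):
    """Build a toy implication-only problem with a proof of `depth` steps.

    Hypothetical construction: a chain p0 -> p1 -> ... -> p_depth plus
    distractor rules that branch off into dead-end atoms d0, d1, ...
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    rng = random.Random(seed)
    chain = [f"p{i}" for i in range(depth + 1)]
    rules = [(chain[i], chain[i + 1]) for i in range(depth)]
    for j in range(n_distractors):
        src = rng.choice(chain[:-1])   # branch from any non-goal atom
        rules.append((src, f"d{j}"))   # dead end: nothing follows from d_j
    rng.shuffle(rules)
    return {"fact": chain[0], "rules": rules, "goal": chain[-1]}

def proof_depth(problem):
    """Return the length of the shortest modus ponens chain from the fact
    to the goal (breadth-first forward chaining), or None if unreachable."""
    frontier = {problem["fact"]}
    known = set(frontier)
    steps = 0
    while frontier:
        if problem["goal"] in known:
            return steps
        # Apply every rule whose premise was derived in the previous layer.
        nxt = {q for (p, q) in problem["rules"] if p in frontier and q not in known}
        known |= nxt
        frontier = nxt
        steps += 1
    return None

problem = make_chain_problem(depth=5)
print(problem["fact"], "->", problem["goal"], "in", proof_depth(problem), "steps")
```

In this toy version, depth is the only difficulty axis. The second axis described in the abstract, expressiveness, would require rules with conjunction, disjunction, negation, and quantifiers, which this sketch deliberately leaves out.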

Code & Models

Repositories

wtl666wtl/ScaleLogic (GitHub)
