TL;DR
This paper introduces ScaleLogic, a synthetic framework for studying how training scale and logical expressiveness affect LLM reasoning. It shows that expressiveness significantly affects both training efficiency and reasoning capability.
Contribution
The paper presents ScaleLogic, a scalable logical reasoning environment that systematically explores the effects of reasoning depth and expressiveness on LLM training and transfer.
Findings
The training compute required grows as a power law in reasoning depth, and the exponent increases as the logic becomes more expressive.
More expressive training improves downstream performance and transfer efficiency.
The power-law relationship holds across multiple RL methods, and training benefits from a curriculum.
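As a rough illustration of what a power law in reasoning depth means, the sketch below fits compute = a * depth^b by least squares in log-log space. The function name `fit_power_law`, the constants, and all data points are hypothetical and synthetic. They are not results from the paper and only show how a larger exponent b would appear for a more expressive logic.

```python
import math

def fit_power_law(depths, compute):
    """Least-squares fit of compute = a * depth**b in log-log space."""
    xs = [math.log(d) for d in depths]
    ys = [math.log(c) for c in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept gives the prefactor
    return a, b

# Synthetic data (assumption): two made-up logics with different exponents.
depths = [2, 4, 8, 16]
compute_simple = [3.0 * d ** 1.5 for d in depths]      # less expressive logic
compute_expressive = [3.0 * d ** 2.5 for d in depths]  # more expressive logic

a_s, b_s = fit_power_law(depths, compute_simple)
a_e, b_e = fit_power_law(depths, compute_expressive)
print(f"simple: a={a_s:.2f}, b={b_s:.2f}")
print(f"expressive: a={a_e:.2f}, b={b_e:.2f}")
```

On a log-log plot each curve is a straight line whose slope is the exponent, so a steeper line corresponds to compute that grows faster with depth.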
Abstract
Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. Observed LLM shortcomings in long-horizon reasoning have raised the prospect that they are fundamental to the autoregressive transformer architecture. To address this, we introduce ScaleLogic, a synthetic logical reasoning framework that offers independent control over two axes of difficulty: the depth of the required proof planning (i.e., the horizon) and the expressiveness of the underlying logic. Our proposed framework supports a wide range of logics, from simple implication-only logic ("if-then") to more expressive first-order reasoning with conjunction ("and"), disjunction ("or"), negation ("not"), and universal quantification ("for all"). Using this…
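To make the depth axis concrete, here is a minimal sketch of a depth-controlled problem in the implication-only ("if-then") fragment. It is not ScaleLogic's actual generator: `make_chain_problem`, `proof_depth`, the atom names, and the distractor scheme are all hypothetical, and the more expressive connectives are not modeled.

```python
import random

def make_chain_problem(depth, n_distractors=3, seed=0):
    """Build an implication chain p0 -> p1 -> ... -> p_depth (depth >= 1)."""
    rng = random.Random(seed)
    atoms = [f"p{i}" for i in range(depth + 1)]
    rules = [(atoms[i], atoms[i + 1]) for i in range(depth)]
    # Distractor rules whose premises q_j are never given as facts,
    # so they can never fire during forward chaining.
    for j in range(n_distractors):
        rules.append((f"q{j}", rng.choice(atoms[1:])))
    rng.shuffle(rules)
    return {"fact": atoms[0], "rules": rules, "goal": atoms[-1]}

def proof_depth(problem):
    """Forward-chain; return the number of rounds needed to reach the goal."""
    known = {problem["fact"]}
    rounds = 0
    while problem["goal"] not in known:
        new = {c for p, c in problem["rules"] if p in known and c not in known}
        if not new:
            return None  # goal is not derivable
        known |= new
        rounds += 1
    return rounds

problem = make_chain_problem(depth=4)
print(problem["fact"], "=>", problem["goal"], "in", proof_depth(problem), "rounds")
```

In this toy setting the requested depth equals the number of forward-chaining rounds, which is the sense in which the proof horizon can be set independently of the logic used.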
