TL;DR
This paper introduces a structured approach to agentic reasoning in LLMs, decomposing decision-making into simulative reasoning, self-regulation, and reactive execution, leading to more efficient and effective planning.
Contribution
The paper proposes a three-system framework for agentic reasoning, implemented as SR$^2$AM, and reports improved efficiency and longer planning horizons across diverse tasks.
Findings
Achieves competitive performance with fewer reasoning tokens.
RL training extends planning horizon by 22.8%.
Self-regulation reduces unnecessary planning, improving efficiency.
Abstract
How should an agent decide when and how to plan? A dominant approach builds agents as reactive policies with adaptive computation (e.g., chain-of-thought), trained end-to-end expecting planning to emerge implicitly. Without control over the presence, structure, or horizon of planning, these systems dramatically increase reasoning length, yielding inefficient token use without reliable accuracy gains. We argue efficient agentic reasoning benefits from decomposing decision-making into three systems: simulative reasoning (System II) grounding deliberation in future-state prediction via a world model; self-regulation (System III) deciding when and how deeply to plan via a learned configurator; and reactive execution (System I) handling fine-grained action. Simulative reasoning provides unified planning across diverse tasks without per-domain engineering, while self-regulation ensures the…
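To make the three-system decomposition concrete, here is a minimal toy sketch in Python. It is not the SR$^2$AM implementation: the 1-D grid task, the function names (`reactive_policy`, `world_model`, `configurator`, `simulate_plan`, `run_episode`), and the hand-written trap rule are all assumptions made for illustration. In particular, the abstract describes the configurator as learned, whereas the stand-in below is a fixed heuristic.

```python
from itertools import product

# Hypothetical toy task: walk right along a line from `start` to `goal`
# without stepping on any cell listed in `traps`.
ACTIONS = (1, 2)  # assumed action set: move one or two cells to the right


def world_model(state, action):
    """Predictor used by System II: a toy deterministic transition."""
    return state + action


def reactive_policy(state, goal):
    """System I: cheap greedy action with no lookahead (ignores traps)."""
    return 2 if goal - state >= 2 else 1


def configurator(state, traps):
    """System III stand-in: pick a planning horizon (0 means skip planning).

    A hand-written rule replaces the learned configurator here: plan two
    steps ahead only when a trap lies within reach of the next action.
    """
    danger = any(state < t <= state + 2 for t in traps)
    return 2 if danger else 0


def simulate_plan(state, goal, traps, horizon):
    """System II: roll out every action sequence of length `horizon` in the
    world model and return the first action of the best trap-free one."""
    best_score, best_first = None, None
    for seq in product(ACTIONS, repeat=horizon):
        s, safe = state, True
        for a in seq:
            s = world_model(s, a)
            if s in traps:
                safe = False
                break
        if not safe:
            continue
        score = -abs(goal - s)  # closer to the goal is better
        if best_score is None or score > best_score:
            best_score, best_first = score, seq[0]
    return best_first


def run_episode(start, goal, traps, max_steps=50):
    """Run the agent; return the visited states and the number of planning calls."""
    state, path, plan_calls = start, [start], 0
    for _ in range(max_steps):
        if state >= goal:
            break
        horizon = configurator(state, traps)
        action = None
        if horizon > 0:
            plan_calls += 1
            action = simulate_plan(state, goal, traps, horizon)
        if action is None:  # no planning requested, or no safe plan found
            action = reactive_policy(state, goal)
        state = world_model(state, action)
        path.append(state)
    return path, plan_calls


if __name__ == "__main__":
    path, plan_calls = run_episode(start=0, goal=10, traps={4})
    print(path, plan_calls)
```

In this toy run the agent plans on only some of its steps and acts reactively on the rest, which is the kind of saving the self-regulation finding refers to. A purely reactive agent would take the greedy two-cell step from cell 2 and land on the trap at cell 4.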
