From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning
Amir Tahmasbi, Sadegh Majidi, Kazem Taram, Aniket Bera

TL;DR
This paper introduces a two-stage method combining supervised fine-tuning and reinforcement learning to improve multi-step spatial reasoning in large language models, demonstrating superior performance and stability in puzzle environments.
Contribution
The paper presents a novel two-stage approach that decomposes spatial reasoning into atomic blocks and their composition, enhancing LLMs' multi-step planning abilities.
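To make the "atomic blocks and their composition" idea concrete, here is a minimal sketch of elementary spatial transformations (rotation, translation, scaling) on an ASCII canvas, plus a helper that chains them into a multi-step plan. It is an illustration under stated assumptions, not the authors' dataset format. The grid representation, the "." background character, the integer scale factor, the plan format, and the function names (`rotate90`, `translate`, `scale`, `compose`) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: an ASCII canvas is a list of equal-length row strings,
# with "." as the background character.
Grid = List[str]

def rotate90(grid: Grid) -> Grid:
    """Rotate the canvas 90 degrees clockwise (a height x width grid becomes width x height)."""
    h = len(grid)
    return ["".join(grid[h - 1 - r][c] for r in range(h)) for c in range(len(grid[0]))]

def translate(grid: Grid, dx: int, dy: int, fill: str = ".") -> Grid:
    """Shift content dx columns right and dy rows down; cells pushed off-canvas are dropped."""
    h, w = len(grid), len(grid[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = grid[r][c]
    return ["".join(row) for row in out]

def scale(grid: Grid, k: int) -> Grid:
    """Integer upscaling: every character becomes a k x k block."""
    return ["".join(ch * k for ch in row) for row in grid for _ in range(k)]

# Hypothetical plan format: a sequence of (operation name, keyword arguments).
OPS: Dict[str, Callable[..., Grid]] = {"rotate90": rotate90, "translate": translate, "scale": scale}

def compose(grid: Grid, plan: List[Tuple[str, dict]]) -> Grid:
    """Apply atomic transformations one after another (multi-step composition)."""
    for name, kwargs in plan:
        grid = OPS[name](grid, **kwargs)
    return grid

if __name__ == "__main__":
    shape = ["#..", "#..", "..."]
    plan = [("rotate90", {}), ("translate", {"dx": 0, "dy": 1})]
    print("\n".join(compose(shape, plan)))
```

Each transformation is a pure function. This makes supervised targets deterministic, and a proposed multi-step plan can be checked by replaying it. That is one plausible way to verify plans in a closed loop, though the paper's actual environment may work differently.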
Findings
Outperforms baseline models in spatial reasoning tasks
Faster convergence and more stable training than end-to-end RL
Attention analysis shows improved spatial understanding
Abstract
Spatial reasoning in large language models (LLMs) has gained increasing attention due to applications in navigation and planning. Despite strong general language capabilities, LLMs still struggle with spatial transformations and multi-step planning in structured environments. We propose a two-stage approach that decomposes spatial reasoning into atomic building blocks and their composition. First, we apply supervised fine-tuning on elementary spatial transformations, such as rotation, translation, and scaling, to equip the model with basic spatial physics. We then freeze this physics-aware model and train lightweight LoRA adapters within the GRPO framework to learn policies that compose these building blocks for multi-step planning in puzzle-based environments, in a closed-loop manner. To support this pipeline, we synthesize an ASCII-art dataset and construct a corresponding ASCII-based…
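The second stage trains LoRA adapters with GRPO. The sketch below shows the core GRPO idea, not the authors' training code. For one puzzle prompt, the policy samples a group of candidate plans, scores each with a reward, and standardizes every reward against its own group, so no learned value function is needed. The advantages then feed a PPO-style clipped objective. Several details here are simplifications or hypothetical choices: the binary reward (1 if the puzzle is solved, 0 otherwise), the use of population standard deviation, the sequence-level ratio, and omitting the KL penalty and the LoRA parameterization.

```python
import math
from statistics import mean, pstdev
from typing import List

def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each sampled plan's reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]

def clipped_surrogate(logp_new: List[float], logp_old: List[float],
                      advantages: List[float], clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective averaged over the group (sequence-level, KL term omitted)."""
    terms = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(plan) / pi_old(plan)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)

if __name__ == "__main__":
    # Hypothetical rewards for four sampled plans on one puzzle: two solved it, two did not.
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print([round(a, 3) for a in advantages])
```

If every plan in a group earns the same reward, all advantages are zero and that group contributes no gradient signal. This is one reason the group size and reward design matter in practice.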
Taxonomy
Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · AI-based Problem Solving and Planning
