Planning in a recurrent neural network that plays Sokoban
Mohammad Taufeeque, Philip Quirke, Maximilian Li, Chris Cundy, Aaron David Tucker, Adam Gleave, Adrià Garriga-Alonso

TL;DR
This paper investigates how a recurrent neural network trained on Sokoban internally represents plans. It reveals a causal plan representation and a pacing behavior, and extends the network to larger puzzles, advancing the understanding of planning mechanisms in neural networks.
Contribution
It uncovers the internal causal plan representations in an RNN trained on Sokoban and demonstrates how these representations enable solving larger, out-of-distribution puzzles.
Findings
The RNN's plan representation predicts its actions about 50 steps ahead
The quality and length of the represented plan increase over the first few steps
The RNN exhibits a pacing behavior that is incentivized by training
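To make the "pacing" finding concrete, the sketch below shows one simple way to flag cycles in a trajectory: a cycle is recorded whenever the full game state returns to a state seen earlier, meaning the intervening moves made no net progress. This is an illustrative assumption, not the authors' definition or code; the function name `find_cycles` and the toy trajectory are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def find_cycles(states: Sequence[Hashable]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs where the state at `end` repeats the state at `start`.

    Assumption for illustration: each entry of `states` is a hashable encoding
    of the full puzzle state (agent and box positions). After a cycle is found,
    the search restarts from its end so that reported cycles do not overlap.
    """
    last_seen: Dict[Hashable, int] = {}
    cycles: List[Tuple[int, int]] = []
    for t, state in enumerate(states):
        if state in last_seen:
            cycles.append((last_seen[state], t))
            last_seen = {}  # restart so the next cycle begins at or after t
        last_seen[state] = t
    return cycles


# Hypothetical trajectory of agent positions (boxes assumed not to move).
trajectory = [(1, 1), (1, 2), (2, 2), (2, 1), (1, 1), (1, 2), (1, 3)]
cycles = find_cycles(trajectory)
```

Here the agent walks around a square and returns to its starting cell after four moves, so `cycles` holds a single pair covering steps 0 to 4. A trajectory that never revisits a state yields an empty list.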
Abstract
Planning is essential for solving complex tasks, yet the internal mechanisms underlying planning in neural networks remain poorly understood. Building on prior work, we analyze a recurrent neural network (RNN) trained on Sokoban, a challenging puzzle requiring sequential, irreversible decisions. We find that the RNN has a causal plan representation which predicts its future actions about 50 steps in advance. The quality and length of the represented plan increase over the first few steps. We uncover a surprising behavior: the RNN "paces" in cycles to give itself extra computation at the start of a level, and show that this behavior is incentivized by training. Leveraging these insights, we extend the trained RNN to significantly larger, out-of-distribution Sokoban puzzles, demonstrating robust representations beyond the training regime. We open-source our model and code, and believe…
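As a rough illustration of what it means for a representation to "predict future actions", the sketch below trains a linear probe (logistic regression) to read a binary label out of activation vectors. Everything here is an assumption made for illustration: the data is synthetic, the three-dimensional "activations" and the hidden direction are invented, and the functions `make_synthetic_data`, `train_probe`, and `accuracy` are hypothetical and not taken from the authors' code.

```python
import math
import random
from typing import List, Tuple


def make_synthetic_data(n: int, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Synthetic stand-in for (hidden activation, future action) pairs."""
    rng = random.Random(seed)
    true_direction = [1.0, -2.0, 0.5]  # invented direction, for illustration only
    xs: List[List[float]] = []
    ys: List[int] = []
    while len(xs) < n:
        x = [2.0 * rng.random() - 1.0 for _ in true_direction]
        score = sum(w * v for w, v in zip(true_direction, x))
        if abs(score) < 0.1:
            continue  # drop points too close to the decision boundary
        xs.append(x)
        ys.append(1 if score > 0 else 0)
    return xs, ys


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(xs: List[List[float]], ys: List[int],
                steps: int = 300, lr: float = 1.0) -> List[float]:
    """Fit logistic-regression weights by full-batch gradient descent.

    The returned list holds one weight per input dimension plus a final bias.
    """
    dim = len(xs[0])
    w = [0.0] * (dim + 1)
    for _ in range(steps):
        grad = [0.0] * (dim + 1)
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            err = sigmoid(z) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def accuracy(w: List[float], xs: List[List[float]], ys: List[int]) -> float:
    """Fraction of examples where the probe's sign matches the label."""
    correct = 0
    for x, y in zip(xs, ys):
        z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
        correct += int((1 if z > 0 else 0) == y)
    return correct / len(xs)


xs, ys = make_synthetic_data(200)
probe = train_probe(xs[:150], ys[:150])       # fit on the first 150 examples
held_out = accuracy(probe, xs[150:], ys[150:])  # evaluate on the remaining 50
```

High held-out accuracy shows only that the label is linearly decodable from the activations. Establishing that a representation is causal, as the abstract claims, additionally requires intervening on the activations and observing a change in behavior.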
Peer Reviews
Decision: Submitted to ICLR 2025
1. The current manuscript is highly informative, with a number of interesting analyses, many of which are novel to my knowledge (beyond those discussed in Guez et al. 2019). 2. Some of the results are thought-provoking and had not occurred to me, such as the "pacing" mechanism. 3. The task studied (planning in Sokoban) is quite interesting (while simple enough for humans to interpret) and potentially important for understanding planning.
While the current study presents a number of interesting results, I believe the manuscript could be improved in several aspects. 1. The paper could benefit from a better structure and a more scientific style of writing. The current version, while informative, is not self-contained, and I often found it difficult to understand the implementation details of each analysis method. The authors should assume minimal prior knowledge of the methods used in this paper, for a more general audience. 2. The p
The paper explores an interesting research direction: how a neural network within an embodied agent abstracts the problem at hand, projects a plan, and makes decisions. It is an interesting problem that, through a sound study, can be very useful and beneficial for the community.
While the line of research is of interest, the current paper has important shortcomings that require significant changes. Specifically, there are three major weaknesses in this paper: * Clarity: As I will detail below, the paper is very difficult to understand and it would greatly benefit from a rewrite. Specifically, I would suggest that the authors do a simple exercise and state at the beginning which hypotheses are being tested, how these hypotheses are going to be tested, and what results would be
- **Nice task.** I think the use of Sokoban is interesting as it is a hard planning problem that has many failure modes for greedy models which perform actions without careful regard and deliberation of future actions. I think research that interprets how models perform planning is especially relevant now with current developments happening in the reasoning abilities of large (language) models. - **A variety of interpretability techniques.** The authors make use of a variety of techniques of m
Despite these strengths, I think the paper has many weaknesses and does not do a great job of elucidating whether the model is planning. In addition, I think the paper is very unclear in many places. - **Framing of results.** The paper abounds with sections in which the authors make quite substantial claims. For example, the introduction claims that “linear probes show that activations represent a plan that causes actions” and that “the behaviour of the RNN suggests that it performs search”
Taxonomy
Topics: Neural Networks and Applications
