Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN
Mohammad Taufeeque, Aaron David Tucker, Adam Gleave, Adrià Garriga-Alonso

TL;DR
This paper reveals how a convolutional RNN trained with reinforcement learning for Sokoban encodes plans as activations in specific channels and uses learned kernels to perform bidirectional planning and backtracking.
Contribution
It provides a mechanistic understanding of how a model-free reinforcement learning RNN encodes and constructs plans in Sokoban through specialized channels and kernels.
Findings
Path channels encode future move plans.
Kernels encode transition models for position changes.
Backtracking is implemented via negative values in kernels.
Abstract
We partially reverse-engineer a convolutional recurrent neural network (RNN) trained with model-free reinforcement learning to play the box-pushing game Sokoban. We find that the RNN stores future moves (plans) as activations in particular channels of the hidden state, which we call path channels. A high activation in a particular location means that, when a box is in that location, it will get pushed in the channel's assigned direction. We examine the convolutional kernels between path channels and find that they encode the change in position resulting from each possible action, thus representing part of a learned transition model. The RNN constructs plans by starting at the boxes and goals. These kernels extend activations in path channels forwards from boxes and backwards from the goal. Negative values are placed in channels at obstacles. This causes the extension kernels to…
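The extension mechanism described in the abstract can be illustrated with a toy sketch. It is not the paper's network or its learned weights: the kernel `EXTEND_RIGHT`, the helper names, the clipping nonlinearity, and the value `-2.0` at the obstacle are all assumptions chosen for illustration. The sketch shows only how a 3x3 kernel that reads the neighbouring cell can grow a "push right" activation one cell per step, and how a sufficiently negative value at a cell can keep the activation from spreading through it.

```python
# Toy sketch only: hypothetical kernel and values, not the trained DRC(3,3) weights.

def correlate3x3(grid, kernel):
    """Zero-padded 3x3 cross-correlation over a list-of-lists grid."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += kernel[dy + 1][dx + 1] * grid[yy][xx]
            out[y][x] = total
    return out

# Hypothetical "extend right" kernel: the output at (y, x) reads the input at (y, x-1),
# i.e. it encodes the position change caused by a rightward push.
EXTEND_RIGHT = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

def extend_plan(channel, obstacle_bias, steps):
    """Apply the extension kernel `steps` times, clipping activations to [0, 1].

    obstacle_bias holds negative values at obstacle cells (an assumption of this sketch).
    """
    for _ in range(steps):
        shifted = correlate3x3(channel, EXTEND_RIGHT)
        channel = [
            [min(1.0, max(0.0, c + s + b)) for c, s, b in zip(crow, srow, brow)]
            for crow, srow, brow in zip(channel, shifted, obstacle_bias)
        ]
    return channel

# One-row corridor: a box at x=0 seeds the "push right" path channel.
channel = [[1.0, 0.0, 0.0, 0.0, 0.0]]
no_obstacles = [[0.0, 0.0, 0.0, 0.0, 0.0]]
obstacle_at_3 = [[0.0, 0.0, 0.0, -2.0, 0.0]]

free_plan = extend_plan(channel, no_obstacles, steps=4)
blocked_plan = extend_plan(channel, obstacle_at_3, steps=4)
print(free_plan)     # activation reaches every cell of the corridor
print(blocked_plan)  # activation stops before the obstacle cell
```

In this toy setting the plan fills the whole corridor when nothing is in the way and halts at the cell before the obstacle otherwise. A backward extension from a goal would use the mirrored kernel under the same assumptions.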
Peer Reviews
Decision: ICLR 2026 Poster
1. Plans live in identifiable channels; no probes needed to find them.
2. Concrete kernels for initialize/extend/stop/compete; elegant WTA tying to action selection.
3. Targeted edits flip actions at ~99% via GNA/PNA; large drops from path-channel ablations.
4. Long/short-term split with j-gate-mediated transfer explains overlapping plans.
5. The “weight steering” generalization and a clean recipe others can reuse.
1. Risk of confirmation bias; limited inter-rater reliability reporting.
2. Define solve-rate protocol, seeds, and statistical tests more precisely; report CIs on all numbers.
3. One architecture (DRC(3,3)), one domain (Sokoban).
4. No check on other DRC sizes, Atari-DRC, or different training runs.
5. Interesting, but current evidence is circumstantial; could be toned down or supported with extra tests (e.g., explicit internal objective readouts).
6. Success metric is “any alternate action
I really do not have much to say. It may have some implications for RL-trained language models, especially those that reuse the same layer several times (e.g., diffusion language models, Universal Transformer, Sparse Universal Transformer) and use think tags.
The analysis is performed on a particular set of weights of a particular RNN architecture using a particular dataset, random seed, etc., leaving it unclear whether the findings generalize to wider applications, or even to a different set of weights from a different random seed. (The authors acknowledge this in the appendix.) The paper does not have a technical contribution. Rather, the authors simply manually found the existence of path channels, which renders the use of linear probes / logisti
This paper offers a detailed mechanistic analysis of the DRC(3,3) reinforcement learning agent trained on Sokoban. Its originality lies in the depth of interpretability achieved—the authors go beyond linear probes to reveal path channels and plan extension kernels that implement a neural form of bidirectional planning. The study combines qualitative visualization, quantitative ablation, and causal intervention experiments in a careful way. The causal and ablation results convincingly show that
Despite its depth, the paper can be very difficult to read. Readers unfamiliar with DRC(3,3) may struggle to follow the layer/tick conventions and channel indexing. A schematic overview early in the paper would improve clarity. Another issue is limited generality: all findings are restricted to Sokoban. It remains unclear whether the same mechanisms emerge in other planning-heavy environments or different architectures. Some comparative evidence (e.g., other environments) would help validate t
Taxonomy
Topics: Reinforcement Learning in Robotics · AI-based Problem Solving and Planning · Artificial Intelligence in Games
