On Logical Extrapolation for Mazes with Recurrent and Implicit Networks

Brandon Knutson; Amandin Chyba Rabeendran; Michael Ivanitskiy; Jordan Pettyjohn; Cecilia Diniz-Behn; Samy Wu Fung; Daniel McKenzie

arXiv:2410.03020 · cs.LG · July 21, 2025 · 2 cites

TL;DR

This paper investigates how recurrent and implicit neural networks generalize in maze solving, revealing limitations in their extrapolation abilities and analyzing their convergence behaviors to inform better design.

Contribution

The paper critically examines the assumption that these networks learn scalable algorithms, providing evidence of approximately learned heuristics and analyzing the networks' dynamics during extrapolation.

Findings

01 Models fail in a variety of ways when test mazes are varied along multiple axes, not just maze size

02 A specific RNN approximately learns a 'deadend-filling' heuristic

03 Models trained to converge tend to converge, while others may exhibit limit cycles
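For readers unfamiliar with the heuristic named in finding 02, the following is a minimal sketch of classical dead-end filling on a grid maze. It illustrates the textbook algorithm only, not the behavior of the RNN studied in the paper; the grid encoding (`#` for walls), the example `MAZE`, and the names `open_neighbors` and `deadend_fill` are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]

# Hypothetical example maze: '#' is a wall, every other character is open.
MAZE = [
    "#######",
    "#S#...#",
    "#.#.#.#",
    "#...#E#",
    "#.#####",
    "#...###",
    "#######",
]
START: Cell = (1, 1)
END: Cell = (3, 5)


def open_neighbors(grid: List[str], cell: Cell, filled: Set[Cell]) -> List[Cell]:
    """Return the open, not-yet-filled cells directly adjacent to `cell`."""
    r, c = cell
    result = []
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]):
            if grid[nr][nc] != "#" and (nr, nc) not in filled:
                result.append((nr, nc))
    return result


def deadend_fill(grid: List[str], start: Cell, end: Cell) -> Set[Cell]:
    """Repeatedly fill dead ends (open cells with at most one open neighbor,
    excluding start and end). In an acyclic maze, the cells left unfilled
    form the unique start-to-end path."""
    open_cells = {
        (r, c)
        for r, row in enumerate(grid)
        for c, ch in enumerate(row)
        if ch != "#"
    }
    filled: Set[Cell] = set()
    changed = True
    while changed:
        changed = False
        for cell in sorted(open_cells - filled):
            if cell in (start, end):
                continue
            if len(open_neighbors(grid, cell, filled)) <= 1:
                filled.add(cell)
                changed = True
    return open_cells - filled


path_cells = deadend_fill(MAZE, START, END)
```

In this example the side corridor in the lower left is filled one cell per pass, which is why the procedure is naturally iterative: the number of passes grows with the length of the longest dead-end branch.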

Abstract

Recent work suggests that certain neural network architectures -- particularly recurrent neural networks (RNNs) and implicit neural networks (INNs) -- are capable of logical extrapolation. When trained on easy instances of a task, these networks (henceforth: logical extrapolators) can generalize to more difficult instances. Previous research has hypothesized that logical extrapolators do so by learning a scalable, iterative algorithm for the given task which converges to the solution. We examine this idea more closely in the context of a single task: maze solving. By varying test data along multiple axes -- not just maze size -- we show that models introduced in prior work fail in a variety of ways, some expected and others less so. It remains uncertain whether any of these models has truly learned an algorithm. However, we provide evidence that a certain RNN has approximately learned a…
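The distinction between an iteration that converges to a solution and one that settles into a limit cycle can be made concrete with a toy sketch. The function below iterates an arbitrary update map and labels the tail of the trajectory; the name `classify_iterates`, its tolerance, and the two toy maps are assumptions chosen for illustration and are not the paper's models or analysis method.

```python
from typing import Callable, List, Tuple

State = Tuple[float, ...]


def classify_iterates(
    step: Callable[[State], State],
    x0: State,
    max_iters: int = 200,
    tol: float = 1e-8,
    max_period: int = 8,
) -> str:
    """Iterate x_{k+1} = step(x_k) and compare the final state with the
    states 1..max_period steps earlier to detect a fixed point or a
    short limit cycle."""
    history: List[State] = [x0]
    for _ in range(max_iters):
        history.append(step(history[-1]))

    last = history[-1]
    for period in range(1, max_period + 1):
        earlier = history[-1 - period]
        dist = max(abs(a - b) for a, b in zip(last, earlier))
        if dist < tol:
            if period == 1:
                return "fixed point"
            return f"limit cycle (period {period})"
    return "no short-period pattern detected"


# Toy map 1: a contraction, which converges to the fixed point x = 2.
contraction_label = classify_iterates(lambda x: (0.5 * x[0] + 1.0,), (0.0,))

# Toy map 2: a sign flip, which alternates between two points forever.
flip_label = classify_iterates(lambda x: (-x[0],), (1.0,))
```

The smallest matching period is reported first, so a true fixed point is never mislabeled as a cycle. A check of this kind only inspects the final iterates and would miss cycles longer than `max_period`.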

Peer Reviews

Decision · ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 4

Strengths

* Logical extrapolation is a key ability of human vision. The authors show that prior work in this area needs more exploration and highlight that scaling difficulty in dimensions other than model size hurts the out-of-distribution performance of RNNs and INNs, the two kinds of networks that have been shown to extrapolate logically. This is a really interesting contribution and warrants further exploration of this unsolved problem.
* This work includes source code for generating t

Weaknesses

* **Loose connection between the TDA analysis and lack of logical extrapolation**: It felt to me that the TDA analysis of RNN/INN latent dynamics is disconnected from the issue of logical extrapolation in RNNs and INNs. My main suggestion for improving this paper is to strengthen the connection between these two explorations in the writing, as I believe this would make the contributions much clearer.
* **Overall framing of the story**: The current writing sound

Reviewer 02 · Rating 3 · Confidence 3

Strengths

1. The use of topological data analysis to study sequences of RNN latents is novel, to the best of my knowledge.
2. The description of the experimental setup is clear and the main points in the paper are conveyed in a way that is easy to understand.

Weaknesses

I believe that the depth of analysis does not meet the bar for ICLR acceptance. This paper can be seen as a paper about generalization or interpretability.

When evaluated as a paper about generalization:

1. The paper observes a lack of generalization but does not provide much surprising insight or propose new techniques to improve generalization. I believe the Bansal work already covers the claim about their model generalizing to increased maze size, so the new claims are about deadend-star

Reviewer 03 · Rating 5 · Confidence 3

Strengths

This is an interesting paper. Good analysis of two models in 3 extrapolation dimensions. Good mathematical analysis of the latents of the models. Content is well presented.

Weaknesses

The paper would benefit from going deeper in a few areas; refer to the Questions section.

The PCA/TDA finding of one point, two points, and two circles is very interesting. Understanding how these modes relate to model performance, the learned algorithm, or similar would extend this finding. Without some implications, it is hard to say how important this finding is.

Errata:
- Page 5. Text “mazes still satisfying this condition contribute” is ambiguous. Consider using “mazes with a start position degree of

Reviewer 04 · Rating 3 · Confidence 3

Strengths

**Originality** While the paper considers a task and architecture found in prior works, the authors conduct new analyses which reveal additional information about the inner mechanics of the model.

**Quality** Overall, the experiments are thorough and well-conducted. The authors consider a number of ways to both extrapolate the task and analyze the latent dynamics.

**Clarity** Overall, the paper is well-written and the figures are well-illustrated.

**Significance** This paper will likely have som

Weaknesses

Overall, the paper seems quite similar to prior work on the maze extrapolation task, and the new analyses do not seem very significant. Certainly, the authors show new results, but the broader significance to extrapolative tasks is not clear. I would encourage the authors to consider at least one other extrapolative task. Another way to improve the significance of the paper is to include theoretical results. Clarity-wise, some text in figures is too small (namely, 2, 3, 4, 5, 6, 7). I encourage

Reviewer 05 · Rating 3 · Confidence 3

Strengths

1. This work provides evidence that challenges the claims of previous research by setting up new tests where prior work fell short, underscoring the importance of establishing limitations on earlier findings.
2. The paper is well-structured across its five sections, presented with direct and concise wording, making it easy to read and follow.

Weaknesses

1. The primary reason for rejection is that the paper does not solve or propose any new solutions; it merely highlights where previous work falls short, without suggesting a new model or method to address the identified issues. In my view, experiments without any novel proposals are insufficient for a conference of this caliber.
2. Testing on only 100 mazes (line 237) appears limited, especially given that prior work typically evaluates on a much larger sample size, often 1,000 to 10,000 mazes.
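To put the sample-size concern in perspective, the sketch below computes a Wilson score interval for an accuracy estimated from n test mazes. The 90% accuracy used here is a hypothetical value chosen only for illustration and is not a result from the paper or the reviews; `wilson_interval` is likewise an illustrative helper name.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives
    an approximate 95% interval)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical 90% accuracy measured on 100 mazes versus 10,000 mazes.
lo_100, hi_100 = wilson_interval(90, 100)
lo_10k, hi_10k = wilson_interval(9000, 10000)
width_100 = hi_100 - lo_100
width_10k = hi_10k - lo_10k
```

Because the interval width shrinks roughly with the square root of n, moving from 100 to 10,000 mazes narrows the interval by about a factor of ten.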

Code & Models

Repositories

mines-opt-ml/maze-extrapolation
PyTorch · Official

Videos

On Logical Extrapolation for Mazes with Recurrent and Implicit Networks · Underline

Taxonomy

Topics: Advanced Algebra and Logic · Logic, Reasoning, and Knowledge · Slime Mold and Myxomycetes Research