Building spatial world models from sparse transitional episodic memories
Zizhan He, Maxime Daigle, Pouya Bashivan

TL;DR
This paper introduces ESWM, a novel model that constructs flexible spatial maps from sparse episodic memories, enabling rapid adaptation and efficient navigation without extensive training.
Contribution
The paper presents ESWM, a neuroscience-inspired framework that builds spatial world models from disjoint episodic memories, allowing quick adaptation to environmental changes and supporting planning.
Findings
ESWM predicts unobserved transitions from minimal experience.
The latent space geometry aligns with environment structure.
ESWM enables near-optimal exploration and navigation strategies.
Abstract
Many animals possess a remarkable capacity to rapidly construct flexible cognitive maps of their environments. These maps are crucial for ethologically relevant behaviors such as navigation, exploration, and planning. Existing computational models typically require long sequential trajectories to build accurate maps, but neuroscience evidence suggests maps can also arise from integrating disjoint experiences governed by consistent spatial rules. We introduce the Episodic Spatial World Model (ESWM), a novel framework that constructs spatial maps from sparse, disjoint episodic memories. Across environments of varying complexity, ESWM predicts unobserved transitions from minimal experience, and the geometry of its latent space aligns with that of the environment. Because it operates on episodic memories that can be independently stored and updated, ESWM is inherently adaptive, enabling…
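To make the idea of "disjoint experiences governed by consistent spatial rules" concrete, here is a toy sketch in Python. It is not the ESWM model, which is a learned sequence model; it only illustrates why a bank of unordered one-step transitions can determine transitions that were never observed. Everything in it is an assumption made for illustration: the four-action set `ACTIONS`, the `(state, action, next_state)` memory format, the open-arena setting with no walls, and the helper names `infer_coordinates` and `predict`.

```python
from collections import deque

# Hypothetical action set: each action always shifts position by the same offset.
ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

# A memory bank of unordered one-step transitions (state, action, next_state).
# The states form a 2x2 open arena, but no trajectory through it is stored.
MEMORIES = [("C", "E", "D"), ("A", "N", "C"), ("A", "E", "B")]


def infer_coordinates(memories, anchor):
    """Place every state connected to `anchor` on a grid relative to it.

    Each stored transition is used in both directions, relying on the
    assumption that an action's displacement is the same everywhere.
    """
    neighbors = {}
    for s, a, s_next in memories:
        dx, dy = ACTIONS[a]
        neighbors.setdefault(s, []).append((s_next, dx, dy))
        neighbors.setdefault(s_next, []).append((s, -dx, -dy))

    coords = {anchor: (0, 0)}
    queue = deque([anchor])
    while queue:
        s = queue.popleft()
        x, y = coords[s]
        for t, dx, dy in neighbors.get(s, []):
            if t not in coords:
                coords[t] = (x + dx, y + dy)
                queue.append(t)
    return coords


def predict(memories, state, action):
    """Return the state expected after `action` from `state`, or None if the
    memories do not place any known state at the target location."""
    coords = infer_coordinates(memories, anchor=state)
    target = ACTIONS[action]  # `state` sits at the origin, so the offset is the target
    for s, c in coords.items():
        if c == target:
            return s
    return None


# (B, N, ?) was never observed, yet the three memories jointly imply the answer.
print(predict(MEMORIES, "B", "N"))
# No known state lies east of D, so the query cannot be resolved.
print(predict(MEMORIES, "D", "E"))
```

Because the sketch rebuilds the map from the memory bank on every query, adding or deleting a memory changes predictions immediately. That mirrors, in a very simplified way, the abstract's point that independently stored and updated memories make the approach adaptive. Obstacles would break the fixed-offset assumption used here.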
Peer Reviews
Decision·ICLR 2026 Poster
1. The empirical evaluation is thorough, rigorous, and insightful. The authors test ESWM on multiple tasks and environment variants (open arenas vs. maps with random walls/obstacles) and over multiple base models (transformers, LSTM, and Mamba) to assess its prediction accuracy, representation quality, and control performance. The set of experiments covers many insightful explorations of the ESWM behavior, including memory integration, navigation, adaptation to environment change, and scaling. O
1. The innovation is a bit limited, and there is no baseline outside of the ESWM model to compare with, making it hard to ground the model properties in context. Given the nature of the base models and the environment, the key innovation seems to be the training scheme over multiple memory banks. On the grounding, it would be interesting to see how the model performs against ordinary baselines or training schemes. For example, one may compare ESWM-T with ordinary transformers (with a single memo
- **Important Research Question and Interdisciplinary Connection:** This paper addresses a significant limitation of prior world models: their difficulty in inferring robust spatial knowledge from continuous sequences and adapting to environmental changes. The authors propose the Episodic Spatial World Model (ESWM), a novel approach inspired by the dual role of the Medial Temporal Lobe (MTL) in processing both episodic and spatial memory. This represents an excellent example of leveraging insigh
- **Insufficient Discussion of Closely Related Work and Unclear Novelty:** The core mechanism of the proposed method shares significant similarities with Generative Temporal Models with Spatial Memory (GTM-SM) [1], particularly the idea of leveraging one-step transitions to build a spatially-aware model. GTM-SM also demonstrated that its inferred state representations align with the true geometry and enable high-quality long-term predictions (e.g., Figure 2 in [1]). However, this paper lacks any
1. Well-documented implementation: The paper provides comprehensive technical details including pseudocode (Algorithms 1-4), extensive appendices, and thorough experimental setup descriptions. This supports reproducibility.
2. Strong downstream performance: ESWM demonstrates impressive zero-shot capabilities on exploration (96.48% of oracle performance) and navigation tasks (96.8% success rate, 99.2% path optimality), outperforming the task-specific EPN baseline by substantial margins.
3. Intere
## Major Concerns
1. Insufficient Justification of Core Premise. The abstract and introduction claim that animals/humans build spatial maps from "disjoint experiences governed by consistent spatial rules," but no neuroscience evidence is provided to support this specific claim. The cited disruption studies (lesions, amnesia) show that MTL is important for spatial cognition and episodic memory, but don't demonstrate that humans actually construct maps from genuinely disjoint, fragmented experienc
Taxonomy
Topics: Geographic Information Systems Studies
