LASAR: Towards Spatio-temporal Reasoning with Latent Cognitive Map

Jinzhou Tang; Sidi Liu; Waikit Xiu; Weixing Chen; Keze Wang

arXiv:2605.16899 · cs.CV · May 19, 2026

TL;DR

LASAR introduces a dual-memory architecture with a contrastive learning objective to enhance spatio-temporal reasoning and internal spatial modeling in embodied AI agents, improving zero-shot generalization.

Contribution

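The paper proposes LASAR, an architecture whose dual-memory system maintains both episodic experiences and a semantic cognitive map. It is trained with Spatio-temporal Contextual Representation Learning (ST-CRL), a contrastive objective, so that the agent explicitly encodes spatial relationships from its experiences.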
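To make the dual-memory idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: LASAR's cognitive map is latent (a learned representation), whereas this example substitutes an explicit adjacency graph so that the split between an episodic log and a map of spatial relations is easy to see. The class `DualMemory`, its fields, and the methods `observe` and `hop_distance` are all hypothetical names introduced for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DualMemory:
    """Toy stand-in for a dual-memory system (hypothetical, not LASAR's design)."""

    # Episodic memory: the ordered log of (step, place) observations.
    episodic: list = field(default_factory=list)
    # "Cognitive map": an explicit graph, place -> set of adjacent places.
    # Assumption: the paper's map is latent; a graph is used here for clarity.
    cognitive_map: dict = field(default_factory=dict)

    def observe(self, step, place):
        """Record an observation and link it to the previously visited place."""
        self.cognitive_map.setdefault(place, set())
        if self.episodic:
            previous = self.episodic[-1][1]
            if previous != place:
                self.cognitive_map[previous].add(place)
                self.cognitive_map[place].add(previous)
        self.episodic.append((step, place))

    def hop_distance(self, start, goal):
        """Breadth-first search over the map; returns None if unreachable."""
        if start not in self.cognitive_map or goal not in self.cognitive_map:
            return None
        frontier = deque([(start, 0)])
        seen = {start}
        while frontier:
            node, dist = frontier.popleft()
            if node == goal:
                return dist
            for neighbor in self.cognitive_map[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, dist + 1))
        return None


memory = DualMemory()
for step, place in enumerate(["kitchen", "hall", "bedroom", "hall"]):
    memory.observe(step, place)
print(memory.hop_distance("kitchen", "bedroom"))  # prints 2
```

The episodic log keeps every visit in order, including the repeated "hall", while the map stores only the relations between places. That separation is the point of the sketch: a distance query is answered from the map without replaying the trajectory.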

Findings

1. Achieves 2-3.5% improvements in zero-shot generalization on VLN-CE and VSI-Bench.

2. Demonstrates high self-consistency of the cognitive map.

3. Introduces a contrastive learning method leveraging spatio-temporal cues.

Abstract

A fundamental challenge in embodied AI is verifying if agents build internal models of spatial structure or merely learn to mimic task-specific expert trajectories. This is critical as foundational approaches rooted in action-centric tasks (e.g., VLN) and reasoning-centric tasks (e.g., EQA) often share a common limitation: they lack a learning signal that forces them to encode fine-grained spatial relationships (like topology or distance) over long-range, fragmented experiences. To address this, we first propose LASAR, an architecture featuring a dual-memory system designed to maintain both episodic experiences and a semantic cognitive map. We then introduce Spatio-temporal Contextual Representation Learning (ST-CRL), a contrastive objective designed to train this architecture. ST-CRL leverages spatio-temporal cues from cognitive queries generated through annotated spatio-temporal…
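The abstract describes ST-CRL only as a contrastive objective and is truncated before giving its form. As a general illustration of what a contrastive objective computes, the sketch below implements the widely used InfoNCE loss in plain Python. Whether ST-CRL uses InfoNCE is an assumption; the function names `cosine` and `info_nce`, the `temperature` value, and the toy 2-D embeddings are hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax score of the positive.

    Assumption: a generic contrastive loss, not necessarily the one ST-CRL uses.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


# Toy embeddings: `near` stands for a spatially related view, `far` for unrelated ones.
anchor = [1.0, 0.0]
near = [0.9, 0.1]
far = [[0.0, 1.0], [-1.0, 0.0]]

loss_aligned = info_nce(anchor, near, far)
loss_misaligned = info_nce(anchor, far[0], [near, far[1]])
print(loss_aligned < loss_misaligned)  # prints True
```

The loss is small when the anchor is closer to its positive than to the negatives and grows when that ordering is violated. A signal of this kind is what could push an encoder to place spatially or temporally related observations near each other in embedding space.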
