Deep sequence models tend to memorize geometrically; it is unclear why

Shahriar Noroozizadeh; Vaishnavh Nagarajan; Elan Rosenfeld; Sanjiv Kumar

arXiv:2510.26745·cs.LG·May 19, 2026

TL;DR

Deep sequence models develop a geometric memory: embeddings that encode global relationships between entities. This storage turns a hard multi-step reasoning task into an easy navigation task, and it challenges the view that such models store facts mainly as associative lookups.

Contribution

The paper introduces the concept of geometric memory in deep sequence models, contrasting it with associative memory, and analyzes its origins and implications for neural embedding geometries.

Findings

01 Models encode global relationships as geometric memory.

02 Geometric memory simplifies complex reasoning into navigation tasks.

03 Spectral bias contributes to the emergence of geometric memory.

Abstract

Deep sequence models are said to store atomic facts predominantly in the form of associative memory: a brute-force lookup of co-occurring entities. We identify a dramatically different form of storage of atomic facts that we term geometric memory. Here, the model has synthesized embeddings encoding novel global relationships between all entities, including ones that do not co-occur in training. Such storage is powerful: for instance, we show how it transforms a hard reasoning task involving an $\ell$-fold composition into an easy-to-learn $1$-step navigation task. From this phenomenon, we extract fundamental aspects of neural embedding geometries that are hard to explain. We argue that the rise of such a geometry, as against a lookup of local associations, cannot be straightforwardly attributed to typical supervisory, architectural, or optimizational pressures. Counterintuitively,…
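The contrast between the two kinds of storage can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's experimental setup: the star-of-paths graph, the function names, and the use of graph-Laplacian eigenvectors as stand-in embeddings are all hypothetical choices made here, whereas the paper studies embeddings learned by training. The associative route answers "which branch does this leaf belong to?" by chaining one edge lookup per hop, while the geometric route answers it with a single similarity comparison.

```python
import numpy as np


def build_path_star(num_arms, arm_len):
    """Build a star of paths (hypothetical toy graph); node 0 is the center."""
    edges, first_nodes, leaves = [], [], []
    next_id = 1
    for _ in range(num_arms):
        prev = 0
        for step in range(arm_len):
            edges.append((prev, next_id))  # one atomic fact: prev -> next_id
            if step == 0:
                first_nodes.append(next_id)  # node adjacent to the center
            prev = next_id
            next_id += 1
        leaves.append(prev)
    return edges, first_nodes, leaves


def associative_answer(edges, leaf):
    """Walk parent lookups from the leaf; returns (first node, number of hops)."""
    parent = {child: par for par, child in edges}
    node, hops = leaf, 0
    while parent[node] != 0:
        node = parent[node]
        hops += 1
    return node, hops


def spectral_embedding(edges, num_nodes, dim):
    """Stand-in for learned embeddings: low nontrivial Laplacian eigenvectors."""
    adj = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    laplacian = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]  # skip the constant eigenvector


def geometric_answer(emb, first_nodes, leaf):
    """One-step navigation: pick the candidate most aligned with the leaf."""
    scores = emb[first_nodes] @ emb[leaf]
    return first_nodes[int(np.argmax(scores))]


num_arms, arm_len = 4, 5
edges, first_nodes, leaves = build_path_star(num_arms, arm_len)
emb = spectral_embedding(edges, 1 + num_arms * arm_len, dim=num_arms - 1)

for leaf in leaves:
    node, hops = associative_answer(edges, leaf)
    print(leaf, "lookup:", node, f"({hops} hops)",
          "geometry:", geometric_answer(emb, first_nodes, leaf))
```

In this toy graph the lookup needs `arm_len - 1` chained hops per query, while the embedding comparison is a single step whose cost does not grow with the length of the path. Note that the embedding places a leaf close to the start of its own branch even though those two nodes never appear together in any single edge.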
