Graph Memory Transformer (GMT)

Nicola Zanarini; Niccolò Ferrari

arXiv:2604.23862 · cs.LG · April 28, 2026

TL;DR

This paper introduces Graph Memory Transformer (GMT), replacing dense feed-forward layers with a learned memory graph to enhance interpretability while maintaining language modeling capabilities.

Contribution

It proposes a novel transformer architecture that replaces dense FFN layers with a graph-based memory cell, enabling structural interpretability.
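The sketch below illustrates the structural swap. It is a toy decoder block in which causal self-attention stays fixed and only the position-wise sublayer changes: it is either a dense FFN or a graph-memory-style cell. The sketch assumes a pre-norm residual layout and a single attention head. `toy_memory_cell` and every other name here are hypothetical placeholders, not GMT's actual cell.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Per-token normalization to zero mean and unit variance (learned scale/shift omitted).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def causal_self_attention(x, wq, wk, wv):
    # Single-head causal attention: position t attends only to positions <= t.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def dense_ffn(x, w1, w2):
    # Conventional position-wise FFN: expand, ReLU, project back to d.
    return np.maximum(x @ w1, 0.0) @ w2

def toy_memory_cell(x, centroids):
    # Hypothetical stand-in for a graph memory cell: soft-assign each token to the
    # centroid bank and return the displacement toward the assigned memory state.
    logits = x @ centroids.T
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ centroids - x

def decoder_block(x, attn_params, sublayer):
    # Pre-norm residual block (assumed layout): attention is untouched, only `sublayer` is swapped.
    x = x + causal_self_attention(layer_norm(x), *attn_params)
    return x + sublayer(layer_norm(x))

rng = np.random.default_rng(0)
T, d, K = 5, 8, 16
x = rng.normal(size=(T, d))
attn = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
w1, w2 = rng.normal(size=(d, 4 * d)) * 0.1, rng.normal(size=(4 * d, d)) * 0.1
centroids = rng.normal(size=(K, d))

y_dense = decoder_block(x, attn, lambda h: dense_ffn(h, w1, w2))
y_graph = decoder_block(x, attn, lambda h: toy_memory_cell(h, centroids))
```

Both variants map a `(T, d)` sequence to a `(T, d)` sequence and stay causal. This is why the sublayer can be replaced without touching the surrounding autoregressive machinery.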

Findings

01

GMT trains stably with fewer parameters than the dense baseline.

02

GMT exposes centroid usage and the learned transition structure, both of which can be inspected directly for interpretability.

03

Performance is close to the dense baseline on zero-shot benchmarks.
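As a rough illustration of what "centroid usage and transition structure" could mean in practice, the sketch below summarizes hard per-token assignments from one memory cell. It computes a usage histogram, a row-normalized source→target transition matrix, and a normalized usage entropy. The sketch assumes each token has already been reduced to a single source id and target id (for example, by argmax over routing weights). The helper names are hypothetical, and the sketch does not reproduce the paper's analysis tooling.

```python
import numpy as np

def centroid_usage(ids, num_centroids):
    # Fraction of routed tokens whose (hard) assignment is each centroid.
    counts = np.bincount(ids, minlength=num_centroids)
    return counts / counts.sum()

def empirical_transitions(src_ids, tgt_ids, num_centroids):
    # Row-normalized counts of observed source -> target moves; unvisited rows stay all-zero.
    m = np.zeros((num_centroids, num_centroids))
    np.add.at(m, (src_ids, tgt_ids), 1.0)
    row_sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, row_sums, out=np.zeros_like(m), where=row_sums > 0)

def usage_entropy(usage):
    # Normalized entropy in [0, 1] (needs more than one centroid):
    # 1.0 means uniform use, values near 0 mean collapse onto a few centroids.
    p = usage[usage > 0]
    return float(-(p * np.log(p)).sum() / np.log(len(usage)))

# Toy hard assignments for six tokens over a bank of four centroids.
src = np.array([0, 0, 1, 2, 2, 2])
tgt = np.array([1, 1, 2, 0, 0, 1])
usage = centroid_usage(src, 4)                # [1/3, 1/6, 1/2, 0]
trans = empirical_transitions(src, tgt, 4)    # row 2 is [2/3, 1/3, 0, 0]
```

An empirical transition matrix like `trans` could then be compared against a cell's learned edge matrix to see which directed edges are actually traversed.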

Abstract

We investigate whether the Feed-Forward Network (FFN) sublayer in a decoder-only transformer can be replaced by an explicit learned memory graph while preserving the surrounding autoregressive architecture. The proposed Graph Memory Transformer (GMT) keeps causal self-attention intact, but replaces the usual per-token FFN transformation with a memory cell that routes token representations over a learned bank of centroids connected by a learned directed transition matrix. In the base GMT v7 instantiation studied here, each of 16 transformer blocks contains 128 centroids, a 128 × 128 edge matrix, gravitational source routing, token-conditioned target selection, and a gated displacement readout. The cell therefore returns movement from an estimated source memory state toward a target memory state, rather than a retrieved value. The resulting model is a fully decoder-only language model…
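The following is one possible reading of the cell described above, written as a NumPy sketch. It is an illustrative interpretation, not the authors' implementation, and several details are assumptions:

- **Source routing.** "Gravitational" routing is modeled as source weights proportional to a learned per-centroid mass divided by squared distance.
- **Target selection.** Targets combine a transition prior, taken from the row-softmaxed edge matrix, with a token-dependent query score.
- **Readout.** The gate is a per-dimension sigmoid.

All parameter names (`log_mass`, `w_query`, `w_gate`) are hypothetical. Only the 128-centroid bank and the 128 × 128 edge matrix come from the abstract.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def memory_cell(h, centroids, log_mass, edge_logits, w_query, w_gate, eps=1e-6):
    """Hypothetical GMT-style memory cell (illustrative sketch, not the authors' code).

    h:           (T, d) token representations
    centroids:   (K, d) learned memory states
    log_mass:    (K,)   assumed per-centroid 'mass' for gravitational routing
    edge_logits: (K, K) learned directed transition scores (row = from, column = to)
    w_query:     (d, d) assumed projection for token-conditioned target selection
    w_gate:      (d, d) assumed projection for the displacement gate
    """
    d = h.shape[-1]
    # Gravitational source routing (assumed form): weight_k proportional to mass_k / squared distance.
    sq_dist = ((h[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)   # (T, K)
    src_w = softmax(log_mass[None, :] - np.log(sq_dist + eps))         # (T, K)
    source = src_w @ centroids                                         # (T, d)

    # Target selection: transition prior from the source distribution plus a token-dependent score.
    trans = softmax(edge_logits, axis=1)                               # (K, K), rows sum to 1
    prior = np.log(src_w @ trans + eps)                                # (T, K)
    token_score = (h @ w_query) @ centroids.T / np.sqrt(d)             # (T, K)
    tgt_w = softmax(prior + token_score)                               # (T, K)
    target = tgt_w @ centroids                                         # (T, d)

    # Gated displacement readout: movement from source toward target, not a retrieved value.
    gate = sigmoid(h @ w_gate)                                         # (T, d), entries in (0, 1)
    return gate * (target - source), src_w, tgt_w

rng = np.random.default_rng(0)
T, d, K = 6, 16, 128                  # K = 128 as in the abstract; d is arbitrary here
centroids = rng.normal(size=(K, d))
h = rng.normal(size=(T, d))
h[0] = centroids[3]                   # a token sitting exactly on centroid 3
out, src_w, tgt_w = memory_cell(
    h, centroids, np.zeros(K), rng.normal(size=(K, K)),
    rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1)
```

Because the output is `gate * (target - source)`, each of its entries is bounded in magnitude by the corresponding entry of the source-to-target displacement. A token that lands on a centroid routes almost all of its source weight to that centroid.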

Code & Models

Repositories

nemesis533/GMT-GraphMemoryTransformer (GitHub)

Models

NicolaZanarini/gmt-v7-base (Hugging Face model)
