Does Engram Do Memory Retrieval in Autoregressive Image Generation?
Jinghao Wang, Qiyuan He, Chunbin Gu, Pheng-Ann Heng

TL;DR
This paper investigates whether the Engram module in autoregressive image generation acts as a content-addressed memory or as a gated architectural component, finding it functions mainly as a residual pathway rather than a recall mechanism.
Contribution
The study adapts the Engram module for vision, evaluates its role in image generation, and demonstrates it behaves as a gated pathway rather than a content-addressed memory.
Findings
Every Engram-augmented variant trails the pure AR baseline in FID: the module saves backbone FLOPs but does not, by itself, improve sample quality.
Disabling the Engram pathway is catastrophic, but a small constant gate suffices.
Frozen random memory tables have minimal impact on FID and can improve Inception Score.
Abstract
The Engram module -- a hash-keyed, O(1) associative memory injected into Transformer layers -- was recently shown to improve large language model pretraining, with the appealing interpretation that it provides a content-addressed shortcut to recurring local token patterns. We ask whether this interpretation transfers to autoregressive (AR) image generation, or whether the observed gains, if any, come from a different mechanism. We adapt the Engram module to vision with 2D spatial N-gram hashing, gated fusion, and KV-cache-compatible incremental inference, and inject it into a class-conditional AR generator trained on ImageNet 256x256. Across a sweep of backbone-to-memory budget ratios, every Engram-augmented variant trails the pure AR baseline in FID, indicating that the module saves backbone FLOPs but does not, by itself, improve sample quality. We then probe…
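To make the mechanism in the abstract concrete, here is a minimal sketch of a hash-keyed lookup with gated fusion. It is not the authors' implementation. The neighbourhood shape (up, left, current), the rolling hash, the sigmoid gate, and every name (`spatial_ngram_key`, `make_table`, `gated_fusion`, `PAD`) are assumptions chosen for illustration. The `const_gate` argument and the random table mirror the two probes listed under Findings, a constant gate and a frozen random memory table.

```python
import math
import random

PAD = -1  # hypothetical id for neighbours that fall outside the token grid


def spatial_ngram_key(grid, r, c, table_size):
    """Hash an assumed causal 2D neighbourhood (up, left, current) to a table slot."""
    up = grid[r - 1][c] if r > 0 else PAD
    left = grid[r][c - 1] if c > 0 else PAD
    key = 0
    for tok in (up, left, grid[r][c]):
        # Polynomial rolling hash; the multiplier is an arbitrary illustrative choice.
        key = (key * 1000003 + tok + 1) % table_size
    return key


def make_table(table_size, dim, seed=0):
    """Random memory table; leaving it untrained mimics the frozen-random probe."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(table_size)]


def gated_fusion(hidden, mem, const_gate=None):
    """Add the retrieved vector to the hidden state through a scalar gate.

    With const_gate=None the gate is a sigmoid of the scaled dot product
    (an assumed form); otherwise the given constant replaces it.
    """
    if const_gate is None:
        score = sum(h * m for h, m in zip(hidden, mem)) / math.sqrt(len(hidden))
        gate = 1.0 / (1.0 + math.exp(-score))
    else:
        gate = const_gate
    return [h + gate * m for h, m in zip(hidden, mem)]


# Usage: look up one position of a toy 2x2 token grid and fuse the result.
grid = [[3, 7], [7, 3]]
table = make_table(table_size=64, dim=4)
slot = spatial_ngram_key(grid, 1, 1, 64)
fused = gated_fusion([0.1, 0.2, 0.3, 0.4], table[slot])
```

The lookup costs one hash and one table read per position, independent of sequence length, which is the O(1) property the abstract refers to. Setting `const_gate=0.0` removes the memory contribution entirely, while a small nonzero constant keeps the residual pathway open without any content-dependent gating.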
