Memory Mosaics

Jianyu Zhang; Niklas Nolte; Ranajoy Sadhukhan; Beidi Chen; Léon Bottou

arXiv:2405.06394·cs.LG·March 3, 2025

TL;DR

Memory Mosaics are networks of associative memories that perform comparably to or better than transformers on medium-scale language modeling tasks, with compositional and in-context learning abilities that arise in a more transparent way.

Contribution

This paper introduces Memory Mosaics, a novel associative memory network architecture that achieves transformer-like capabilities with greater transparency.

Findings

01

Memory Mosaics perform as well as or better than transformers on medium-scale language modeling tasks.

02

They demonstrate compositional and in-context learning capabilities.

03

The approach achieves these capabilities more transparently than transformers, through what the authors call "predictive disentanglement".

Abstract

Memory Mosaics are networks of associative memories working in concert to achieve a prediction task of interest. Like transformers, memory mosaics possess compositional capabilities and in-context learning capabilities. Unlike transformers, memory mosaics achieve these capabilities in a comparatively transparent way ("predictive disentanglement"). We illustrate these capabilities on a toy example and also show that memory mosaics perform as well as or better than transformers on medium-scale language modeling tasks.
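
To make the term "associative memory" concrete, here is a minimal sketch of a single key-value memory. It assumes retrieval by a softmax-weighted average of stored values, a common formulation that also underlies attention; the abstract does not specify the retrieval rule, so this is an illustration and not the authors' implementation. The class name `AssociativeMemory` and the sharpness parameter `beta` are hypothetical.

```python
import math


class AssociativeMemory:
    """Toy key-value memory (hypothetical, for illustration only).

    Retrieval returns a softmax-weighted average of the stored values,
    with weights based on the squared distance between query and keys.
    """

    def __init__(self, beta=1.0):
        self.beta = beta  # assumed sharpness parameter: larger = closer to nearest-key lookup
        self.keys = []
        self.values = []

    def store(self, key, value):
        # Memorize one (key, value) pair.
        self.keys.append(list(key))
        self.values.append(list(value))

    def retrieve(self, query):
        if not self.keys:
            raise ValueError("memory is empty")
        # Score each stored key by its negative squared distance to the query.
        scores = [
            -self.beta * sum((q - k) ** 2 for q, k in zip(query, key))
            for key in self.keys
        ]
        # Numerically stable softmax over the scores.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of the stored values, one output dimension at a time.
        dim = len(self.values[0])
        return [
            sum(w * v[d] for w, v in zip(weights, self.values))
            for d in range(dim)
        ]


memory = AssociativeMemory(beta=50.0)
memory.store([1.0, 0.0], [10.0])
memory.store([0.0, 1.0], [20.0])

near_first = memory.retrieve([0.9, 0.1])  # query close to the first key
midpoint = memory.retrieve([0.5, 0.5])    # query equidistant from both keys
```

A query near a stored key returns approximately that key's value, while a query equidistant from both keys returns their average. A network in the sense of the abstract would combine many such memories; how they are wired together is described in the paper and in the facebookresearch/MemoryMosaics repository, not in this sketch.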

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 3·Confidence 3

Strengths

1. The integration of associative memories to replicate and surpass the capabilities of transformers.
2. The concept of predictive disentanglement is novel and also rooted in a solid theoretical framework.
3. The theoretical motivations behind predictive disentanglement are well-explained.

Weaknesses

1. The paper does not provide exhaustive details on the architecture's configuration.
2. Lack of detailed discussion on the choice and impact of hyperparameters.
3. The experimental validation is limited to certain language tasks.
4. Lack of ablation studies.

Reviewer 02·Rating 8·Confidence 4

Strengths

In general, this is a thought-provoking paper with some interesting ideas. Exploration of the relationship between associative memories, attention, and Transformer blocks is valuable, although the current presentation is heavily biased and omits related ideas from prior work. The empirical results on language modeling are encouraging, although small-scale.

Weaknesses

One problem with this submission is that the presentation almost entirely ignores the work on modern Hopfield networks and dense associative memories, which tackles closely related motivations and ideas. Specifically, the authors’ proposal is closely related to [Energy Transformer (NeurIPS 2023)](https://proceedings.neurips.cc/paper_files/paper/2023/file/57a9b97477b67936298489e3c1417b0a-Paper-Conference.pdf) and related literature, which replaces elements of the Transformer block with associative mem

Reviewer 03·Rating 8·Confidence 4

Strengths

The paper presents an interesting, seemingly novel architecture for a prediction model based on associative memories. The architecture seems well justified from first principles and links nicely with existing works on the attention mechanism. This work is of good quality and presents its ideas reasonably well, including rigorous formalizations and numerous figures explaining the proposed architectures. The paper benchmarks the Memory Mosaic against a toy dataset to demonstrate properties of the

Weaknesses

To my reading, the architecture is not differentiated strongly enough from existing, similar work on associative memories and the attention mechanism. The key property of the architecture, "peeking" in the value predictions, is not explained clearly enough for a reader to understand the significance. The meta-learning interpretation in Section 3 is somewhat confusing in how this relates to the broader field of meta-learning. The toy dataset, while very useful in understanding "predictive disenta

Code & Models

Repositories

facebookresearch/MemoryMosaics
PyTorch·Official

Taxonomy

Topics: Memory, Trauma, and Commemoration