$\text{Memory}^3$: Language Modeling with Explicit Memory

Hongkang Yang; Zehao Lin; Wenjin Wang; Hao Wu; Zhiyu Li; Bo Tang; Wenqiang Wei; Jinbo Wang; Zeyun Tang; Shichao Song; Chenyang Xi; Yu Yu; Kai Chen; Feiyu Xiong; Linpeng Tang; Weinan E

arXiv:2407.01178 · cs.CL · January 29, 2025 · 1 citation

Open Access

TL;DR

This paper introduces $\text{Memory}^3$, a large language model with explicit external memory that reduces training and inference costs while maintaining or improving performance. Its design is inspired by the memory hierarchy of the human brain.

Contribution

The paper presents a novel LLM architecture with explicit memory, a new memory circuitry theory, and techniques like memory sparsification and a two-stage pretraining scheme.
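The Contribution line names memory sparsification without describing it. As a minimal sketch, assuming that sparsification means storing only the k highest-scoring key-value entries of each memory (the importance score and the `sparsify_memory` name are hypothetical, not taken from the paper), the selection step could look like this:

```python
from typing import List, Tuple

Vector = List[float]


def sparsify_memory(
    keys: List[Vector], values: List[Vector], importance: List[float], k: int
) -> Tuple[List[Vector], List[Vector]]:
    """Keep the k key-value pairs with the highest importance score.

    Hypothetical sketch: `importance` stands in for whatever per-token score
    decides which memory entries are worth storing.
    """
    if not (len(keys) == len(values) == len(importance)):
        raise ValueError("keys, values and importance must have equal length")
    # Indices of the k highest-scoring tokens...
    top = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)[:k]
    # ...restored to their original token order
    top.sort()
    return [keys[i] for i in top], [values[i] for i in top]


# Usage: keep 2 of 4 one-dimensional entries
kept_keys, kept_values = sparsify_memory(
    [[0.0], [1.0], [2.0], [3.0]],
    [[10.0], [11.0], [12.0], [13.0]],
    [0.1, 0.9, 0.3, 0.7],
    k=2,
)
```

Restoring the original order is only a readability choice: softmax attention gives the same output for any joint reordering of key-value pairs, so plain top-k selection would also work.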

Findings

01. $\text{Memory}^3$ outperforms much larger LLMs, as well as RAG models, in accuracy.

02. It decodes faster than RAG models.

03. It reduces training and inference costs.

Abstract

The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size, training cost, and inference cost, all proportional to the amount of remaining "abstract knowledge". As a preliminary proof of concept, we train from scratch a 2.4B LLM, which achieves better performance than much larger LLMs as well as RAG models, and maintains higher decoding speed than RAG. The model is named $\text{Memory}^3$, since explicit memory is the third form of memory in LLMs after implicit memory…
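To make the contrast with text RAG concrete, here is a minimal sketch under an explicit assumption: that explicit memories are stored as precomputed attention key-value pairs, which are retrieved and attended to together with the key-values of the current context instead of being re-read as prompt text. The function names (`attend`, `attend_with_explicit_memory`) are hypothetical, and this single-head toy is not the paper's implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def attend_with_explicit_memory(
    query: Vector,
    context_keys: List[Vector],
    context_values: List[Vector],
    memory_keys: List[Vector],
    memory_values: List[Vector],
) -> Vector:
    """Hypothetical: prepend retrieved (precomputed) memory key-values to the context's."""
    return attend(query, memory_keys + context_keys, memory_values + context_values)


# Usage: the query matches the memory key, so the output is pulled toward the memory value
out = attend_with_explicit_memory([10.0, 0.0], [[0.0, 1.0]], [[0.0]], [[1.0, 0.0]], [[1.0]])
```

Under this assumption the reference text never has to be re-encoded token by token at inference time, which is one plausible source of the decoding-speed advantage over RAG claimed in the abstract; the paper itself describes the actual mechanism.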

Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Weight Decay · Multi-Head Attention · Residual Connection · WordPiece · Softmax · Byte Pair Encoding · Layer Normalization