Schrodinger's Memory: Large Language Models

Wei Wang; Qing Li

arXiv:2409.10482·cs.CL·September 30, 2024

Open Access·3 Reviews

TL;DR

This paper explores the memory mechanisms of Large Language Models (LLMs) using the Universal Approximation Theorem (UAT), proposing a new perspective that likens LLM memory to Schrödinger's memory, and compares it with human memory.

Contribution

It introduces a theoretical explanation of LLM memory based on UAT and proposes a novel method to assess their memory capabilities, framing LLM memory as observable only upon querying.
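
For context, the classical single-hidden-layer statement of the UAT (Cybenko, 1989) is sketched below. This is the textbook form only; how the paper adapts it to transformer-based LLMs is not shown in this summary, so the symbols here ($N$, $\alpha_j$, $\mathbf{W}_j$, $\theta_j$, $\sigma$) are the standard ones and not necessarily the authors' notation.

```latex
% Classical UAT (Cybenko, 1989): for any continuous f on the unit cube I_n
% and any \varepsilon > 0, there exist N, \alpha_j, \mathbf{W}_j, \theta_j such that
\[
  G(\mathbf{x}) = \sum_{j=1}^{N} \alpha_j \,
      \sigma\!\left(\mathbf{W}_j^{\mathsf{T}} \mathbf{x} + \theta_j\right),
  \qquad
  \bigl| G(\mathbf{x}) - f(\mathbf{x}) \bigr| < \varepsilon
  \quad \text{for all } \mathbf{x} \in I_n ,
\]
% where \sigma is a sigmoidal activation function.
```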

Findings

01

LLMs exhibit memory that can be tested through specific queries.

02

The proposed assessment method effectively measures LLM memory capabilities.

03

LLM memory operates similarly to Schrödinger's memory, being indeterminate until observed.
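
The idea of testing memory through specific queries can be illustrated with a minimal sketch. It assumes exact-match scoring after whitespace and case normalization, which may differ from the metric the paper uses. The names `recall_accuracy`, `toy_model`, and `POEMS`, the prompt wording, and the placeholder poem texts are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting differences are ignored."""
    return " ".join(text.split()).lower()


def recall_accuracy(generate: Callable[[str], str], items: Dict[str, str]) -> float:
    """Fraction of items reproduced exactly when the model is queried by title.

    `generate` is any function mapping a prompt string to a model response.
    """
    if not items:
        return 0.0
    hits = 0
    for title, expected in items.items():
        # The memory is only "observed" at this point: when we ask for it.
        answer = generate(f"Recite the poem titled: {title}")
        if normalize(answer) == normalize(expected):
            hits += 1
    return hits / len(items)


# Placeholder data (hypothetical, not from the paper's dataset).
POEMS = {
    "Poem A": "first line\nsecond line",
    "Poem B": "another first line\nanother second line",
}


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: it has 'memorized' only Poem A."""
    title = prompt.split(": ", 1)[1]
    memorized = {"Poem A": "First line second line"}
    return memorized.get(title, "")


print(recall_accuracy(toy_model, POEMS))
```

Swapping `toy_model` for a call to a real model would leave the scoring loop unchanged; a softer metric (for example, per-line overlap) could replace the exact-match comparison if partial recall should count.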

Abstract

Memory is the foundation of all human activities; without memory, it would be nearly impossible for people to perform any task in daily life. With the development of Large Language Models (LLMs), their language capabilities are becoming increasingly comparable to those of humans. But do LLMs have memory? Based on current performance, LLMs do appear to exhibit memory. So, what is the underlying mechanism of this memory? Previous research has lacked a deep exploration of LLMs' memory capabilities and the underlying theory. In this paper, we use the Universal Approximation Theorem (UAT) to explain the memory mechanism in LLMs. We also conduct experiments to verify the memory capabilities of various LLMs, proposing a new method to assess their abilities based on this memory ability. We argue that LLM memory operates like Schrödinger's memory, meaning that it only becomes observable when a…

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01·Rating 1·Confidence 4

Strengths

The paper addresses an important topic. It is crucial that we study in detail the similarities and differences in memory processes between humans and LLMs, as well as devote plenty of effort to understanding how memory manifests in transformer architectures. The paper is relatively clearly written. The work has both a theoretical and a practical component. It is always commendable when practical experiments are informed by theory.

Weaknesses

**Novelty** Unfortunately, the paper is not sufficiently novel to pass the high standards of the ICLR conference. The main empirical result in the paper is a rediscovery of the well-known phenomenon of overfitting. As is well-known, LLMs do indeed possess a remarkable ability for verbatim memorization of items in the training set. **Quality** Unfortunately, the quality of the literature review or the theoretical justification of the proposed work is not sufficient to pass the high standards of

Reviewer 02·Rating 1·Confidence 4

Strengths

The general idea of connecting deep theory about functions to concrete Transformer-based LLMs (for example, via the Universal Approximation Theorem) is interesting and promising.

Weaknesses

Unfortunately, the article suffers from several shortcomings that I will point out section-wise: Introduction: Overall, the main topic of the article should be clarified. There needs to be a proper section on related work to outline and delineate current research on this topic so that readers know what the state-of-the-art is and why this paper's contributions are contributions in the first place. This would also help readers unfamiliar with the topic to navigate the article more efficiently

Reviewer 03·Rating 1·Confidence 5

Strengths

The paper investigates memory in LLMs, which is timely given the current explosion of research on LMs, and is thematically aligned with ICLR. The decision to test memorization on poetry is a step in the right direction.

Weaknesses

While the experimental work is a step in the right direction, I do not believe the paper in its current form would be a good fit for ICLR. 1. The authors aim to provide a precise definition of memory, yet the definition they provide is informal, and the only other definition they compare to is one from Wikipedia (ignoring the vast literature on, e.g., dense associative memory, episodic vs. working memory, etc.). 2. While the goal of Section 3.1 is to define memory, the definition provided is impre

Taxonomy

Topics: Computational and Text Analysis Methods