Characterizing Verbatim Short-Term Memory in Neural Language Models

Kristijan Armeni; Christopher Honey; Tal Linzen

arXiv:2210.13569 · cs.CL · May 3, 2023 · 1 citation

TL;DR

This paper tests whether transformer and LSTM language models can retrieve the exact words of their prior context, measured as the drop in surprisal when a list of nouns is repeated. The transformers recalled both the identity and the order of the nouns, while the LSTM showed more limited, less precise retrieval.

Contribution

The study shows that transformers behave like a flexible working memory system capable of precise token retrieval, whereas the LSTM maintains a less detailed semantic gist of prior context.

Findings

01

Transformers retrieve exact words and order from prior context.

02

LSTMs show limited, less precise retrieval focused on early tokens.

03

Transformer retrieval improves markedly with a larger training corpus and greater model depth.

Abstract

When a language model is trained to predict natural language sequences, its prediction at each moment depends on a representation of prior context. What kind of information about the prior context can language models retrieve? We tested whether language models could retrieve the exact words that occurred previously in a text. In our paradigm, language models (transformers and an LSTM) processed English text in which a list of nouns occurred twice. We operationalized retrieval as the reduction in surprisal from the first to the second list. We found that the transformers retrieved both the identity and ordering of nouns from the first list. Further, the transformers' retrieval was markedly enhanced when they were trained on a larger corpus and with greater model depth. Lastly, their ability to index prior tokens was dependent on learned attention patterns. In contrast, the LSTM exhibited…
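The abstract operationalizes retrieval as the reduction in surprisal from the first to the second list. The following is a minimal sketch of that idea, not the authors' implementation: it assumes surprisal is measured in bits, that per-token surprisals are averaged within each list, and that the reduction is a simple difference of those averages. The function names and the probability values are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean


def surprisal_bits(prob):
    """Surprisal of one token in bits: -log2 p(token | context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def surprisal_reduction(first_probs, second_probs):
    """Mean surprisal on the first list minus mean surprisal on the second.

    Assumption: averaging per-token surprisal and taking a difference.
    A positive value means the repeated list was easier to predict.
    """
    first = mean(surprisal_bits(p) for p in first_probs)
    second = mean(surprisal_bits(p) for p in second_probs)
    return first - second


# Hypothetical model probabilities for the same three nouns,
# once at first presentation and once at the repeat.
first_list = [0.125, 0.25, 0.125]
second_list = [0.5, 0.5, 0.5]

reduction = surprisal_reduction(first_list, second_list)
print(f"surprisal reduction: {reduction:.3f} bits")
```

Under these assumptions, a model that retrieves the earlier nouns assigns them higher probability on the second pass, so the reduction is positive; a model with no memory of the first list would yield a value near zero.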

Code & Models

Repositories

kristijanarmeni/verbatim-memory-in-nlms (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory