Self-Attention Limits Working Memory Capacity of Transformer-Based Models
Dongyu Gong, Hantao Zhang

TL;DR
This paper investigates how the self-attention mechanism in Transformer models limits their working memory capacity, drawing parallels with human cognition, and provides insights for designing models with improved memory capabilities.
Contribution
The paper shows that the dispersion of attention scores correlates with working memory capacity limits, and it offers a mechanistic explanation inspired by the executive attention theory from behavioral sciences.
Findings
Over training on N-back tasks, attention scores gradually aggregate to the N-back positions.
The entropy of the attention scores increases with N, indicating that attention becomes more dispersed.
Self-attention may inherently constrain working memory in Transformers.
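The entropy finding above can be made concrete with a small sketch. It assumes Shannon entropy over one softmax-normalized row of attention scores, which is a common way to quantify dispersion; the paper's exact measure is not given in this summary. The helper names `softmax` and `attention_entropy` and the toy score vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw attention logits into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(weights):
    """Shannon entropy (in nats) of one row of attention weights."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


# Hypothetical logits: one row focused on a single position, one spread evenly.
peaked = softmax([0.0, 0.0, 8.0, 0.0])
dispersed = softmax([1.0, 1.0, 1.0, 1.0])

print(attention_entropy(peaked) < attention_entropy(dispersed))
```

Under this measure, a row that concentrates on one position has entropy close to zero, while a uniform row over k positions reaches the maximum of log(k).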
Abstract
Recent work on Transformer-based large language models (LLMs) has revealed striking limits in their working memory capacity, similar to what has been found in human behavioral studies. Specifically, these models' performance drops significantly on N-back tasks as N increases. However, there is still a lack of mechanistic interpretability as to why this phenomenon would arise. Inspired by the executive attention theory from behavioral sciences, we hypothesize that the self-attention mechanism within Transformer-based models might be responsible for their working memory capacity limits. To test this hypothesis, we train vanilla decoder-only transformers to perform N-back tasks and find that attention scores gradually aggregate to the N-back positions over training, suggesting that the model masters the task by learning a strategy to pay attention to the relationship between the current…
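For readers unfamiliar with the N-back task mentioned in the abstract, the following is a minimal sketch of how trials and target labels might be generated. It assumes the standard formulation, in which position i counts as a match when its item equals the item N steps earlier. The alphabet, sequence length, and function names are illustrative assumptions, not the authors' data pipeline.

```python
import random


def n_back_labels(sequence, n):
    """Label position i as a match if it equals the item n steps back.

    The first n positions have no n-back item and are labeled as non-matches.
    """
    return [i >= n and sequence[i] == sequence[i - n]
            for i in range(len(sequence))]


def make_trial(length, n, alphabet="abcd", seed=0):
    """Sample a random letter sequence and its n-back labels (hypothetical setup)."""
    rng = random.Random(seed)
    seq = [rng.choice(alphabet) for _ in range(length)]
    return seq, n_back_labels(seq, n)


seq, labels = make_trial(length=12, n=2)
print("".join(seq))
print(labels)
```

Solving position i requires relating it to position i - N, which is why attention concentrating on the N-back position is a natural strategy for a decoder-only model trained on such sequences.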
Taxonomy
Topics: Neural Networks and Applications
Methods: Softmax · Attention Is All You Need
