TL;DR
This paper shows that transformer feed-forward layers function as key-value memories: keys respond to human-interpretable textual patterns, values induce distributions over the output vocabulary, and each layer's output is refined by later layers through residual connections.
Contribution
It frames feed-forward layers as key-value memories, which makes their learned patterns interpretable and clarifies their role in transformer language models.
Findings
Feed-forward layers operate as key-value memories.
Lower layers tend to capture shallow patterns; upper layers learn more semantic ones.
Each layer's output is a composition of its memories, which is then refined across subsequent layers through residual connections.
Abstract
Feed-forward layers constitute two-thirds of a transformer model's parameters, yet their role in the network remains under-explored. We show that feed-forward layers in transformer-based language models operate as key-value memories, where each key correlates with textual patterns in the training examples, and each value induces a distribution over the output vocabulary. Our experiments show that the learned patterns are human-interpretable, and that lower layers tend to capture shallow patterns, while upper layers learn more semantic ones. The values complement the keys' input patterns by inducing output distributions that concentrate probability mass on tokens likely to appear immediately after each pattern, particularly in the upper layers. Finally, we demonstrate that the output of a feed-forward layer is a composition of its memories, which is subsequently refined throughout the…
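The key-value reading described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes the common bias-free form FF(x) = ReLU(x · Kᵀ) · V, and it assumes a value vector is turned into a vocabulary distribution by multiplying with an output embedding matrix and applying softmax. The function names, the matrix E, and the random weights and dimensions are all hypothetical.

```python
import numpy as np

def feed_forward_as_memory(x, K, V):
    """Feed-forward layer viewed as a key-value memory (biases omitted, an assumption).

    x: input vector, shape (d,)
    K: keys, shape (d_m, d); row i is the key of memory cell i
    V: values, shape (d_m, d); row i is the value of memory cell i
    """
    # Each coefficient measures how strongly the input matches one key.
    coeffs = np.maximum(x @ K.T, 0.0)  # ReLU, shape (d_m,)
    # The output is a coefficient-weighted sum of the value vectors.
    return coeffs @ V, coeffs

def value_distribution(v, E):
    """Distribution over the vocabulary induced by one value vector.

    E: hypothetical output embedding matrix, shape (vocab, d).
    """
    logits = E @ v
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Toy dimensions and random weights, for illustration only.
rng = np.random.default_rng(0)
d, d_m, vocab = 8, 32, 50
K = rng.normal(size=(d_m, d))
V = rng.normal(size=(d_m, d))
E = rng.normal(size=(vocab, d))
x = rng.normal(size=d)

out, coeffs = feed_forward_as_memory(x, K, V)

# The most strongly activated memory cell and the distribution its value induces.
top = int(np.argmax(coeffs))
p_top = value_distribution(V[top], E)

# The residual connection adds the layer output back onto its input,
# so later layers refine this sum rather than replace it.
residual = x + out
```

With trained weights, one would inspect which training prefixes most strongly activate a given key and compare the top token of that cell's value distribution with the token that actually follows those prefixes. With the random weights used here, the numbers carry no meaning beyond showing the shapes and the flow of computation.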
