Transformers are Stateless Differentiable Neural Computers
Bo Tang, Weiwei Xie

TL;DR
This paper demonstrates that causal Transformers are equivalent to stateless Differentiable Neural Computers, providing a unified, memory-centric theoretical framework for understanding Transformer architectures and their relation to neural computer models.
Contribution
The paper formally derives the equivalence between causal Transformers and stateless Differentiable Neural Computers (sDNCs), extends it to encoder-decoder models, and offers a unified memory-based interpretation of Transformers.
Findings
A causal Transformer layer is mathematically equivalent to a stateless DNC (sDNC): a DNC whose controller has no recurrent internal state.
Encoder-decoder Transformers correspond to sDNCs with separate memories.
The equivalence provides a principled, memory-centric framework for understanding large language models.
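The encoder-decoder finding can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the helper `read`, the random queries, keys, and values, and the way the two reads are summed are all assumptions made for illustration. The point is only that the decoder appends to its own memory at each step, while the encoder memory is read from but never written to during decoding.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def read(query, mem_keys, mem_vals):
    # Content-based read (hypothetical helper): score each memory row
    # against the query, then return the weighted sum of stored values.
    weights = softmax(mem_keys @ query / np.sqrt(query.shape[-1]))
    return weights @ mem_vals

rng = np.random.default_rng(0)
d = 8

# Read-from memory: filled once by the encoder, fixed while decoding.
enc_keys = rng.standard_normal((6, d))
enc_vals = rng.standard_normal((6, d))
enc_vals_before = enc_vals.copy()

# Write-to memory: grows by one (key, value) row per decoding step.
dec_keys = np.empty((0, d))
dec_vals = np.empty((0, d))

outputs = []
for t in range(4):
    # Random stand-ins for the projections a real decoder layer would compute.
    q_self, q_cross, k, v = rng.standard_normal((4, d))
    dec_keys = np.vstack([dec_keys, k])  # append only, never overwrite
    dec_vals = np.vstack([dec_vals, v])
    self_read = read(q_self, dec_keys, dec_vals)    # causal self-attention
    cross_read = read(q_cross, enc_keys, enc_vals)  # cross-attention
    outputs.append(self_read + cross_read)          # simplified combination

outputs = np.stack(outputs)
```

A real decoder layer derives the cross-attention query from the self-attention output and adds residual connections, normalization, and a feedforward block; those are omitted here to keep the two memories in focus.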
Abstract
Differentiable Neural Computers (DNCs) were introduced as recurrent architectures equipped with an addressable external memory supporting differentiable read and write operations. Transformers, in contrast, are nominally feedforward architectures based on multi-head self-attention. In this work we give a formal derivation showing that a causal Transformer layer is exactly a stateless Differentiable Neural Computer (sDNC) where (1) the controller has no recurrent internal state, (2) the external memory is a write-once matrix of value vectors, (3) content-based addressing via keys implements attention, and (4) multi-head attention corresponds to multiple parallel read heads. We further extend this equivalence to cross-attention, showing that encoder-decoder Transformers are precisely sDNCs with distinct read-from and write-to memories. Our results provide a unified memory-centric…
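The correspondence described in the abstract can be checked numerically for a single head. The following is a minimal sketch, not the paper's derivation: the function names and random inputs are hypothetical, and scaled dot-product scoring is assumed for the content-based addressing (the original DNC uses cosine similarity with a key-strength parameter). It compares standard masked attention against a step-by-step loop that appends one key/value row to memory and then performs a content-based read.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(Q, K, V):
    # Standard single-head causal attention over the whole sequence.
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)  # hide future positions
    return softmax(scores, axis=-1) @ V

def sdnc_read(Q, K, V):
    # Memory view: at step t, write one (key, value) row, then read.
    T, d = Q.shape
    mem_keys, mem_vals, reads = [], [], []
    for t in range(T):
        mem_keys.append(K[t])  # write-once: rows are appended, never edited
        mem_vals.append(V[t])
        keys_t = np.stack(mem_keys)
        vals_t = np.stack(mem_vals)
        weights = softmax(keys_t @ Q[t] / np.sqrt(d))  # addressing weights
        reads.append(weights @ vals_t)                 # read vector
    return np.stack(reads)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
same = np.allclose(causal_attention(Q, K, V), sdnc_read(Q, K, V))
```

Under these assumptions `same` evaluates to `True`: the causal mask plays the role of "only rows written so far are readable". Running several such reads with different projections would give the multi-head case, one read head per attention head.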
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Neural Networks and Applications
