Relating transformers to models and neural representations of the hippocampal formation
James C.R. Whittington, Joseph Warren, Timothy E.J. Behrens

TL;DR
This paper demonstrates that transformer neural networks equipped with recurrent position encodings replicate hippocampal spatial representations such as place and grid cells, linking artificial and biological neural models and offering large performance gains over the related neuroscience model.
Contribution
It shows that transformers can model hippocampal spatial representations and outperform neuroscience-based models, linking artificial neural networks with brain function.
Findings
Transformers with recurrent position encodings replicate hippocampal place and grid cells.
The transformer version of the model offers dramatic performance gains over the neuroscience version.
The work suggests wider cortical areas may perform complex tasks beyond current neuroscience models.
Abstract
Many deep neural network architectures loosely based on brain networks have recently been shown to replicate neural firing patterns observed in the brain. One of the most exciting and promising novel architectures, the Transformer neural network, was developed without the brain in mind. In this work, we show that transformers, when equipped with recurrent position encodings, replicate the precisely tuned spatial representations of the hippocampal formation; most notably place and grid cells. Furthermore, we show that this result is no surprise since it is closely related to current hippocampal models from neuroscience. We additionally show the transformer version offers dramatic performance gains over the neuroscience version. This work continues to bind computations of artificial and brain networks, offers a novel understanding of the hippocampal-cortical interaction, and suggests how…
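The abstract does not spell out the mechanism, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes that a "recurrent position encoding" is a vector updated at each step by an action-dependent transformation, and that attention uses these encodings as queries and keys over stored sensory observations. The ring-shaped toy world, the 2D rotation standing in for a learned action-dependent weight matrix, the sharpness parameter `beta`, and all function names are hypothetical choices made for illustration.

```python
import math

def rotate(e, angle):
    """Update a 2D position encoding by a rotation.

    Assumption: the rotation stands in for a learned, action-dependent
    weight matrix applied recurrently to the previous encoding.
    """
    c, s = math.cos(angle), math.sin(angle)
    return [c * e[0] - s * e[1], s * e[0] + c * e[1]]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [v / total for v in exps]

def attend(query, keys, values, beta=10.0):
    """Dot-product attention: weight stored values by query-key similarity."""
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Hypothetical toy world: a ring of 4 locations, each with a one-hot observation.
STEP = math.pi / 2
OBSERVATIONS = {
    0: [1, 0, 0, 0],
    1: [0, 1, 0, 0],
    2: [0, 0, 1, 0],
    3: [0, 0, 0, 1],
}

def run(actions):
    """Walk the ring and predict each new observation from memory.

    actions: list of +1 / -1 steps around the ring.
    Returns one predicted observation vector per action.
    """
    e = [1.0, 0.0]          # initial position encoding
    loc = 0                 # true location, used only to look up observations
    keys, values = [], []   # memory: position encodings and what was seen there
    predictions = []
    for a in actions:
        keys.append(e)
        values.append(OBSERVATIONS[loc])
        e = rotate(e, a * STEP)      # recurrent update driven by the action
        loc = (loc + a) % 4
        predictions.append(attend(e, keys, values))
    return predictions
```

In this sketch, calling `run([1, 1, 1, 1, 1])` walks once around the ring and one step further. Early predictions are poor because the queried locations have not been visited yet; once the walk returns to a visited location, the position encoding matches a stored key and attention retrieves the observation stored there. Position is never given to the model directly: it is carried entirely by the recurrently updated encoding.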
Taxonomy
Topics: Neural Networks and Applications · Domain Adaptation and Few-Shot Learning · Neural dynamics and brain function
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Position-Wise Feed-Forward Layer · Layer Normalization · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax
