Transformer Mechanisms Mimic Frontostriatal Gating Operations When Trained on Human Working Memory Tasks

Aaron Traylor; Jack Merullo; Michael J. Frank; Ellie Pavlick

arXiv:2402.08211 · cs.AI · February 14, 2024 · 1 citation

Open Access

TL;DR

This study shows that vanilla attention-only Transformer models trained on working memory tasks develop mechanisms resembling the frontostriatal gating operations thought to support human working memory, even though no such mechanism is built into the architecture.

Contribution

The paper demonstrates that standard Transformer attention mechanisms, with no gating intentionally built in, can come to mimic biological gating processes when trained on working memory tasks.

Findings

01

Transformers develop gating-like mechanisms after training.

02

Mechanisms resemble biologically inspired frontostriatal gating.

03

Opens avenues for AI-neuroscience comparative research.

Abstract

Models based on the Transformer neural network architecture have seen success on a wide variety of tasks that appear to require complex "cognitive branching", or the ability to maintain pursuit of one goal while accomplishing others. In cognitive neuroscience, success on such tasks is thought to rely on sophisticated frontostriatal mechanisms for selective gating, which enable role-addressable updating (and later readout) of information to and from distinct "addresses" of memory, in the form of clusters of neurons. However, Transformer models have no such mechanisms intentionally built in. It is thus an open question how Transformers solve such tasks, and whether the mechanisms that emerge to help them do so bear any resemblance to the gating mechanisms in the human brain. In this work, we analyze the mechanisms that emerge within a vanilla attention-only Transformer…
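As a rough illustration of the role-addressable gating idea mentioned in the abstract, the toy Python sketch below keeps one memory slot per "address" and only updates or reads a slot when the corresponding gate is open. It is not the paper's model, task, or analysis; the class `GatedMemory`, its methods, and the address names are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Hashable, Optional


class GatedMemory:
    """Toy role-addressable memory (hypothetical; not the paper's model)."""

    def __init__(self) -> None:
        # One slot per address; a slot keeps its content until it is overwritten.
        self._slots: Dict[Hashable, object] = {}

    def step(self, address: Hashable, item: object, input_gate_open: bool) -> None:
        # Input gating: the addressed slot is updated only when its gate is open.
        # With the gate closed, the incoming item is ignored and the stored
        # content is maintained unchanged.
        if input_gate_open:
            self._slots[address] = item

    def read(self, address: Hashable, output_gate_open: bool) -> Optional[object]:
        # Output gating: stored content is exposed only when readout is requested.
        if output_gate_open:
            return self._slots.get(address)
        return None


memory = GatedMemory()
memory.step("role_1", "A", input_gate_open=True)    # store A under role_1
memory.step("role_2", "B", input_gate_open=True)    # store B under role_2
memory.step("role_1", "C", input_gate_open=False)   # distractor: gate closed, A is kept
recalled = memory.read("role_1", output_gate_open=True)
```

In this sketch the distractor `"C"` never reaches `role_1` because the input gate is closed, so the later readout still returns the originally stored item. A Transformer has no explicit gates of this kind, which is why the abstract treats the way such behavior arises as an open question.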

Taxonomy

Topics: Motor Control and Adaptation

Methods: Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Dropout · Multi-Head Attention