Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network

Badr AlKhamissi; Greta Tuckute; Antoine Bosselut; Martin Schrimpf

arXiv:2406.15109 · cs.CL · June 24, 2024 · 2 cites

TL;DR

This paper investigates why untrained, shallow Transformer models align with human brain activity during language processing. It isolates the architectural components that drive this alignment and reports that the resulting model also improves language modeling efficiency.

Contribution

The paper identifies tokenization strategy and multihead attention as the two major components driving brain alignment in untrained models, and demonstrates that adding simple recurrence improves this alignment further.
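
As a rough illustration of the components named above, the following sketch builds a randomly initialized (untrained) multihead self-attention block and applies it twice with the same weights. This is not the authors' implementation: the causal mask, the residual connection, the weight initialization, the dimensions, and the reading of "simple recurrence" as weight sharing across passes are all assumptions made for the example, and every function name is hypothetical.

```python
import math
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rng, rows, cols):
    # Assumed initialization: zero-mean Gaussian scaled by 1/sqrt(rows).
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def init_attention(rng, d_model):
    # Query, key, value, and output projections; never trained.
    return {name: random_matrix(rng, d_model, d_model) for name in ("q", "k", "v", "o")}

def multihead_attention(x, w, n_heads):
    """Causal multihead self-attention with a residual connection (assumed)."""
    d_model = len(x[0])
    d_head = d_model // n_heads
    q, k, v = matmul(x, w["q"]), matmul(x, w["k"]), matmul(x, w["v"])
    concat = [[] for _ in x]
    for h in range(n_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        for t in range(len(x)):
            # Position t attends to positions 0..t only.
            scores = [
                sum(a * b for a, b in zip(q[t][lo:hi], k[s][lo:hi])) / math.sqrt(d_head)
                for s in range(t + 1)
            ]
            weights = softmax(scores)
            concat[t].extend(
                sum(wt * v[s][j] for s, wt in enumerate(weights)) for j in range(lo, hi)
            )
    out = matmul(concat, w["o"])
    return [[xi + oi for xi, oi in zip(x_row, o_row)] for x_row, o_row in zip(x, out)]

def forward(x, w, n_heads, n_passes):
    # Assumed form of recurrence: reuse the same weights on every pass.
    for _ in range(n_passes):
        x = multihead_attention(x, w, n_heads)
    return x

rng = random.Random(0)
d_model, n_heads, seq_len = 8, 4, 5
weights = init_attention(rng, d_model)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(seq_len)]
hidden = forward(tokens, weights, n_heads, n_passes=2)
print(len(hidden), len(hidden[0]))
```

Because the weights are shared across passes, the second pass adds computation without adding parameters, which is one way a shallow model could gain effective depth.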

Findings

01

Untrained models can align with brain activity through their architectural priors alone.

02

Tokenization and multihead attention are key components for alignment.

03

The model improves language modeling efficiency and predicts human reading times.

Abstract

Large Language Models (LLMs) have been shown to be effective models of the human language system, with some models predicting most explainable variance of brain activity in current datasets. Even in untrained models, the representations induced by architectural priors can exhibit reasonable alignment to brain data. In this work, we investigate the key architectural components driving the surprising alignment of untrained models. To estimate LLM-to-brain similarity, we first select language-selective units within an LLM, similar to how neuroscientists identify the language network in the human brain. We then benchmark the brain alignment of these LLM units across five different brain recording datasets. By isolating critical components of the Transformer architecture, we identify tokenization strategy and multihead attention as the two major components driving brain alignment. A simple…
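
The unit-selection step described in the abstract can be sketched as a contrast between two stimulus conditions. The example below is a minimal illustration, not the paper's procedure: the choice of Welch's t-statistic, the "language" versus "control" condition labels, the value of `k`, and the synthetic activations are all assumptions, and the function names are hypothetical.

```python
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

def select_language_units(acts_language, acts_control, k):
    """Return indices of the k units with the largest language > control contrast.

    Each argument is a list of per-stimulus activation vectors, one float per unit.
    """
    n_units = len(acts_language[0])
    scores = []
    for u in range(n_units):
        lang = [row[u] for row in acts_language]
        ctrl = [row[u] for row in acts_control]
        scores.append(welch_t(lang, ctrl))
    ranked = sorted(range(n_units), key=lambda u: scores[u], reverse=True)
    return ranked[:k]

# Synthetic demo: units 1 and 5 respond more strongly to the language condition.
rng = random.Random(0)
n_units, n_stimuli = 8, 40
selective = {1, 5}

def fake_activations(is_language):
    return [
        [rng.gauss(3.0 if (is_language and u in selective) else 0.0, 1.0)
         for u in range(n_units)]
        for _ in range(n_stimuli)
    ]

acts_language = fake_activations(True)
acts_control = fake_activations(False)
selected = select_language_units(acts_language, acts_control, k=2)
print(sorted(selected))
```

In practice the activations would come from the model's hidden states for each stimulus, and only the selected units would then be compared against brain recordings.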

Code & Models

Repositories

bkhmsi/brain-language-suma
PyTorch · Official

Taxonomy

Topics: Neural Networks and Applications · EEG and Brain-Computer Interfaces · Topic Modeling

Methods: Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer · Absolute Position Encodings