Topoformer: brain-like topographic organization in Transformer language models through spatial querying and reweighting

Taha Binhuraib; Greta Tuckute; Nicholas Blauch

arXiv:2510.18745·cs.CL·October 22, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces Topoformer, a novel Transformer variant with topographic organization inspired by biological brains, achieved through spatial querying and reweighting, leading to more interpretable models and alignment with human brain data.

Contribution

The paper proposes a new self-attention mechanism that induces topographic organization in Transformers, enhancing interpretability and biological plausibility.

Findings

01 Topoformer achieves topographic organization in NLP models.

02 Topoformer performs comparably to standard models on NLP benchmarks.

03 Topographic features in Topoformer align with human brain language responses.

Abstract

Spatial functional organization is a hallmark of biological brains: neurons are arranged topographically according to their response properties, at multiple scales. In contrast, representations within most machine learning models lack spatial biases, instead manifesting as disorganized vector spaces that are difficult to visualize and interpret. Here, we propose a novel form of self-attention that turns Transformers into "Topoformers" with topographic organization. We introduce spatial querying - where keys and queries are arranged on 2D grids, and local pools of queries are associated with a given key - and spatial reweighting, where we convert the standard fully connected layer of self-attention into a locally connected layer. We first demonstrate the feasibility of our approach by training a 1-layer Topoformer on a sentiment classification task. Training with spatial querying…
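The two mechanisms named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that both spatial querying and spatial reweighting can be expressed with a binary neighborhood mask over units laid out on a square grid, and the names `grid_mask`, `topo_attention`, `side`, and `radius` are hypothetical. Pure Python is used so the example has no dependencies.

```python
import math
import random


def grid_mask(side, radius):
    """Binary mask over side*side units: 1.0 where two units lie within `radius` on the grid."""
    coords = [(i // side, i % side) for i in range(side * side)]
    return [[1.0 if math.dist(a, b) <= radius else 0.0 for b in coords]
            for a in coords]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def topo_attention(q, k, v, w_out, mask):
    """Single-head attention with an assumed mask-based form of both mechanisms.

    q, k, v: tokens x d matrices; w_out and mask: d x d matrices.
    """
    d = len(mask)
    # Spatial querying (assumption): pool each query unit with its grid
    # neighbors before comparing with the keys.
    pooled_q = matmul(q, mask)
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(pooled_q, k_t)
    attn = softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])
    context = matmul(attn, v)
    # Spatial reweighting (assumption): zero out non-local output weights,
    # turning the fully connected layer into a locally connected one.
    local_w = [[w * m for w, m in zip(w_row, m_row)]
               for w_row, m_row in zip(w_out, mask)]
    return matmul(context, local_w), attn


def random_matrix(rows, cols, rng):
    """Helper for the demo: Gaussian entries from a seeded generator."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    side, tokens = 3, 2
    d = side * side
    mask = grid_mask(side, radius=1.0)
    out, attn = topo_attention(random_matrix(tokens, d, rng),
                               random_matrix(tokens, d, rng),
                               random_matrix(tokens, d, rng),
                               random_matrix(d, d, rng),
                               mask)
    print(len(out), len(out[0]))
```

Under these assumptions, each output unit depends only on context units within `radius` of it on the grid. That kind of local wiring is what would encourage neighboring units to develop similar response properties.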

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 5 · marginally below the acceptance threshold · Confidence 4

Strengths

1. Addressed the architectural disparity between current Transformer models and the biological brain by inducing a topographic organization of features within the Transformer.
2. The topographic organization of the Topoformer yields competitive performance compared to the Vanilla Transformer model in small-scale sentiment analysis and benchmark GLUE tasks.
3. The proposed method is scalable for both small and larger-scale datasets.
4. The alignment between the way information is organized in the

Weaknesses

1. Although the novelty of the paper is interesting, it lacks specific experimental details.
   - How to choose the optimal number of tokens in local spatial querying?
   - Additionally, with the introduction of local pooling of spatial queries, the parameter differences between Topoformer and Vanilla BERT are not provided.
   - How did the authors generate Fig 4? What does "Stat Value" refer to? How do we determine the Stat Value across layers for queries, keys, values, and fc_out?
   - What does fc_ou

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 3

Strengths

This paper introduces the idea of spatial structure in the attention mechanism of a transformer, devises a method to achieve it, and shows that the spatial structure indeed appears.

Weaknesses

This paper is unfortunately quite badly written, but thankfully many aspects can be improved straightforwardly.
1. The abstract and intro contain falsehoods and non sequiturs and would largely benefit from being made more concise.
   - Example of falsehood: Abstract, sentence 2. Convnets exhibit and exploit spatial structure.
   - Example of non sequitur: Intro, paragraph 3. "Despite the success of these LMs, the fact that their architecture is not compatible with spatial constraints of the biological cor

Reviewer 03 · Rating 5 · marginally below the acceptance threshold · Confidence 4

Strengths

The proposed method, Topoformer, is a novel approach to training Transformer language models with topographic organization. Topoformer has been shown to be feasible on a 1-layer sentiment classification task and to perform on par with a non-topographic control architecture on downstream NLP benchmarks. Topoformer has also been shown to yield similar forms of topographic organization for linguistic information as that present in the language network of individual subjects.

Weaknesses

The paper does not provide any concrete examples of how Topoformers can be used to improve the interpretability of NLP models. The paper does not evaluate Topoformer on a variety of NLP tasks.

Taxonomy

Topics: Neurobiology of Language and Bilingualism · Multimodal Machine Learning Applications · Action Observation and Synchronization