TL;DR
This paper demonstrates that transformer-based attention mechanisms can effectively model and interpret high-level human visual responses, outperforming traditional static receptive field models in predicting brain activity during natural scene viewing.
Contribution
It introduces a novel application of transformer attention to dynamically route retinotopic visual features to category-specific brain areas, enhancing prediction accuracy and interpretability.
Findings
Transformer attention models outperform alternatives in predicting brain activity.
Attention signals can be visualized to interpret high-level visual processing.
The model generalizes well to novel images and to different modalities.
Abstract
A major goal of neuroscience is to understand brain computations during visual processing in naturalistic settings. A dominant approach is to use image-computable deep neural networks trained with different task objectives as a basis for linear encoding models. However, in addition to requiring estimation of a large number of linear encoding parameters, this approach ignores the structure of the feature maps both in the brain and the models. Recently proposed alternatives factor the linear mapping into separate sets of spatial and feature weights, thus finding static receptive fields for units, which is appropriate only for early visual areas. In this work, we employ the attention mechanism used in the transformer architecture to study how retinotopic visual features can be dynamically routed to category-selective areas in high-level visual processing. We show that this computational…
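The contrast the abstract draws between a static receptive field and attention-based routing can be sketched in a few lines. This is only an illustration under assumptions, not the paper's architecture: the function names, the single learned query per brain region, the key projection `W_k`, and all shapes are hypothetical. In the static readout the spatial pooling weights are the same for every image; in the attention readout they are recomputed from each image's features.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def static_rf_readout(features, spatial_w, feature_w):
    """Factorized linear readout with a fixed receptive field.

    features:  (P, C) feature map flattened to P spatial positions, C channels
    spatial_w: (P,)   spatial pooling weights, identical for every image
    feature_w: (C,)   channel weights
    """
    pooled = spatial_w @ features          # (C,)
    return float(pooled @ feature_w)

def attention_readout(features, query, W_k, feature_w):
    """Readout whose spatial pooling depends on image content.

    query: (D,)   hypothetical learned query for one brain region
    W_k:   (C, D) hypothetical projection from features to keys
    """
    keys = features @ W_k                              # (P, D)
    scores = keys @ query / np.sqrt(query.shape[0])    # (P,)
    attn = softmax(scores)                             # (P,), sums to 1
    pooled = attn @ features                           # (C,)
    return float(pooled @ feature_w), attn

# Toy usage with random data (illustrative only).
rng = np.random.default_rng(0)
P, C, D = 16, 8, 4
image_a = rng.normal(size=(P, C))
image_b = rng.normal(size=(P, C))
query = rng.normal(size=D)
W_k = rng.normal(size=(C, D))
feature_w = rng.normal(size=C)

pred_a, attn_a = attention_readout(image_a, query, W_k, feature_w)
pred_b, attn_b = attention_readout(image_b, query, W_k, feature_w)
```

Here `attn_a` and `attn_b` are different spatial maps for the two inputs, which is what makes the attention weights something one can visualize per image. With an all-zero query the attention map is uniform and the readout reduces to a static receptive field with equal spatial weights.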
Taxonomy
Methods: Softmax · Attention Is All You Need · Focus
