Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers

Andrew F. Luo; Jacob Yeung; Rushikesh Zawar; Shaurya Dewan; Margaret M. Henderson; Leila Wehbe; Michael J. Tarr

arXiv:2410.05266·cs.CV·June 25, 2025

TL;DR

BrainSAIL is a novel method that uses dense features from pre-trained vision transformers to map and analyze neural selectivity for semantic and visual features in the human cortex, without additional training.

Contribution

It introduces a new approach to link neural activity with dense, semantically consistent visual features using pre-trained models and a denoising process, enabling detailed brain mapping.

Findings

01

Accurately localizes category-specific brain regions.

02

Characterizes high-level scene and low-level visual feature selectivity.

03

Enables comparison of feature selectivity across brain regions.

Abstract

We introduce BrainSAIL, a method for linking neural selectivity with spatially distributed semantic visual concepts in natural scenes. BrainSAIL leverages recent advances in large-scale artificial neural networks, using them to provide insights into the functional topology of the brain. To overcome the challenge presented by the co-occurrence of multiple categories in natural images, BrainSAIL exploits semantically consistent, dense spatial features from pre-trained vision models, building upon their demonstrated ability to robustly predict neural activity. This method derives clean, spatially dense embeddings without requiring any additional training, and employs a novel denoising process that leverages the semantic consistency of images under random augmentations. By unifying the space of whole-image embeddings and dense visual features and then applying voxel-wise encoding models to…
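The two ideas named in the abstract, denoising dense features by averaging over random augmentations and reading out a per-voxel spatial map from those features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: `toy_dense_features` is a hypothetical stand-in for a frozen vision transformer, horizontal flips are an assumed choice of augmentation, and `voxel_weights` stands for a linear encoding weight vector that is assumed to have been fit on whole-image embeddings living in the same feature space.

```python
import numpy as np

def toy_dense_features(image, patch=4):
    """Hypothetical stand-in for a frozen ViT backbone (assumption).

    Returns one feature vector per patch: here simply the mean colour
    of each patch, with shape (H // patch, W // patch, C).
    """
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    blocks = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    return blocks.mean(axis=(1, 3))

def denoised_features(image, extract, n_aug=8, seed=0):
    """Average dense features over random horizontal flips.

    Each flip is undone in feature space before averaging, so that
    content which is consistent across augmentations is kept and
    position-dependent artefacts tend to average out.
    """
    rng = np.random.default_rng(seed)
    acc = None
    for _ in range(n_aug):
        flip = rng.random() < 0.5
        feats = extract(image[:, ::-1] if flip else image)
        if flip:
            feats = feats[:, ::-1]  # map features back to the original frame
        acc = feats if acc is None else acc + feats
    return acc / n_aug

def voxel_selectivity_map(dense, voxel_weights):
    """Project unit-normalised patch features onto one voxel's weights.

    dense: (gh, gw, d) patch features; voxel_weights: (d,) linear
    encoding weights (assumed to share the feature space).
    Returns a (gh, gw) map of per-patch predicted drive for the voxel.
    """
    unit = dense / (np.linalg.norm(dense, axis=-1, keepdims=True) + 1e-8)
    return unit @ voxel_weights

# Usage with random data in place of a real image and fitted weights.
image = np.random.default_rng(1).random((16, 16, 3))
dense = denoised_features(image, toy_dense_features)
heatmap = voxel_selectivity_map(dense, np.array([1.0, 0.0, 0.0]))
```

Because the same weight vector is applied at every patch, the resulting map can be compared across voxels or regions for a single image, which is the kind of comparison the findings above describe.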

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

This is a clearly written paper and makes a clear contribution. The idea of isolating specific visual features to determine selectivity effects in different cortical areas is novel and interesting. The proposed method can be used to explore the selectivity of higher visual cortex with respect to localized scene structure and image properties. This work achieves promising open-vocabulary CLIP-based segmentation results.

Weaknesses

The current experimental comparisons (open-vocabulary segmentation) are limited, which restricts a comprehensive evaluation of the proposed method. The proposed approach appears to closely resemble BrainSCUBA. Certain technical details are unclear, which makes it challenging to fully understand or reproduce the method.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

The paper is well written and clear. The main contribution is the use of model features to create voxel-wise and image-wise spatial contribution maps. This is quite useful as an interpretation technique. The use of pixel-wise metrics like depth, saturation, and luminance is a powerful extension of the method.

Weaknesses

All of the investigated category-selective areas (food, places, words, faces, bodies) are previously reported. It would be interesting to see this tested on less common or more fine-grained categories. I found the explanation of the methods a bit confusing (see questions). It could use clearer high-level descriptions before diving into the finer details. I think there should be a visual comparison of attribution maps for different voxel categories with the same images. I was able to compare a few

Reviewer 03 · Rating 8 · Confidence 4

Strengths

- The authors show alternatives to high-frequency artefact removal that are more computationally efficient than other solutions out there, namely by recognising the existing common method of adding register tokens in ViTs can be replaced with their proposed learning-free distillation module, which avoids the need for (computationally demanding) additional training.
- The authors show that this method is a good way to target high confounds in naturalistic images by studying things like colour con

Weaknesses

- PDF is nearly 50 pages long. While ICLR allows for 10 main pages and unlimited supplementary information, the density of what is provided is a bit on the extreme side. In light of this, I still found myself searching for implementational information that I felt was missing and could be added in a revised version (see *Questions*)
- I reformulated many of the comments I first wrote as weaknesses into more directed questions in the section below, hoping that these points might be more easily add

Code & Models

Repositories

aluo-x/brainsail
PyTorch · Official

Taxonomy

Topics: Brain Tumor Detection and Classification · EEG and Brain-Computer Interfaces · Neural Networks and Applications