Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex
Idan Daniel Grosbard, Mor Geva, and Galit Yovel

TL;DR
This paper introduces MINE, a framework that interprets neural encoding models to identify and validate the specific visual features driving activity in human visual cortex at a fine-grained level.
Contribution
MINE applies mechanistic interpretability tools to localize and describe image features that drive voxel responses, enabling causal validation and revealing detailed functional selectivity.
Findings
MINE accurately predicts voxel responses using interpretable features.
Counterfactual manipulations confirm the causal role of identified features.
MINE reveals fine-grained voxel-level structure within category-selective regions.
Abstract
A central goal in understanding human vision is to uncover the visual features that drive neuronal activity. A growing body of work has used artificial neural networks as encoding models to predict cortical responses to natural images, revealing the visual content that activates category-selective regions. However, existing approaches are largely correlational and treat the encoder as a black box, leaving open which image features drive each voxel's response. We introduce Mechanistically Interpretable Neural Encoding (MINE), a framework that opens this black box by applying mechanistic-interpretability tools to localize the features within natural images that drive millimeter-scale (voxel-level) activity. MINE predicts each voxel's response using language-aligned image representations and produces semantically interpretable descriptions of the features critical for the voxel's…
