Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
Roman Beliy, Amit Zalcher, Jonathan Kogman, Navve Wasserman, Michal Irani

TL;DR
Brain-IT introduces a brain-inspired transformer model that improves image reconstruction from fMRI data by effectively capturing brain-voxel interactions and integrating semantic and structural image features, outperforming existing methods.
Contribution
The paper proposes Brain-IT, a novel transformer-based approach that models functional brain-voxel interactions and enhances image reconstruction fidelity from limited fMRI data.
Findings
Achieves more faithful image reconstructions than state-of-the-art methods.
Requires significantly less fMRI data to produce high-quality images.
Outperforms existing approaches both visually and on standard metrics.
Abstract
Reconstructing images seen by people from their fMRI brain recordings provides a non-invasive window into the human brain. Despite recent progress enabled by diffusion models, current methods often lack faithfulness to the actual seen images. We present "Brain-IT", a brain-inspired approach that addresses this challenge through a Brain Interaction Transformer (BIT), allowing effective interactions between clusters of functionally-similar brain-voxels. These functional-clusters are shared by all subjects, serving as building blocks for integrating information both within and across brains. All model components are shared by all clusters & subjects, allowing efficient training with a limited amount of data. To guide the image reconstruction, BIT predicts two complementary localized patch-level image features: (i) high-level semantic features which steer the diffusion model toward the…
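As a rough illustration of the idea that functional clusters are shared by all subjects, the sketch below pools each subject's voxel activations into one value per shared cluster. It is not the paper's implementation: the function name `pool_voxels_by_cluster`, the use of plain mean pooling, and the toy numbers are all assumptions made for illustration. The point is only that subjects with different voxel counts end up with representations indexed by the same cluster ids.

```python
from collections import defaultdict


def pool_voxels_by_cluster(activations, cluster_ids):
    """Average voxel activations within each cluster.

    Hypothetical helper: mean pooling is an assumed stand-in for whatever
    learned aggregation a real model would use.
    """
    if len(activations) != len(cluster_ids):
        raise ValueError("activations and cluster_ids must have the same length")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, cid in zip(activations, cluster_ids):
        sums[cid] += value
        counts[cid] += 1
    # One pooled value per cluster, keyed by the shared cluster id.
    return {cid: sums[cid] / counts[cid] for cid in sorted(sums)}


# Toy data: two subjects with different numbers of voxels,
# both mapped onto the same two shared clusters (ids 0 and 1).
subject_a = pool_voxels_by_cluster([0.2, 0.4, 1.0, 3.0], [0, 0, 1, 1])
subject_b = pool_voxels_by_cluster([0.5, 2.0, 2.0, 2.0, 0.5], [0, 1, 1, 1, 0])
```

Because both results are keyed by the same cluster ids, any downstream component that operates per cluster could, under these assumptions, be shared across subjects regardless of how many voxels each subject has.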
Peer Reviews
Decision·ICLR 2026 Poster
1. **SoTA empirical results:** Outperforms all previous methods in full-data and limited-data settings.
2. **Novel techniques:** The idea of functional voxel clustering for fMRI-Image decoding is novel and improves performance, finding an elegant way to incorporate transformers into the mapper architecture.
3. **Extensive quantitative and qualitative evaluation:** The paper provides clear tables, broad baselines (MindEye, BrainDiffuser, MindTuner, MindEye2, NeuroVLA, etc.), and an ablation appe

1. **Ambiguity in Functional Clustering Implementation:** While the voxel-to-cluster mapping is conceptually clear, I would appreciate a clarification on a few details:
   - Are the voxel embeddings unique per-voxel-per-subject (i.e., one embedding vector per voxel that is fixed regardless of the input image), or do they vary per-image?
   - Are these voxel embeddings frozen after the initial V2C mapping is established, or are they continuously optimized during BIT training? Section 5.3 mentions that

- Brain‑IT outperforms prior methods on most of the conventional metrics in both the 40‑hour and 1‑hour settings. It also reports the first results for 15/30‑minute reconstruction on a single subject.
- Ablations cover the use of external unlabeled images, functional vs. anatomical clustering, and the number of clusters, plus branch‑wise contributions.
- They pledge to release code, checkpoints, and all reconstructed images upon publication.

- There is no explicit OOD evaluation (e.g., NSD‑synthetic) to probe robustness beyond the training distribution.
- The paper primarily focuses on reconstruction accuracy and does not analyze what the Brain Tokens capture, and thus provides limited neuroscientific insight.

* The brain-inspired BIT design is both biologically plausible and effective, leveraging shared functional voxel clusters to address data scarcity and inter-subject variability.
* The dual-branch reconstruction (structural and semantic) balances visual fidelity and semantic accuracy, producing reconstructions that better match the original stimuli.
* The efficient transfer learning strategy enables fast adaptation to new subjects with minimal fMRI data, greatly improving practicality.

* The paper lacks a clear distinction between its scientific and technical goals. While it emphasizes "brain interaction", it does not analyze what kinds of neural interactions BIT is meant to model or how these relate to known brain mechanisms, making the neuroscience contribution unclear.
* The Voxel-to-Cluster (V2C) mapping largely follows Beliy et al. (2024) but does not clearly explain what is new. How the shared clusters differ from prior work and how they are optimized for decoding rather

