The Wisdom of a Crowd of Brains: A Universal Brain Encoder
Roman Beliy, Navve Wasserman, Amit Zalcher, Michal Irani

TL;DR
This paper introduces a universal brain encoder that leverages a voxel-centric architecture to jointly train on diverse datasets, enabling improved brain-encoding, transfer learning, and functional brain exploration.
Contribution
The paper presents a novel voxel-centric encoder architecture that allows joint training across multiple subjects and datasets, enhancing brain-encoding and interpretability.
Findings
Combines data from multiple subjects to improve encoding accuracy.
Enables effective transfer learning across different datasets and scanners.
Provides a tool for exploring brain functionality through voxel-embeddings.
Abstract
Image-to-fMRI encoding is important for both neuroscience research and practical applications. However, such "Brain-Encoders" have been typically trained per-subject and per fMRI-dataset, thus restricted to very limited training data. In this paper we propose a Universal Brain-Encoder, which can be trained jointly on data from many different subjects/datasets/machines. What makes this possible is our new voxel-centric Encoder architecture, which learns a unique "voxel-embedding" per brain-voxel. Our Encoder trains to predict the response of each brain-voxel on every image, by directly computing the cross-attention between the brain-voxel embedding and multi-level deep image features. This voxel-centric architecture allows the functional role of each brain-voxel to naturally emerge from the voxel-image cross-attention. We show the power of this approach to (i) combine data from multiple…
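The voxel-image cross-attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function `predict_voxel_responses`, the projection matrices `w_key`, `w_value`, and `w_out`, the single attention head, and all dimensions are assumptions chosen for illustration. Multi-level features are assumed to be spatial tokens from two hypothetical network levels, already projected to a common width and concatenated.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def predict_voxel_responses(voxel_emb, image_tokens, w_key, w_value, w_out):
    """Single-head cross-attention from voxel embeddings to image tokens.

    voxel_emb:    (V, D) one learned embedding per brain-voxel (the queries)
    image_tokens: (T, C) image feature tokens pooled from several levels
    w_key:        (C, D) hypothetical key projection
    w_value:      (C, D) hypothetical value projection
    w_out:        (D,)   hypothetical readout to a scalar response
    Returns (responses of shape (V,), attention weights of shape (V, T)).
    """
    keys = image_tokens @ w_key          # (T, D)
    values = image_tokens @ w_value      # (T, D)
    scores = voxel_emb @ keys.T / np.sqrt(voxel_emb.shape[1])  # (V, T)
    attn = softmax(scores, axis=-1)      # each voxel's weights sum to 1
    attended = attn @ values             # (V, D)
    responses = attended @ w_out         # (V,)
    return responses, attn

rng = np.random.default_rng(0)
num_voxels, emb_dim, feat_dim = 5, 8, 16

# Tokens from two hypothetical feature levels (7x7 and 14x14 grids).
level_a = rng.normal(size=(49, feat_dim))
level_b = rng.normal(size=(196, feat_dim))
image_tokens = np.concatenate([level_a, level_b], axis=0)  # (245, C)

voxel_emb = rng.normal(size=(num_voxels, emb_dim))
w_key = rng.normal(size=(feat_dim, emb_dim))
w_value = rng.normal(size=(feat_dim, emb_dim))
w_out = rng.normal(size=emb_dim)

responses, attn = predict_voxel_responses(
    voxel_emb, image_tokens, w_key, w_value, w_out
)
print(responses.shape, attn.shape)  # (5,) (5, 245)
```

In this sketch only `voxel_emb` is specific to a voxel, while the projections are shared by all voxels. That split is one way to read the abstract's claim that data from many subjects and datasets can be trained jointly: adding a subject would add rows to `voxel_emb` and leave the shared weights in place.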
Peer Reviews
Decision: Submitted to ICLR 2025
1. The proposed Universal Brain-Encoder can effectively handle sequences from different subjects, datasets, and machines, which enhances its applicability for both neuroscience research and practical applications.
2. The paper presents comprehensive experimental results, and the proposed Universal Brain-Encoder achieves satisfactory performance across multiple datasets. Notably, it achieves substantial performance improvements when trained on multi-dataset inputs, supporting the authors' argument
1. The idea appears to closely resemble existing works such as [1], MindFormer [2], MindEye2 [3], MindBridge [4], and BDI [5]. These studies also learn a set of independent parameters for each subject while sharing most parameters across subjects. The novelty of the proposed idea needs further clarification.
2. Some brain decoding methods employ symmetric architectures, so they have both Image-to-fMRI and fMRI-to-Image networks, such as [6] and [7]. A discussion about these approaches should be
- This is a very strong and well-written paper. The methods are easy to understand and well motivated. I could see this encoder being used a lot when working with smaller vision datasets.
- Retrieval accuracy is impressively high. It looks close to 95% top-1 accuracy for subjects 1 and 2 across 1000 test images (chance is 0.1%).
- Statistical tests are performed for all experiments.
I think the paper is lacking some exploration and visualization of the voxel embeddings. Here are some ideas:
- Apply the clustering to more than 2 participants.
- Try other clustering methods besides k-means (e.g. some that can deal with outliers).
- A flatmap visualization with outlines of previously identified category-selective regions for faces, bodies, places, and words. This would be helpful for comparing to the clusters identified with k-means.
- A UMAP or t-SNE applied to the combined embedd
1. The problem of universal brain-image generation is important.
2. Experiments successfully demonstrated that the proposed method can train on multiple subjects from different datasets and achieve better performance.
3. The presentation of motivations, methods, and experiments is clear and easy to follow.
1. The announcement of the *first-ever Universal Brain-Encoder* is too aggressive. The idea of the model is to be able to train on multiple subjects and datasets instead of **universally** applying to any unseen subjects or datasets. The performance of the proposed model on a new subject is tested via few-shot transfer learning instead of zero-shot learning.
2. The method of cross-attention is not novel and exists in the field of brain-image generation [1,2,3].
[1] Sun, Jingyuan, et al. "Cont
Taxonomy
Topics: EEG and Brain-Computer Interfaces
