ViBE: Visual-to-M/EEG Brain Encoding via Spatio-Temporal VAE and Distribution-Aligned Projection
Ganxi Xu, Zhao-Rong Lai, Yuting Tang, Yonghao Song, Shuyan Zhou, Guoxu Zhou, Boyu Wang, Jian Zhu, Jinyi Long

TL;DR
ViBE is a brain encoding framework that generates M/EEG signals from visual stimuli. It uses a spatio-temporal convolutional VAE (TSC-VAE) for neural response reconstruction and a distribution-aligned projection for cross-modal alignment between visual features and neural responses.
Contribution
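The paper introduces ViBE, which combines a TSC-VAE for reconstructing M/EEG responses with a Q-Former that bridges the modality gap between visual features and neural representations, so that neural responses can be generated from visual stimuli.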
Findings
Effective neural response reconstruction demonstrated on THINGS-EEG2 and THINGS-MEG datasets.
Improved cross-modal alignment between visual features and neural responses.
High-quality M/EEG signal generation from visual stimuli.
Abstract
Brain encoding models not only serve to decipher how visual stimuli are transformed into neural responses, but also represent a critical step toward visual prostheses that restore vision for patients with severe vision disorders. Brain encoding involves two fundamental steps: achieving faithful reconstruction of neural responses and establishing cross-modal alignment between visual stimuli and neural responses. To this end, we propose ViBE, a novel brain encoding framework for generating magnetoencephalography (MEG) and electroencephalography (EEG) signals from visual stimuli. Specifically, we first design a spatio-temporal convolutional variational autoencoder (TSC-VAE) that captures the spatio-temporal characteristics of M/EEG signals for effective neural response reconstruction. To bridge the modality gap between visual features and neural representations, we employ Q-Former to map…
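The excerpt above does not specify the TSC-VAE's architecture or training loss, so the following is only a minimal sketch of the generic VAE objective (reconstruction error plus a KL term toward a standard normal prior) that a variational autoencoder of this kind would typically build on. The function names, the `beta` weight, and the use of mean squared error are assumptions for illustration, not details taken from the paper.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    mu and log_var are lists of floats describing a diagonal Gaussian posterior.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def mse(x, x_hat):
    """Mean squared error between a signal and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Generic VAE loss: reconstruction error + beta * KL (beta is hypothetical)."""
    return mse(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.2], [0.0, -1.0]
    z = reparameterize(mu, log_var, rng)      # one latent sample
    x = [0.1, 0.4, -0.3]                      # toy "signal" standing in for M/EEG
    x_hat = [0.0, 0.5, -0.2]                  # toy reconstruction
    print(len(z), vae_loss(x, x_hat, mu, log_var))
```

In a real model the encoder would produce `mu` and `log_var` from an M/EEG segment and the decoder would produce `x_hat` from the sampled latent; here both are toy lists so the loss terms can be inspected in isolation.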
