ViBE: Visual-to-M/EEG Brain Encoding via Spatio-Temporal VAE and Distribution-Aligned Projection

Ganxi Xu; Zhao-Rong Lai; Yuting Tang; Yonghao Song; Shuyan Zhou; Guoxu Zhou; Boyu Wang; Jian Zhu; Jinyi Long

arXiv:2604.26218 · cs.CV · April 30, 2026

TL;DR

ViBE is a brain encoding framework that generates M/EEG signals from visual stimuli. It uses a spatio-temporal convolutional VAE (TSC-VAE) to reconstruct neural responses and a distribution-aligned projection to align visual features with neural representations.

Contribution

The paper introduces ViBE, which combines a TSC-VAE for neural response reconstruction with a Q-Former that bridges the modality gap between visual features and neural representations.

Findings

01. Effective neural response reconstruction demonstrated on the THINGS-EEG2 and THINGS-MEG datasets.

02. Improved cross-modal alignment between visual features and neural responses.

03. High-quality M/EEG signal generation from visual stimuli.

Abstract

Brain encoding models not only serve to decipher how visual stimuli are transformed into neural responses, but also represent a critical step toward visual prostheses that restore vision for patients with severe vision disorders. Brain encoding involves two fundamental steps: achieving faithful reconstruction of neural responses and establishing cross-modal alignment between visual stimuli and neural responses. To this end, we propose ViBE, a novel brain encoding framework for generating magnetoencephalography (MEG) and electroencephalography (EEG) signals from visual stimuli. Specifically, we first design a spatio-temporal convolutional variational autoencoder (TSC-VAE) that captures the spatio-temporal characteristics of M/EEG signals for effective neural response reconstruction. To bridge the modality gap between visual features and neural representations, we employ Q-Former to map…
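
The truncated abstract does not state the TSC-VAE training objective. For orientation only, the generic VAE objective (the evidence lower bound) is sketched below. The symbols are assumptions introduced here and are not taken from the paper: x stands for an M/EEG segment, z for its latent code, q_phi for the encoder, p_theta for the decoder, and the prior p(z) is assumed to be a standard normal with a diagonal Gaussian posterior. The paper's actual loss may differ.

```latex
% Generic VAE objective (ELBO), shown as background; not the paper's stated loss.
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)

% Closed-form KL term, assuming q_\phi(z \mid x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))
% and p(z) = \mathcal{N}(0, I), with latent dimension d:
D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
  = \frac{1}{2} \sum_{j=1}^{d} \big(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\big)
```

The first term rewards faithful reconstruction of the neural response, and the second keeps the latent distribution close to the prior. That regularized latent space is what a projection from visual features would need to match.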
