Visualizing the Invisible: Generative Visual Grounding Empowers Universal EEG Understanding in MLLMs
Junyu Pan, Yansen Wang, Enze Zhang, Baoliang Lu, Weilong Zheng, Dongsheng Li

TL;DR
The paper introduces Generative Visual Grounding (GVG), a framework that uses an EEG-to-image generative model to produce proxy images for EEG signals, so that multimodal large language models (MLLMs) can apply their visual priors when interpreting neural data.
Contribution
It proposes a novel EEG-to-image generative approach that enhances neural signal interpretation by providing structured visual contexts, complementing traditional text-based alignment.
Findings
GVG-X-Omni matches text-aligned baselines with fewer parameters.
Visual proxies improve EEG understanding and visual generation.
Trimodal alignment yields consistent performance gains.
Abstract
Leveraging the universal representations of pre-trained LLMs and MLLMs offers a promising path toward brain foundation models. However, visually-evoked EEG datasets remain scarce, leading existing methods to align neural signals mainly with abstract text, a lossy translation that may discard fine-grained perceptual information encoded in brain activity. We propose Generative Visual Grounding (GVG), a framework that visualizes the invisible by using an EEG-to-image generative model as a visual translator. Instead of forcing EEG into text alone, GVG hallucinates instance-specific proxy images for non-visual EEG, providing structured visual contexts that allow MLLMs to exploit their visual priors for clinical-state interpretation. We validate this idea on two MLLM backbones, GVG-X-Omni and GVG-Janus. Image-only alignment is already competitive: the lightweight GVG-X-Omni matches…
