CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information
Kaifan Zhang, Lihuo He, Xin Jiang, Wen Lu, Di Wang, Xinbo Gao

TL;DR
CognitionCapturer is a novel framework that decodes visual stimuli from EEG signals by leveraging multimodal data and a diffusion prior, achieving high-fidelity reconstructions without fine-tuning generative models.
Contribution
It introduces a unified multimodal approach with modality-specific encoders and a diffusion prior to improve visual stimulus reconstruction from EEG signals.
Findings
Outperforms state-of-the-art methods quantitatively.
Produces high semantic and structural fidelity in reconstructed images.
Extensible to incorporate additional modalities.
Abstract
Electroencephalogram (EEG) signals have attracted significant attention from researchers due to their non-invasive nature and high temporal sensitivity in decoding visual stimuli. However, most recent studies have focused solely on the relationship between EEG and image data pairs, neglecting the valuable "beyond-image-modality" information embedded in EEG signals. This results in the loss of critical multimodal information in EEG. To address this limitation, we propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains Modality Expert Encoders for each modality to extract cross-modal information from the EEG modality. It then introduces a diffusion prior to map the EEG embedding space to the CLIP embedding space. Using a pretrained generative model, the proposed framework can…
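The abstract describes training an encoder per modality so that EEG embeddings line up with embeddings of another modality. A minimal sketch of one common way to do this is a CLIP-style symmetric contrastive loss, shown below in plain Python. This is an assumption for illustration: the abstract does not state the paper's actual training objective, and the function names and the temperature value of 0.07 are hypothetical.

```python
import math


def l2_normalize(v):
    # Scale a nonzero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_style_loss(eeg_emb, target_emb, temperature=0.07):
    # Hypothetical helper: symmetric InfoNCE loss over a batch of paired
    # embeddings, where row i of eeg_emb is paired with row i of target_emb.
    eeg = [l2_normalize(v) for v in eeg_emb]
    tgt = [l2_normalize(v) for v in target_emb]
    n = len(eeg)
    # logits[i][j] = scaled cosine similarity between EEG i and target j.
    logits = [[sum(a * b for a, b in zip(e, t)) / temperature for t in tgt]
              for e in eeg]
    # EEG-to-target direction: the correct class for row i is column i.
    loss_e2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Target-to-EEG direction: same computation on the transposed matrix.
    logits_t = [list(col) for col in zip(*logits)]
    loss_t2e = -sum(log_softmax(logits_t[i])[i] for i in range(n)) / n
    return 0.5 * (loss_e2t + loss_t2e)


# Toy batch: two EEG embeddings and their paired target embeddings.
eeg_batch = [[1.0, 0.0], [0.0, 1.0]]
paired = [[1.0, 0.0], [0.0, 1.0]]
mispaired = [[0.0, 1.0], [1.0, 0.0]]
print(clip_style_loss(eeg_batch, paired), clip_style_loss(eeg_batch, mispaired))
```

In this toy batch the correctly paired embeddings give a much lower loss than the mispaired ones, which is the signal an encoder would be trained to minimize. A real setup would compute the embeddings with learned encoders and backpropagate through the loss.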
Taxonomy
Topics: EEG and Brain-Computer Interfaces
Methods: Softmax · Attention Is All You Need · Diffusion · Contrastive Language-Image Pre-training
