NeuroCLIP: Brain-Inspired Prompt Tuning for EEG-to-Image Multimodal Contrastive Learning
Jiyuan Wang, Li Zhang, Haipeng Lin, Qile Liu, Gan Huang, Ziyu Li, Zhen Liang, and Xia Wu

TL;DR
NeuroCLIP is a brain-inspired prompt-tuning framework for EEG-to-image contrastive learning. It improves neural-visual alignment through instance-level adaptive prompts and a neuroscience-inspired loss, and reports substantial gains in zero-shot image retrieval.
Contribution
The paper presents the first integration of visual prompt tokens into EEG-image alignment and a dual-stream adaptive prompt design inspired by neuroscience principles.
Findings
Achieved 63.2% Top-1 accuracy in zero-shot image retrieval on the THINGS-EEG2 dataset.
Surpassed previous methods by 12.3% in accuracy.
Demonstrated strong generalization across subjects (+4.6% Top-1).
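For readers unfamiliar with the metric, the sketch below shows one common way Top-1 accuracy is computed in zero-shot EEG-to-image retrieval: each EEG embedding is compared with every candidate image embedding by cosine similarity, and a trial counts as correct when its paired image ranks first. It assumes that row i of the EEG embeddings is paired with row i of the image embeddings. The function names and toy vectors are illustrative and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity_matrix(eeg_embs, img_embs):
    """Return sim[i][j] = cosine similarity between EEG trial i and image j."""
    eeg = [l2_normalize(v) for v in eeg_embs]
    img = [l2_normalize(v) for v in img_embs]
    return [[sum(a * b for a, b in zip(e, m)) for m in img] for e in eeg]


def top_k_accuracy(sim, k=1):
    """Fraction of EEG trials whose paired image (same index) is among the k best matches."""
    hits = 0
    for row_idx, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if row_idx in ranked[:k]:
            hits += 1
    return hits / len(sim)


# Toy embeddings (hypothetical values): EEG trial i is paired with image i.
eeg_embeddings = [[1.0, 0.1], [0.2, 1.0], [1.0, 0.9]]
image_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

similarity = cosine_similarity_matrix(eeg_embeddings, image_embeddings)
top1 = top_k_accuracy(similarity, k=1)
print(f"Top-1 accuracy: {top1:.3f}")
```

Passing a larger k to the same function gives a Top-k figure; the embeddings themselves would come from the trained EEG and image encoders.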
Abstract
Recent advances in brain-inspired artificial intelligence have sought to align neural signals with visual semantics using multimodal models such as CLIP. However, existing methods often treat CLIP as a static feature extractor, overlooking its adaptability to neural representations and the inherent physiological-symbolic gap in EEG-image alignment. To address these challenges, we present NeuroCLIP, a prompt tuning framework tailored for EEG-to-image contrastive learning. Our approach introduces three core innovations: (1) We design a dual-stream visual embedding pipeline that combines dynamic filtering and token-level fusion to generate instance-level adaptive prompts, which guide the adjustment of patch embedding tokens based on image content, thereby enabling fine-grained modulation of visual representations under neural constraints; (2) We are the first to introduce visual prompt…
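As background for the contrastive learning mentioned in the abstract, the sketch below implements the standard symmetric CLIP-style (InfoNCE) objective over a batch of paired EEG and image embeddings. This is the generic formulation, not NeuroCLIP's neuroscience-inspired loss, which the truncated abstract does not specify. The temperature of 0.07 and the function names are assumptions for illustration.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum_exp for x in row]


def symmetric_contrastive_loss(sim, temperature=0.07):
    """Generic CLIP-style loss on a square similarity matrix.

    sim[i][j] is the similarity between EEG sample i and image j;
    the matching pairs are assumed to lie on the diagonal.
    The temperature value is an assumed default, not the paper's setting.
    """
    n = len(sim)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # image-to-EEG direction
    loss_eeg_to_img = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    loss_img_to_eeg = -sum(log_softmax(logits_t[i])[i] for i in range(n)) / n
    return 0.5 * (loss_eeg_to_img + loss_img_to_eeg)


# Toy similarity matrices (hypothetical): aligned pairs vs. swapped pairs.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_contrastive_loss(aligned)
loss_swapped = symmetric_contrastive_loss(swapped)
print(loss_aligned < loss_swapped)
```

The loss falls as matching pairs become more similar than mismatched ones, in both the EEG-to-image and image-to-EEG directions, which is what drives the two embedding spaces into alignment.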
Taxonomy
Topics: Face Recognition and Perception · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
