BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding
Yulong Liu, Yongqiang Ma, Wei Zhou, Guibo Zhu, Nanning Zheng

TL;DR
BrainCLIP introduces a novel brain decoding model that leverages CLIP's cross-modal capabilities to decode and reconstruct natural images from fMRI data, achieving state-of-the-art results in semantic fidelity.
Contribution
The authors present BrainCLIP as the first task-agnostic fMRI decoding model to use CLIP as a bridge between brain activity, images, and text, covering zero-shot visual category decoding, fMRI-image/text matching, and image reconstruction from fMRI signals.
Findings
Outperforms existing methods in zero-shot visual category decoding
Achieves high semantic fidelity in image reconstruction from fMRI
Establishes new state-of-the-art in fMRI-based natural image reconstruction
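To make the zero-shot category decoding finding concrete, here is a minimal sketch of how such decoding is commonly done once fMRI patterns have been mapped into a CLIP-like embedding space: each brain-derived embedding is compared against text embeddings of the candidate category names, and the most similar category wins. This is an illustration under stated assumptions, not the authors' code. The function names, the embedding size, and the synthetic data are hypothetical; real use would substitute outputs of the trained mapping network and of CLIP's text encoder.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def zero_shot_decode(brain_emb, class_text_emb):
    """Pick the category whose text embedding is closest to each brain embedding.

    brain_emb:      (n_samples, d) embeddings predicted from fMRI (assumed given).
    class_text_emb: (n_classes, d) text embeddings of category names (assumed given).
    Returns predicted class indices of shape (n_samples,) and the similarity matrix.
    """
    sims = l2_normalize(brain_emb) @ l2_normalize(class_text_emb).T
    return sims.argmax(axis=1), sims

# Synthetic stand-ins: 5 categories in a 64-dimensional space (hypothetical sizes).
rng = np.random.default_rng(0)
class_text_emb = rng.normal(size=(5, 64))
labels = np.array([0, 3, 1, 4, 2, 0])

# Pretend the mapping network is well trained: its outputs sit near the
# text embedding of the true category, plus a little noise.
brain_emb = class_text_emb[labels] + 0.1 * rng.normal(size=(6, 64))

preds, sims = zero_shot_decode(brain_emb, class_text_emb)
print("predicted:", preds)
print("true:     ", labels)
```

Because the categories enter only through their text embeddings, no fMRI sample of a category is needed at training time, which is what makes the decoding zero-shot.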
Abstract
Due to the lack of paired samples and the low signal-to-noise ratio of functional MRI (fMRI) signals, reconstructing perceived natural images or decoding their semantic contents from fMRI data are challenging tasks. In this work, we propose, for the first time, a task-agnostic fMRI-based brain decoding model, BrainCLIP, which leverages CLIP's cross-modal generalization ability to bridge the modality gap between brain activity, images, and text. Our experiments demonstrate that CLIP can act as a pivot for generic brain decoding tasks, including zero-shot visual category decoding, fMRI-image/text matching, and fMRI-to-image generation. Specifically, BrainCLIP aims to train a mapping network that transforms fMRI patterns into a well-aligned CLIP embedding space by combining visual and textual supervision. Our experiments show that this combination can boost the decoding model's…
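The abstract's idea of training a mapping network with combined visual and textual supervision can be sketched as follows. The excerpt above does not state the loss that BrainCLIP uses, so this example assumes a symmetric InfoNCE (contrastive) objective, which is the objective CLIP itself is trained with, applied once against image embeddings and once against text embeddings. The linear mapping, the weight alpha, the temperature, and all array sizes are hypothetical choices for illustration only.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE: row i of `a` should match row i of `b` and no other row."""
    logits = l2_normalize(a) @ l2_normalize(b).T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # a -> b direction
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # b -> a direction
    return 0.5 * (loss_ab + loss_ba)

def combined_loss(brain_emb, image_emb, text_emb, alpha=0.5):
    """Weighted sum of visual and textual supervision (alpha is an assumed knob)."""
    return (alpha * info_nce(brain_emb, image_emb)
            + (1.0 - alpha) * info_nce(brain_emb, text_emb))

# Synthetic stand-ins for a batch of 8 paired samples (hypothetical sizes).
rng = np.random.default_rng(0)
n, n_voxels, d = 8, 100, 64
image_emb = rng.normal(size=(n, d))                    # would come from CLIP's image encoder
text_emb = image_emb + 0.1 * rng.normal(size=(n, d))   # captions lie near their images
fmri = rng.normal(size=(n, n_voxels))

# An untrained linear mapping network: fMRI voxels -> embedding space.
W = 0.01 * rng.normal(size=(n_voxels, d))
brain_emb = fmri @ W

mismatched = combined_loss(brain_emb, image_emb, text_emb)  # untrained mapping
aligned = combined_loss(image_emb, image_emb, text_emb)     # ideal mapping
print(f"untrained mapping loss: {mismatched:.4f}")
print(f"ideal mapping loss:     {aligned:.4f}")
```

Training would adjust the mapping network's parameters to push the first number toward the second. Using both terms means the brain embedding is pulled toward the image and its caption at once, which is one plausible reading of "combining visual and textual supervision".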
Taxonomy
Topics: Image Processing Techniques and Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Language-Image Pre-training
