BrainMCLIP: Brain Image Decoding with Multi-Layer feature Fusion of CLIP
Tian Xia, Zihan Ma, Xinlong Wang, Qing Liu, Xiaowei He, Tianming Liu, Yudan Ren

TL;DR
BrainMCLIP introduces a parameter-efficient multi-layer fusion method that aligns fMRI signals with CLIP's intermediate and final layers, capturing detailed visual information and surpassing VAE-based methods in image decoding performance.
Contribution
It pioneers a hierarchical, multi-layer fusion approach guided by the human visual system, eliminating the need for VAE pipelines in brain image decoding.
Findings
Achieves competitive or superior semantic decoding metrics.
Reduces parameters by 71.7% compared to VAE-based methods.
Effectively captures visual details missed by CLIP-only approaches.
Abstract
Decoding images from fMRI often involves mapping brain activity to CLIP's final semantic layer. To capture finer visual details, many approaches add a parameter-intensive VAE-based pipeline. However, these approaches overlook the rich object information within CLIP's intermediate layers and contradict the brain's functional hierarchy. We introduce BrainMCLIP, which pioneers a parameter-efficient, multi-layer fusion approach guided by the human visual system's functional hierarchy, eliminating the need for such a separate VAE pathway. BrainMCLIP aligns fMRI signals from functionally distinct visual areas (low-/high-level) to the corresponding intermediate and final CLIP layers, respecting this functional hierarchy. We further introduce a Cross-Reconstruction strategy and a novel multi-granularity loss. Results show BrainMCLIP achieves highly competitive performance, particularly excelling on…
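The hierarchical alignment idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the array sizes, the synthetic data, the ridge-regression mapping, the equal-weight fusion, and the cosine loss are all assumptions chosen only to show the shape of the idea, in which low-level visual-area signals are mapped to intermediate-layer features and high-level signals to final-layer features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_SAMPLES = 8
N_LOW_VOXELS = 50    # voxels from low-level visual areas (assumed count)
N_HIGH_VOXELS = 40   # voxels from high-level visual areas (assumed count)
EMBED_DIM = 16       # stand-in for a CLIP feature dimension


def cosine_alignment_loss(pred, target):
    """Mean (1 - cosine similarity) between row-wise predictions and targets."""
    pred_n = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    target_n = target / np.linalg.norm(target, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(pred_n * target_n, axis=1)))


def fit_linear_map(x, y, ridge=1.0):
    """Closed-form ridge regression from voxels x to features y.

    Ridge regression is a swapped-in simplification, not the paper's model.
    """
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ y)


# Synthetic stand-ins for fMRI signals and CLIP layer features.
fmri_low = rng.standard_normal((N_SAMPLES, N_LOW_VOXELS))
fmri_high = rng.standard_normal((N_SAMPLES, N_HIGH_VOXELS))
clip_mid = rng.standard_normal((N_SAMPLES, EMBED_DIM))    # intermediate layer
clip_final = rng.standard_normal((N_SAMPLES, EMBED_DIM))  # final layer

# Low-level areas -> intermediate layer; high-level areas -> final layer.
w_low = fit_linear_map(fmri_low, clip_mid)
w_high = fit_linear_map(fmri_high, clip_final)

pred_mid = fmri_low @ w_low
pred_final = fmri_high @ w_high

# Equal-weight fusion of the two predicted feature sets (hypothetical choice).
fused = 0.5 * pred_mid + 0.5 * pred_final

loss_mid = cosine_alignment_loss(pred_mid, clip_mid)
loss_final = cosine_alignment_loss(pred_final, clip_final)
print(f"intermediate-layer loss: {loss_mid:.4f}")
print(f"final-layer loss: {loss_final:.4f}")
```

In this sketch each visual-area group gets its own mapping and its own alignment target, which mirrors the low-/high-level split the abstract describes; the Cross-Reconstruction strategy and multi-granularity loss are not modeled here.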
Taxonomy
Topics: Face Recognition and Perception · EEG and Brain-Computer Interfaces · Generative Adversarial Networks and Image Synthesis
