BrainMCLIP: Brain Image Decoding with Multi-Layer feature Fusion of CLIP

Tian Xia; Zihan Ma; Xinlong Wang; Qing Liu; Xiaowei He; Tianming Liu; Yudan Ren

arXiv:2510.19332·cs.CV·October 23, 2025

TL;DR

BrainMCLIP introduces a parameter-efficient multi-layer fusion method that aligns fMRI signals with CLIP's intermediate and final layers, capturing detailed visual information and surpassing VAE-based methods in image decoding performance.

Contribution

It pioneers a hierarchical, multi-layer fusion approach guided by the human visual system, eliminating the need for VAE pipelines in brain image decoding.

Findings

01 Achieves competitive or superior semantic decoding metrics.

02 Reduces parameters by 71.7% compared to VAE-based methods.

03 Effectively captures visual details missed by CLIP-only approaches.

Abstract

Decoding images from fMRI often involves mapping brain activity to CLIP's final semantic layer. To capture finer visual details, many approaches add a parameter-intensive VAE-based pipeline. However, these approaches overlook the rich object information within CLIP's intermediate layers and contradict the brain's functional hierarchy. We introduce BrainMCLIP, which pioneers a parameter-efficient, multi-layer fusion approach guided by the human visual system's functional hierarchy, eliminating the need for such a separate VAE pathway. BrainMCLIP aligns fMRI signals from functionally distinct visual areas (low-/high-level) to corresponding intermediate and final CLIP layers, respecting this functional hierarchy. We further introduce a Cross-Reconstruction strategy and a novel multi-granularity loss. Results show BrainMCLIP achieves highly competitive performance, particularly excelling on…
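The hierarchical alignment idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear projections, the cosine-based loss, and every name below (`linear_map`, `hierarchical_alignment_loss`, `w_low`, `w_high`) are assumptions chosen for illustration only. The abstract does not specify the mapping architecture, the Cross-Reconstruction strategy, or the form of the multi-granularity loss, so none of those are modeled here. The sketch only shows the pairing of low-level visual-area signals with an intermediate CLIP layer and high-level signals with the final CLIP layer.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def linear_map(weights, voxels):
    """Project a voxel vector with an (out_dim x in_dim) weight matrix."""
    return [sum(w * v for w, v in zip(row, voxels)) for row in weights]

def hierarchical_alignment_loss(low_voxels, high_voxels,
                                w_low, w_high,
                                clip_mid, clip_final):
    """Hypothetical two-branch alignment loss (an assumption, not the paper's loss).

    low_voxels  -> projected toward an intermediate CLIP layer feature
    high_voxels -> projected toward the final CLIP layer feature
    Each branch contributes (1 - cosine similarity).
    """
    pred_mid = linear_map(w_low, low_voxels)
    pred_final = linear_map(w_high, high_voxels)
    loss_mid = 1.0 - cosine_similarity(pred_mid, clip_mid)
    loss_final = 1.0 - cosine_similarity(pred_final, clip_final)
    return loss_mid + loss_final
```

Under these assumptions, each branch's term is zero when its projection points in the same direction as the corresponding CLIP feature, so the two visual-area groups are supervised by different depths of the CLIP encoder rather than by the final layer alone.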

Taxonomy

Topics: Face Recognition and Perception · EEG and Brain-Computer Interfaces · Generative Adversarial Networks and Image Synthesis