Unified Multimodal Brain Decoding via Cross-Subject Soft-ROI Fusion
Xuanyu Hu

TL;DR
This paper introduces a novel multimodal brain decoding model that enhances cross-subject generalization and interpretability, achieving state-of-the-art results in brain-captioning tasks using fMRI data.
Contribution
The paper proposes BrainROI, a model that combines a shared soft-ROI space, voxel-wise gated fusion, and an interpretable prompt optimization process for improved brain decoding.
Findings
Achieved leading brain-captioning results on the NSD dataset, with improved BLEU-4 and CIDEr scores in the cross-subject setting.
Designed a voxel-wise gated fusion mechanism for better cross-subject transferability.
Implemented an interpretable prompt optimization process that enhances stability and transparency.
Abstract
Multimodal brain decoding aims to reconstruct, from brain activity signals such as fMRI, semantic information consistent with the visual stimuli, and then to generate readable natural-language descriptions. However, it still faces key challenges in cross-subject generalization and interpretability. We propose the BrainROI model and achieve leading results in brain-captioning evaluation on the NSD dataset. In the cross-subject setting, metrics such as BLEU-4 and CIDEr show clear improvements over recent state-of-the-art methods and representative baselines. First, to address the heterogeneity of functional brain topology across subjects, we design a new fMRI encoder that uses multi-atlas soft functional parcellations (soft-ROI) as a shared space. We extend the discrete ROI Concatenation strategy in MINDLLM to a voxel-wise gated fusion mechanism…
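The abstract is cut off before it describes the fusion mechanism, so the following is only a rough illustration of what voxel-wise gated fusion over soft-ROI memberships could look like, not the authors' encoder. It assumes that every atlas has already been mapped to the same set of R soft ROIs, that each voxel carries one gate logit per atlas, and that ROI features are membership-weighted averages of voxel activity. The function names, argument shapes, and the softmax gate are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_soft_roi_pool(activity, memberships, gate_logits):
    """Pool voxel activity into soft-ROI features (illustrative assumption).

    activity:     list of V floats, one response per voxel
    memberships:  memberships[a][v][r], soft membership of voxel v in ROI r
                  according to atlas a (assumed to sum to 1 over r)
    gate_logits:  gate_logits[v][a], hypothetical per-voxel logits that
                  decide how much each atlas is trusted for that voxel
    Returns a list of R floats, one pooled feature per soft ROI.
    """
    n_roi = len(memberships[0][0])
    weighted_sum = [0.0] * n_roi
    weight_total = [0.0] * n_roi
    for v, x in enumerate(activity):
        gates = softmax(gate_logits[v])  # one weight per atlas, sums to 1
        for r in range(n_roi):
            # Fuse the atlases' memberships for this voxel with its gates.
            w = sum(g * memberships[a][v][r] for a, g in enumerate(gates))
            weighted_sum[r] += w * x
            weight_total[r] += w
    # Membership-weighted average; ROIs that receive no weight default to 0.
    return [s / t if t > 0 else 0.0 for s, t in zip(weighted_sum, weight_total)]
```

In this sketch the pooled vector has length R regardless of how many voxels a subject has, which is one way a soft-ROI space could act as a shared representation across subjects. Equal gate logits reduce it to a plain average of the atlases' memberships.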
