MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction
Zixuan Gong, Qi Zhang, Guangyin Bao, Lei Zhu, Ke Liu, Liang Hu, Duoqian Miao

TL;DR
MindTuner is a novel cross-subject visual decoding framework that leverages visual fingerprints and semantic correction to reconstruct high-quality images from fMRI data with minimal training data.
Contribution
It introduces a multi-subject pre-training and fine-tuning approach using visual fingerprints and a new fMRI-to-text alignment paradigm for improved cross-subject decoding.
Findings
Outperforms state-of-the-art models on the NSD dataset
Achieves high-quality reconstructions with only 1 hour of training data
Effective cross-subject decoding with minimal data
Abstract
Decoding natural visual scenes from brain activity has flourished, with extensive research on single-subject tasks but far less on cross-subject tasks. Reconstructing high-quality images in cross-subject tasks is challenging because of profound individual differences between subjects and the scarcity of data annotation. In this work, we propose MindTuner for cross-subject visual decoding, which achieves high-quality, semantically rich reconstructions using only 1 hour of fMRI training data, benefiting from the phenomenon of the visual fingerprint in the human visual system and a novel fMRI-to-text alignment paradigm. First, we pre-train a multi-subject model on 7 subjects and fine-tune it with scarce data on new subjects, where LoRAs with Skip-LoRAs are utilized to learn the visual fingerprint. Then, we take the image modality as the intermediate pivot modality to achieve…
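To make the fine-tuning step in the abstract more concrete, here is a minimal sketch of a generic LoRA (low-rank adaptation) layer in NumPy. It is not the paper's implementation: the class name LoRALinear, the rank, the scaling factor alpha, and the layer sizes are all assumptions chosen for illustration, and the Skip-LoRA variant is not shown because the abstract does not describe it. The idea is that the pre-trained weight matrix stays frozen while a small low-rank update is learned for each new subject.

```python
import numpy as np


class LoRALinear:
    """Frozen linear map W plus a trainable low-rank update (alpha / r) * B @ A.

    Hypothetical illustration of generic LoRA, not MindTuner's actual code.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                # frozen pre-trained weights
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))   # trainable, small random init
        self.B = np.zeros((d_out, rank))                    # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # x has shape (batch, d_in); the low-rank path is added to the frozen path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        # Only A and B would be updated during fine-tuning.
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(16, 64))      # stand-in for a pre-trained layer (assumed sizes)
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(3, 64))       # stand-in for a batch of fMRI features
y = layer(x)
```

Because B starts at zero, the adapted layer initially reproduces the pre-trained layer exactly, and in this toy configuration fine-tuning would touch 320 parameters instead of the 1024 in W. That small footprint is what makes this family of methods a natural fit for the scarce-data setting the abstract describes.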
Taxonomy
Topics: Digital Media Forensic Detection · Image Processing Techniques and Applications · Neural Networks and Applications
