MIRAGE: Robust multi-modal architectures translate fMRI-to-image models from vision to mental imagery
Reese Kneeland, Cesar Kadir Torrico Villanueva, Jordyn Ojeda, Shuhb Khanna, Jonathan Xu, Paul S. Scotti, Thomas Naselaris

TL;DR
MIRAGE is a novel multi-modal architecture that improves the reconstruction of mental images from brain activity, demonstrating state-of-the-art performance on the NSD-Imagery benchmark.
Contribution
The paper introduces MIRAGE, a new method that trains on vision datasets to effectively decode and reconstruct mental images from brain activity.
Findings
MIRAGE achieves state-of-the-art mental image reconstruction on the NSD-Imagery benchmark, as judged by both feature metrics and human raters.
Using low-dimensional image features and multi-modal guidance enhances reconstruction quality.
Large-scale external stimulus datasets can be effectively used for mental image decoding.
Abstract
To be useful for downstream applications, vision decoding models that are trained to reconstruct seen images from human brain activity must be able to generalize to internally generated visual representations, i.e., mental images. In an analysis of the recently released NSD-Imagery dataset, we demonstrated that while some modern vision decoders can perform quite well on mental image reconstruction, some fail, and that state-of-the-art (SOTA) performance on seen image reconstruction is no guarantee of SOTA performance on mental image reconstruction. Motivated by these findings, we developed MIRAGE, a method explicitly designed to train on vision datasets and cross-decode mental images from brain activity. MIRAGE employs a linear backbone and multi-modal text and image features as input to a diffusion model. Feature metrics and human raters establish MIRAGE as SOTA for mental image…
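The cross-decoding idea in the abstract (fit a linear backbone on vision data, then apply it to imagery data) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which linear model is used, so ridge regression is assumed here, and all data, array sizes, and names such as `fit_ridge`, `X_vision`, and `X_imagery` are hypothetical and synthetic.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    # Solve the regularized normal equations instead of inverting explicitly.
    return np.linalg.solve(gram, X.T @ Y)


rng = np.random.default_rng(0)

# Hypothetical sizes: trials, voxels, and target feature dimensions.
n_train, n_voxels, n_latent = 200, 50, 8

# Synthetic stand-in for "vision" trials: voxel patterns paired with the
# image/text feature vectors of the stimuli that were shown.
true_W = rng.normal(size=(n_voxels, n_latent))
X_vision = rng.normal(size=(n_train, n_voxels))
Y_vision = X_vision @ true_W + 0.1 * rng.normal(size=(n_train, n_latent))

# Fit the linear map on vision data only (assumed ridge penalty of 10).
W = fit_ridge(X_vision, Y_vision, alpha=10.0)

# Cross-decode: apply the same map to synthetic "imagery" trials.
X_imagery = rng.normal(size=(5, n_voxels))
predicted_features = X_imagery @ W
```

In a pipeline like the one the abstract describes, vectors such as `predicted_features` would be passed to a diffusion model as conditioning; that step is omitted here because the source gives no details about it.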
