Improving Brain-to-Image Reconstruction via Fine-Grained Text Bridging

Runze Xia; Shuo Feng; Renzhi Wang; Congchi Yin; Xuyun Wen; Piji Li

arXiv:2505.22150·cs.CV·May 30, 2025

Open Access

TL;DR

This paper introduces FgB2I, a novel method that uses fine-grained text descriptions as a bridge to enhance the accuracy and detail of brain-to-image reconstruction from fMRI data.

Contribution

It proposes a three-stage approach incorporating fine-grained text generation and semantic guidance to improve visual stimulus reconstruction from brain activity.

Findings

01. Enhanced image reconstruction with more details and semantic consistency.

02. Validated the importance of fine-grained captions generated from vision-language models.

03. Guided decoding using reward metrics improves the semantic alignment of reconstructed images.

Abstract

Brain-to-Image reconstruction aims to recover visual stimuli perceived by humans from brain activity. However, the reconstructed visual stimuli often lack details and exhibit semantic inconsistencies, which may be attributed to insufficient semantic information. To address this issue, we propose an approach named Fine-grained Brain-to-Image reconstruction (FgB2I), which employs fine-grained text as a bridge to improve image reconstruction. FgB2I comprises three key stages: detail enhancement, decoding fine-grained text descriptions, and text-bridged brain-to-image reconstruction. In the detail-enhancement stage, we leverage large vision-language models to generate fine-grained captions for visual stimuli and experimentally validate their importance. We propose three reward metrics (object accuracy, text-image semantic similarity, and image-image semantic similarity) to guide the language model…
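To make the three reward metrics named in the abstract more concrete, the following is a minimal sketch of how such signals might be computed and combined into one score. It is not the authors' implementation: the abstract does not give the metric definitions, so the function names (`object_accuracy`, `cosine`, `combined_reward`), the use of cosine similarity over precomputed embedding vectors, the choice of which embeddings are compared, and the weighted average are all assumptions made for illustration.

```python
import math
from typing import Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def object_accuracy(predicted: Set[str], reference: Set[str]) -> float:
    """Assumed definition: fraction of reference objects found in the decoded text."""
    if not reference:
        return 1.0
    return len(predicted & reference) / len(reference)


def combined_reward(
    pred_objects: Set[str],
    ref_objects: Set[str],
    text_emb: Sequence[float],         # embedding of the decoded text (hypothetical input)
    stimulus_emb: Sequence[float],     # embedding of the original stimulus image (hypothetical input)
    recon_emb: Sequence[float],        # embedding of the reconstructed image (hypothetical input)
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted average of the three rewards; the weighting scheme is an assumption."""
    rewards = (
        object_accuracy(pred_objects, ref_objects),  # object accuracy
        cosine(text_emb, stimulus_emb),              # text-image semantic similarity
        cosine(recon_emb, stimulus_emb),             # image-image semantic similarity
    )
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)
```

In practice the embeddings would come from a shared vision-language encoder so that text and image vectors are comparable; here they are plain lists of floats so the sketch stays self-contained.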

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Face Recognition and Perception