Decoding fMRI Data into Captions using Prefix Language Modeling
Vyacheslav Shen, Kassymzhomart Kunanbayev, Dae-Shik Kim

TL;DR
This paper introduces a novel brain decoding method that predicts image embeddings from fMRI signals and uses prefix language modeling to generate captions, reducing computational load and improving accuracy.
Contribution
It proposes a new approach combining DINOv2 embeddings with GPT-2 for captioning from fMRI data, and explores 3D CNNs for better spatial mapping of brain signals.
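A minimal sketch of what "prefix language modeling" means here, assuming a ClipCap-style setup: a learned mapping turns one image vector into a short sequence of vectors in GPT-2's token-embedding space, and that sequence is placed in front of the caption tokens. The dimensions, the prefix length, and the single linear mapping layer (`W`, `b`, `build_prefix`, `lm_input`) are hypothetical illustrations, not the authors' architecture.

```python
import numpy as np

# Hypothetical dimensions (assumptions, not taken from the paper):
D_IMG = 768      # size of a DINOv2 [CLS] embedding (e.g. a ViT-B/14 backbone)
D_LM = 768       # GPT-2 (small) token-embedding size
PREFIX_LEN = 10  # number of prefix vectors handed to the language model

rng = np.random.default_rng(0)

# Hypothetical mapping network: a single linear layer that expands the
# image embedding into PREFIX_LEN vectors in the language model's space.
# In practice these weights would be trained; here they are random.
W = rng.normal(scale=0.02, size=(D_IMG, PREFIX_LEN * D_LM))
b = np.zeros(PREFIX_LEN * D_LM)


def build_prefix(cls_embedding: np.ndarray) -> np.ndarray:
    """Map one [CLS] vector of shape (D_IMG,) to a prefix of shape (PREFIX_LEN, D_LM)."""
    return (cls_embedding @ W + b).reshape(PREFIX_LEN, D_LM)


def lm_input(cls_embedding: np.ndarray, caption_token_embeddings: np.ndarray) -> np.ndarray:
    """Prepend the image prefix to the caption's token embeddings along the sequence axis."""
    return np.concatenate([build_prefix(cls_embedding), caption_token_embeddings], axis=0)


# Usage: a fake [CLS] vector and 12 fake caption-token embeddings.
cls_vec = rng.normal(size=D_IMG)
caption = rng.normal(size=(12, D_LM))
x = lm_input(cls_vec, caption)  # sequence of PREFIX_LEN + 12 vectors
```

During training, the language model would be asked to predict the caption tokens conditioned on the prefix; at inference, only the prefix is supplied and the caption is generated after it. Only the mapping (and optionally the language model) needs training, which is one plausible reason such setups are comparatively cheap.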
Findings
Effective decoding of fMRI signals into image captions.
Reduced computational requirements compared to previous methods.
Improved mapping of voxel information using 3D CNNs.
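To illustrate why a 3D CNN can preserve spatial structure that a flattened linear mapping ignores, here is a toy single-layer sketch: a 3D convolution slides small kernels over the voxel grid, so neighbouring voxels are processed together, and a pooled feature vector is then projected to an embedding. The layer sizes, the naive loop implementation, and the names `conv3d_valid` and `cnn_to_embedding` are assumptions for illustration; the paper's actual network is not described in this listing.

```python
import numpy as np


def conv3d_valid(volume: np.ndarray, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Naive 'valid' 3D cross-correlation (what deep-learning libraries call convolution).

    volume:  (D, H, W) single-channel voxel grid
    kernels: (C_out, k, k, k) cubic kernels
    bias:    (C_out,)
    returns: (C_out, D-k+1, H-k+1, W-k+1)
    """
    c_out, k = kernels.shape[0], kernels.shape[1]
    d, h, w = volume.shape
    out = np.empty((c_out, d - k + 1, h - k + 1, w - k + 1))
    for z in range(d - k + 1):
        for y in range(h - k + 1):
            for x in range(w - k + 1):
                patch = volume[z:z + k, y:y + k, x:x + k]
                # Sum over the k*k*k window for every output channel at once.
                out[:, z, y, x] = np.tensordot(kernels, patch, axes=3) + bias
    return out


def cnn_to_embedding(volume, kernels, bias, w_out):
    """Toy 3D-CNN mapping: conv -> ReLU -> global average pool -> linear head."""
    feats = np.maximum(conv3d_valid(volume, kernels, bias), 0.0)  # ReLU
    pooled = feats.mean(axis=(1, 2, 3))                           # (C_out,)
    return pooled @ w_out                                         # (D_emb,)


rng = np.random.default_rng(0)
vol = rng.normal(size=(8, 8, 8))          # hypothetical small fMRI volume
kern = rng.normal(size=(4, 3, 3, 3))      # 4 output channels, 3x3x3 kernels
bias = np.zeros(4)
w_out = rng.normal(size=(4, 16))          # hypothetical 16-d target embedding
emb = cnn_to_embedding(vol, kern, bias, w_out)
```

A real model would stack several such layers with learned weights (typically in a deep-learning framework rather than Python loops), but the key design point is the same: each output depends on a local neighbourhood of voxels rather than on every voxel independently.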
Abstract
With advances in large language models and latent diffusion models, brain decoding has achieved remarkable results in recent years. Work on the NSD dataset, whose stimulus images come from the COCO dataset, leverages embeddings from the CLIP model for image reconstruction and from GIT for captioning. However, this captioning approach risks data contamination, because the GIT model was trained on the COCO dataset. In this work, we present an alternative method for decoding brain signals into image captions: we predict a DINOv2 model's embedding of an image from the corresponding fMRI signal and then provide its [CLS] token as the prefix to the GPT-2 language model, an approach that considerably decreases computational requirements. Additionally, instead of the commonly used linear regression, we explore 3D Convolutional Neural Network mapping of fMRI signals to…
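The "commonly used Linear Regression" baseline maps each fMRI sample (a vector of voxel responses) to a target embedding. A minimal sketch, assuming L2-regularized (ridge) regression in closed form and inputs that are already centered so no intercept is needed; `fit_ridge`, `predict`, and `alpha` are illustrative names, and the data below are synthetic, not NSD.

```python
import numpy as np


def fit_ridge(X: np.ndarray, Y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression.

    X: (n_samples, n_voxels) fMRI responses (assumed centered)
    Y: (n_samples, d) target image embeddings, e.g. DINOv2 [CLS] vectors
    Returns W of shape (n_voxels, d) minimizing ||XW - Y||^2 + alpha * ||W||^2.
    """
    n_voxels = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_voxels)
    # Solve A W = X^T Y instead of forming an explicit inverse.
    return np.linalg.solve(A, X.T @ Y)


def predict(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Predicted embeddings for new fMRI samples."""
    return X @ W


# Synthetic check: noise-free targets generated by a known linear map.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))        # 200 samples, 50 "voxels"
W_true = rng.normal(size=(50, 8))     # 8-d hypothetical embedding
Y = X @ W_true
W_hat = fit_ridge(X, Y, alpha=1e-6)
Y_pred = predict(X, W_hat)
```

When voxels outnumber samples, as is common with fMRI, the equivalent dual form W = X^T (X X^T + alpha I)^{-1} Y solves an n_samples by n_samples system instead and is usually cheaper. The predicted embedding would then be passed on to the captioning stage.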
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Attention Is All You Need · Dropout · Cosine Annealing · Linear Layer · Adam · Residual Connection · Weight Decay · Diffusion · Multi-Head Attention · Linear Regression
