Aligned with LLM: a new multi-modal training paradigm for encoding fMRI activity in visual cortex

Shuxiao Ma; Linyuan Wang; Senbao Hou; Bin Yan

arXiv:2401.03851 · cs.CV · January 9, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a novel multi-modal training paradigm aligned with large language models to improve the encoding of fMRI activity in the visual cortex, leveraging text-image alignment for enhanced brain activity prediction.

Contribution

The paper proposes a new multi-modal training approach aligned with LLMs, specifically using text-image alignment to improve visual cortex encoding models.

Findings

01

Enhanced performance of the visual encoding model with the new paradigm

02

Effective use of LLM-generated descriptions and contrast loss for alignment

03

Significant improvement in fMRI activity prediction accuracy
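The alignment step named in finding 02 can be illustrated with a small sketch. The summary here does not give the exact form of the paper's contrast loss, so the code below assumes a symmetric, CLIP-style contrastive objective over paired image and text embeddings. The function names, the temperature value, and the random toy embeddings are all hypothetical and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss (an assumed form, not the paper's confirmed loss).

    Row i of image_emb and row i of text_emb are treated as a matched pair;
    every other row in the batch serves as a negative example.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)


# Toy data: stand-ins for embeddings of stimulus images and their descriptions.
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
aligned_images = text + 0.01 * rng.normal(size=(8, 16))   # nearly matched pairs
random_images = rng.normal(size=(8, 16))                  # unrelated pairs

loss_aligned = contrastive_alignment_loss(aligned_images, text)
loss_random = contrastive_alignment_loss(random_images, text)
```

Minimizing a loss of this kind pulls each image embedding toward the embedding of its own description and away from the other descriptions in the batch, which is why the nearly matched toy pairs score a lower loss than the unrelated ones.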

Abstract

Recently, there has been a surge in the popularity of pre-trained large language models (LLMs) such as GPT-4, sweeping across the Natural Language Processing (NLP) and Computer Vision (CV) communities. These LLMs have demonstrated advanced multi-modal understanding capabilities and strong performance across various benchmarks. LLMs have started to embody traits of artificial general intelligence, which offers valuable guidance for enhancing brain-like characteristics within visual encoding models. Hence, this paper proposes a new multi-modal training paradigm, aligned with LLMs, for encoding fMRI activity in the visual cortex. Based on this paradigm, we trained an encoding model on fMRI data named the LLM-Visual Encoding Model (LLM-VEM). Specifically, we use an LLM (miniGPT4) to generate descriptive text for all stimulus images, forming a high-quality textual description…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning