Visual representations in the human brain are aligned with large language models
Adrien Doerig, Tim C Kietzmann, Emily Allen, Yihan Wu, Thomas Naselaris, Kendrick Kay, Ian Charest

TL;DR
This study demonstrates that large language model embeddings of scene captions effectively model and predict human brain activity during visual scene perception, revealing alignment between language-based and neural representations.
Contribution
The paper introduces a novel approach linking LLM embeddings with brain activity, showing they capture complex visual information and outperform other models despite less training data.
Findings
LLM embeddings characterize brain activity during scene viewing.
Scene caption embeddings can be reconstructed from brain activity.
Deep neural networks trained on images produce representations aligned with brain data.
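The first two findings describe an encoding direction (caption embeddings to brain activity) and a decoding direction (brain activity back to caption embeddings). The sketch below illustrates both on synthetic data. It is a minimal example under stated assumptions, not the authors' pipeline: the use of ridge regression, nearest-neighbour retrieval by cosine similarity, the array sizes, and all function names (`fit_ridge`, `columnwise_pearson`, `cosine_sim`) are illustrative choices that this summary does not confirm.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def columnwise_pearson(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / (
        np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    )

def cosine_sim(A, B):
    """Cosine similarity between every row of A and every row of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

# Synthetic stand-ins (assumption): random "caption embeddings" and
# "voxel responses" generated from a hidden linear map plus noise.
rng = np.random.default_rng(0)
n_train, n_test, n_dims, n_voxels = 400, 100, 16, 50
emb = rng.standard_normal((n_train + n_test, n_dims))
true_map = rng.standard_normal((n_dims, n_voxels))
noise = 0.5 * rng.standard_normal((n_train + n_test, n_voxels))
voxels = emb @ true_map + noise

emb_tr, emb_te = emb[:n_train], emb[n_train:]
vox_tr, vox_te = voxels[:n_train], voxels[n_train:]

# Encoding: predict each voxel's response from the caption embedding,
# then score held-out predictions with a per-voxel correlation.
W_enc = fit_ridge(emb_tr, vox_tr, alpha=10.0)
enc_r = columnwise_pearson(emb_te @ W_enc, vox_te)

# Decoding: predict the caption embedding from the voxel pattern, then
# retrieve the closest held-out caption embedding for each test scene.
W_dec = fit_ridge(vox_tr, emb_tr, alpha=10.0)
pred_emb = vox_te @ W_dec
retrieved = cosine_sim(pred_emb, emb_te).argmax(axis=1)
top1 = (retrieved == np.arange(n_test)).mean()

print("mean held-out encoding correlation:", enc_r.mean())
print("top-1 caption retrieval accuracy:", top1)
```

With real data, `emb` would hold LLM embeddings of the scene captions and `voxels` the measured responses to the corresponding scenes. Retrieval from a fixed caption pool is only one way to turn a predicted embedding back into text.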
Abstract
The human brain extracts complex information from visual inputs, including objects, their spatial and semantic interrelations, and their interactions with the environment. However, a quantitative approach for studying this information remains elusive. Here, we test whether the contextual information encoded in large language models (LLMs) is beneficial for modelling the complex visual information extracted by the brain from natural scenes. We show that LLM embeddings of scene captions successfully characterise brain activity evoked by viewing the natural scenes. This mapping captures selectivities of different brain areas, and is sufficiently robust that accurate scene captions can be reconstructed from brain activity. Using carefully controlled model comparisons, we then proceed to show that the accuracy with which LLM representations match brain representations derives from the…
Taxonomy
Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Image Retrieval and Classification Techniques
