Visual representations in the human brain are aligned with large language models

Adrien Doerig; Tim C Kietzmann; Emily Allen; Yihan Wu; Thomas Naselaris; Kendrick Kay; Ian Charest

arXiv:2209.11737·cs.CV·July 9, 2024·37 cites

Open Access

TL;DR

This study demonstrates that large language model embeddings of scene captions effectively model and predict human brain activity during visual scene perception, revealing alignment between language-based and neural representations.

Contribution

The paper links LLM embeddings of scene captions to brain activity, showing that these embeddings capture complex visual information and that networks trained to map images onto them align better with brain data than other vision models, despite using less training data.

Findings

01

LLM embeddings characterize brain activity during scene viewing.

02

Scene caption embeddings can be reconstructed from brain activity.

03

Deep neural networks trained on images produce representations aligned with brain data.

Abstract

The human brain extracts complex information from visual inputs, including objects, their spatial and semantic interrelations, and their interactions with the environment. However, a quantitative approach for studying this information remains elusive. Here, we test whether the contextual information encoded in large language models (LLMs) is beneficial for modelling the complex visual information extracted by the brain from natural scenes. We show that LLM embeddings of scene captions successfully characterise brain activity evoked by viewing the natural scenes. This mapping captures selectivities of different brain areas, and is sufficiently robust that accurate scene captions can be reconstructed from brain activity. Using carefully controlled model comparisons, we then proceed to show that the accuracy with which LLM representations match brain representations derives from the…
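The mapping from caption embeddings to brain activity described in the abstract can be illustrated with a linear encoding model. The following is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes ridge regression as the mapping, uses random synthetic arrays in place of real caption embeddings and voxel responses, and the helper names `fit_ridge` and `voxelwise_correlation` are hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def voxelwise_correlation(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom


# Synthetic stand-ins (assumption): rows are scenes, X columns are embedding
# dimensions, Y columns are voxels. Real data would replace these arrays.
rng = np.random.default_rng(0)
n_train, n_test, n_dims, n_voxels = 400, 100, 32, 50
W_true = rng.normal(size=(n_dims, n_voxels))
X_train = rng.normal(size=(n_train, n_dims))
X_test = rng.normal(size=(n_test, n_dims))
Y_train = X_train @ W_true + rng.normal(scale=2.0, size=(n_train, n_voxels))
Y_test = X_test @ W_true + rng.normal(scale=2.0, size=(n_test, n_voxels))

# Fit on training scenes, then score predictions on held-out scenes.
W_hat = fit_ridge(X_train, Y_train, alpha=10.0)
scores = voxelwise_correlation(Y_test, X_test @ W_hat)
print(f"mean held-out correlation: {scores.mean():.2f}")
```

Swapping the roles of `X` and `Y` gives the reverse (decoding) direction, predicting embeddings from brain activity. In practice the regularisation strength `alpha` would be chosen by cross-validation rather than fixed as it is here.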

Taxonomy

Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Image Retrieval and Classification Techniques