The time course of visuo-semantic representations in the human brain is captured by combining vision and language models

Boyan Rong; Alessandro Thomas Gifford; Emrah Düzel; Radoslaw Martin Cichy

arXiv:2506.19497 · q-bio.NC · June 25, 2025

Open Access

TL;DR

This study combines a vision deep neural network and a large language model into a single encoding model that predicts the time course of visuo-semantic EEG responses during object viewing more accurately than either model alone.

Contribution

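The paper introduces an encoding approach that fuses vision DNN and LLM representations to model visuo-semantic processing. The fused model predicts EEG responses better than either component alone and better than previous modelling approaches.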

Findings

01 Fusion model outperforms individual models in EEG prediction

02 DNNs capture early broadband signals, LLMs capture later low-frequency signals

03 Combined model provides a more accurate representation of visuo-semantic processing

Abstract

The human visual system provides us with a rich and meaningful percept of the world, transforming retinal signals into visuo-semantic representations. For a model of these representations, here we leveraged a combination of two currently dominating approaches: vision deep neural networks (DNNs) and large language models (LLMs). Using large-scale human electroencephalography (EEG) data recorded during object image viewing, we built encoding models to predict EEG responses using representations from a vision DNN, an LLM, and their fusion. We show that the fusion encoding model outperforms encoding models based on either the vision DNN or the LLM alone, as well as previous modelling approaches, in predicting neural responses to visual stimulation. The vision DNN and the LLM complemented each other in explaining stimulus-related signal in the EEG responses. The vision DNN uniquely captured…
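The comparison described in the abstract (one encoding model per feature set, plus a fusion model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the regression is closed-form ridge, fusion is done by simple feature concatenation, and the names `fit_ridge` and `encoding_score` are hypothetical. The excerpt above does not state which DNN, LLM, regression method, or fusion scheme the authors used.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def encoding_score(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Mean Pearson correlation between predicted and held-out responses."""
    # Standardise features using training statistics only.
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0) + 1e-8
    X_train = (X_train - mu) / sd
    X_test = (X_test - mu) / sd

    # Fit on mean-centred responses, then add the mean back when predicting.
    y_mu = Y_train.mean(axis=0)
    W = fit_ridge(X_train, Y_train - y_mu, alpha)
    Y_pred = X_test @ W + y_mu

    # Pearson correlation per output column (e.g. one channel/time point).
    Yp = Y_pred - Y_pred.mean(axis=0)
    Yt = Y_test - Y_test.mean(axis=0)
    denom = np.linalg.norm(Yp, axis=0) * np.linalg.norm(Yt, axis=0) + 1e-12
    return float(((Yp * Yt).sum(axis=0) / denom).mean())


# Synthetic stand-ins (assumption): rows are images, columns are features.
rng = np.random.default_rng(0)
n_train, n_test = 400, 100
n_images = n_train + n_test
vision = rng.standard_normal((n_images, 20))    # stand-in for vision DNN features
language = rng.standard_normal((n_images, 20))  # stand-in for LLM embeddings

# Simulated responses driven by both feature sets plus noise.
eeg = (
    vision @ rng.standard_normal((20, 30))
    + language @ rng.standard_normal((20, 30))
    + rng.standard_normal((n_images, 30))
)

# Fusion by concatenation (assumption; other fusion schemes are possible).
feature_sets = {
    "vision": vision,
    "language": language,
    "fusion": np.concatenate([vision, language], axis=1),
}

train, test = slice(0, n_train), slice(n_train, n_images)
scores = {
    name: encoding_score(F[train], eeg[train], F[test], eeg[test])
    for name, F in feature_sets.items()
}
```

In this toy setup the simulated responses depend on both feature sets, so the fusion model scores highest by construction. With real EEG data the same comparison would typically be run separately per time point and channel, which is what allows a time course of prediction accuracy to be traced.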

Taxonomy

Topics: Multimodal Machine Learning Applications · Categorization, perception, and language · Image Retrieval and Classification Techniques