The time course of visuo-semantic representations in the human brain is captured by combining vision and language models
Boyan Rong, Alessandro Thomas Gifford, Emrah Düzel, Radoslaw Martin Cichy

TL;DR
This study combines a vision deep neural network (DNN) with a large language model (LLM) in a fusion encoding model that predicts the time course of visuo-semantic EEG responses during object viewing more accurately than either model alone.
Contribution
It introduces an approach that fuses vision DNN and LLM representations to model visuo-semantic processing, outperforming both single-model encoding models and previous modelling approaches in predicting EEG responses.
Findings
Fusion model outperforms individual models in EEG prediction
DNNs capture early broadband signals; LLMs capture later low-frequency signals
Combined model provides a more accurate representation of visuo-semantic processing
Abstract
The human visual system provides us with a rich and meaningful percept of the world, transforming retinal signals into visuo-semantic representations. For a model of these representations, here we leveraged a combination of two currently dominating approaches: vision deep neural networks (DNNs) and large language models (LLMs). Using large-scale human electroencephalography (EEG) data recorded during object image viewing, we built encoding models to predict EEG responses using representations from a vision DNN, an LLM, and their fusion. We show that the fusion encoding model outperforms encoding models based on either the vision DNN or the LLM alone, as well as previous modelling approaches, in predicting neural responses to visual stimulation. The vision DNN and the LLM complemented each other in explaining stimulus-related signal in the EEG responses. The vision DNN uniquely captured…
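The comparison between vision-only, language-only, and fusion encoding models described in the abstract can be illustrated with a minimal sketch. The abstract does not say which regression method was used or how the two feature sets were fused, so everything below is an assumption made for illustration: ridge regression as the encoding model, feature concatenation as the fusion step, prediction correlation as the score, and synthetic random data standing in for DNN features, LLM features, and EEG responses. All function and variable names are hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


def mean_prediction_correlation(Y_true, Y_pred):
    """Pearson r between measured and predicted responses, averaged over response dimensions."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    r = (yt * yp).sum(axis=0) / (
        np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    )
    return float(r.mean())


def evaluate_encoding(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Fit on training images, score predictions on held-out images."""
    W = fit_ridge(X_train, Y_train, alpha)
    return mean_prediction_correlation(Y_test, X_test @ W)


# Synthetic stand-ins (assumption): one row per image.
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_resp = 400, 100, 20, 8
n_images = n_train + n_test
vision_feats = rng.standard_normal((n_images, n_feat))    # placeholder for vision DNN features
language_feats = rng.standard_normal((n_images, n_feat))  # placeholder for LLM features

# Simulated "EEG" responses that depend on both feature sets plus noise,
# so the fusion model has an advantage by construction.
eeg = (
    vision_feats @ rng.standard_normal((n_feat, n_resp))
    + language_feats @ rng.standard_normal((n_feat, n_resp))
    + 0.5 * rng.standard_normal((n_images, n_resp))
)

# Fusion by concatenation (assumption; the abstract does not specify the method).
fusion_feats = np.concatenate([vision_feats, language_feats], axis=1)

scores = {}
for name, feats in [
    ("vision", vision_feats),
    ("language", language_feats),
    ("fusion", fusion_feats),
]:
    scores[name] = evaluate_encoding(
        feats[:n_train], eeg[:n_train], feats[n_train:], eeg[n_train:]
    )

for name, score in scores.items():
    print(f"{name}: mean prediction correlation = {score:.3f}")
```

Because the simulated responses are built from both feature sets, the fusion model scoring highest here only demonstrates the mechanics of the comparison and is not evidence about real EEG data. In a time-resolved analysis, the same fit would presumably be repeated per EEG channel and time point, which is what would yield a time course of prediction accuracy.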
Taxonomy
Topics: Multimodal Machine Learning Applications · Categorization, perception, and language · Image Retrieval and Classification Techniques
