Revealing Vision-Language Integration in the Brain with Multimodal Networks
Vighnesh Subramaniam, Colin Conwell, Christopher Wang, Gabriel Kreiman, Boris Katz, Ignacio Cases, Andrei Barbu

TL;DR
This study uses multimodal deep neural networks to identify brain regions involved in vision-language integration during movie viewing, revealing specific neural sites and the effectiveness of CLIP-style training.
Contribution
It introduces a method to detect brain regions of multimodal integration using DNN predictions of SEEG signals, comparing different models and training techniques.
Findings
Identified approximately 13% of neural sites as multimodal integration sites.
Demonstrated that trained vision and language models outperform their randomly initialized counterparts in predicting SEEG signals.
Found CLIP-style training most effective for neural activity prediction.
Abstract
We use (multi)modal deep neural networks (DNNs) to probe for sites of multimodal integration in the human brain by predicting stereoencephalography (SEEG) recordings taken while human subjects watched movies. We operationalize sites of multimodal integration as regions where a multimodal vision-language model predicts recordings better than unimodal language, unimodal vision, or linearly-integrated language-vision models. Our target DNN models span different architectures (e.g., convolutional networks and transformers) and multimodal training techniques (e.g., cross-attention and contrastive learning). As a key enabling step, we first demonstrate that trained vision and language models systematically outperform their randomly initialized counterparts in their ability to predict SEEG signals. We then compare unimodal and multimodal models against one another. Because our target DNN…
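The operational definition in the abstract can be illustrated with a minimal sketch: for one recording site, fit a regression from each model's features to the neural response and flag the site when the multimodal features predict held-out data better than every baseline. Everything here is an assumption made for illustration and not the authors' pipeline: the use of ridge regression, Pearson correlation as the score, the synthetic data, and the hypothetical names `ridge_fit_predict`, `heldout_score`, and `is_integration_site`. A real analysis would also need a statistical test for the difference, which this sketch omits.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Closed-form ridge regression (assumed here); returns predictions for X_test."""
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mu_x
    # Solve (Xc^T Xc + alpha * I) w = Xc^T (y - mean(y))
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    w = np.linalg.solve(gram, Xc.T @ (y_train - mu_y))
    return (X_test - mu_x) @ w + mu_y

def heldout_score(X, y, n_train, alpha=1.0):
    """Pearson correlation between predicted and actual held-out responses."""
    pred = ridge_fit_predict(X[:n_train], y[:n_train], X[n_train:], alpha)
    return float(np.corrcoef(pred, y[n_train:])[0, 1])

def is_integration_site(features, y, n_train, alpha=1.0):
    """Flag a site when multimodal features beat every baseline feature set."""
    scores = {name: heldout_score(X, y, n_train, alpha)
              for name, X in features.items()}
    baselines = [s for name, s in scores.items() if name != "multimodal"]
    return scores["multimodal"] > max(baselines), scores

# Synthetic example: the response depends on a vision-language interaction
# that no linear combination of the unimodal features can express.
rng = np.random.default_rng(0)
n, d, n_train = 400, 8, 300
vision = rng.standard_normal((n, d))
language = rng.standard_normal((n, d))
multimodal = vision * language  # stand-in for a multimodal model's features
response = multimodal @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

features = {
    "vision": vision,
    "language": language,
    "linear_integration": np.hstack([vision, language]),  # concatenated unimodal features
    "multimodal": multimodal,
}
is_site, scores = is_integration_site(features, response, n_train)
print(is_site, scores)
```

Including the concatenated baseline matters: it separates a site that merely responds to both modalities from one whose response is better explained by features that combine them.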
Taxonomy
Topics: Multimodal Machine Learning Applications
