Language modulates vision: Evidence from neural networks and human brain-lesion models
Haoyang Chen, Bo Liu, Shuyue Wang, Xiaosha Wang, Wenjuan Han, Yixin Zhu, Xiaochun Wang, Yanchao Bi

TL;DR
This study shows that language modulates visual perception in both humans and neural networks: disrupting the pathway between the visual and language systems reduces the alignment between vision-language models and brain activity, which highlights the role of language in visual processing.
Contribution
It combines neural network analysis with human brain lesion data to causally demonstrate language's role in visual perception, advancing neurocognitive modeling.
Findings
CLIP aligns better with ventral occipitotemporal cortex (VOTC) activity than other models.
Left-lateralized language regions influence visual representations.
Brain lesions in language pathways reduce model-brain correspondence.
Abstract
Comparing information structures between deep neural networks (DNNs) and the human brain has become a key method for exploring their similarities and differences. Recent research has shown better alignment of vision-language DNN models, such as CLIP, with the activity of the human ventral occipitotemporal cortex (VOTC) than earlier vision models, supporting the idea that language modulates human visual perception. However, interpreting the results from such comparisons is inherently limited due to the "black box" nature of DNNs. To address this, we combined model-brain fitness analyses with human brain lesion data to examine how disrupting the communication pathway between the visual and language systems causally affects the ability of vision-language DNNs to explain the activity of the VOTC. Across four diverse datasets, CLIP consistently captured unique variance in VOTC neural…
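The abstract's notion of a model capturing "unique variance" in brain activity can be illustrated with a small sketch. This summary does not specify the paper's actual pipeline, so the example below is an assumption: it uses representational dissimilarity matrices (RDMs) and nested least-squares regression, with synthetic data standing in for VOTC responses and model embeddings. All function and variable names (`rdm_vector`, `r_squared`, `unique_variance`, `votc`, and so on) are hypothetical.

```python
import numpy as np

def rdm_vector(features):
    """Upper triangle of a correlation-distance RDM.

    `features` has shape (n_stimuli, n_features); each row is one stimulus.
    """
    rdm = 1.0 - np.corrcoef(features)          # (n_stimuli, n_stimuli)
    upper = np.triu_indices(rdm.shape[0], k=1)  # exclude the diagonal
    return rdm[upper]

def r_squared(predictors, target):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(target))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ beta
    return 1.0 - residual.var() / target.var()

def unique_variance(brain, model, controls):
    """Variance in `brain` explained by `model` beyond the `controls`."""
    full = r_squared([model] + list(controls), brain)
    reduced = r_squared(list(controls), brain)
    return full - reduced

# Synthetic stand-ins (assumption: NOT data from the paper).
rng = np.random.default_rng(0)
n_stimuli = 30
visual = rng.normal(size=(n_stimuli, 50))     # purely visual structure
semantic = rng.normal(size=(n_stimuli, 50))   # language-related structure
votc = np.hstack([visual, semantic]) + 0.5 * rng.normal(size=(n_stimuli, 100))
vision_language_model = np.hstack([visual, semantic])
vision_model = visual

u = unique_variance(
    rdm_vector(votc),
    rdm_vector(vision_language_model),
    [rdm_vector(vision_model)],
)
print(f"Unique variance of the vision-language RDM: {u:.3f}")
```

Because the reduced regression is nested inside the full one, the difference in R^2 is non-negative up to numerical error; it measures what the vision-language RDM explains over and above the vision-only RDM in this toy setting.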
Taxonomy
Topics: Categorization, perception, and language · Language, Metaphor, and Cognition
Methods: InfoNCE · Batch Normalization · Momentum Contrast · Contrastive Language-Image Pre-training
