Contrasting Cognitive Styles in Vision-Language Models: Holistic Attention in Japanese Versus Analytical Focus in English
Ahmed Sabir, Azinović Gasper, Mengsay Loem, and Rajesh Sharma

TL;DR
This study compares image descriptions from vision-language models trained predominantly on Japanese or English data and finds evidence that the models internalize and reproduce culturally influenced styles of visual processing.
Contribution
It demonstrates that VLMs trained on different languages exhibit culturally grounded attentional patterns, linking language, culture, and visual cognition in AI models.
Findings
Japanese-trained models show holistic attention patterns.
English-trained models focus on individual objects.
Models reflect cultural perceptual differences embedded in training data.
Abstract
Cross-cultural research in perception and cognition has shown that individuals from different cultural backgrounds process visual information in distinct ways. East Asians, for example, tend to adopt a holistic perspective, attending to contextual relationships, whereas Westerners often employ an analytical approach, focusing on individual objects and their attributes. In this study, we investigate whether Vision-Language Models (VLMs) trained predominantly on different languages, specifically Japanese and English, exhibit similar culturally grounded attentional patterns. Using comparative analysis of image descriptions, we examine whether these models reflect differences in holistic versus analytic tendencies. Our findings suggest that VLMs not only internalize the structural properties of language but also reproduce cultural behaviors embedded in the training data, indicating that…
Taxonomy
Topics: Categorization, perception, and language · Cultural Differences and Values · Language, Metaphor, and Cognition
