Cross-Lingual and Cross-Cultural Variation in Image Descriptions
Uri Berger, Edoardo M. Ponti

TL;DR
This study conducts a large-scale empirical analysis of how different languages and cultures influence the way people describe images, revealing both universal and culture-specific patterns in entity mentions across 31 languages.
Contribution
It introduces a novel method to measure cross-lingual variation in image descriptions using a diverse, multimodal dataset, and provides large-scale evidence of cultural effects on perception.
Findings
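Pairs of languages that are geographically or genetically closer tend to mention the same entities more frequently.
Some entity categories are salient in every language (such as animate beings), some are rarely mentioned in any (clothing accessories), and others vary widely across languages (landscape).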
The study supports theories of basic-level categories and environment-influenced perception patterns.
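The two kinds of measurement behind these findings can be illustrated with a minimal sketch. It is not the authors' method: the toy captions, the function names, and the use of Jaccard overlap as an agreement score are all assumptions made for illustration. It assumes each language has one caption per image, already reduced to the set of entity categories it mentions, with images in the same order for every language.

```python
from statistics import mean, pvariance

# Hypothetical toy input, NOT data from the paper: for each language, the
# entity categories mentioned in the caption of image 0, image 1, image 2.
captions = {
    "en": [{"person", "dog"}, {"person", "mountain"}, {"person"}],
    "de": [{"person", "dog"}, {"person"}, {"person", "hat"}],
    "ja": [{"person"}, {"person", "mountain"}, {"mountain"}],
}

def mention_rates(caption_sets):
    """Fraction of one language's captions that mention each category."""
    categories = set().union(*caption_sets)
    n = len(caption_sets)
    return {c: sum(c in s for s in caption_sets) / n for c in categories}

def saliency_stats(captions):
    """Per category: (mean, variance) of the mention rate across languages."""
    rates = {lang: mention_rates(sets) for lang, sets in captions.items()}
    categories = set().union(*(r.keys() for r in rates.values()))
    stats = {}
    for c in categories:
        values = [rates[lang].get(c, 0.0) for lang in rates]
        stats[c] = (mean(values), pvariance(values))
    return stats

def agreement(captions, a, b):
    """Mean per-image Jaccard overlap of the categories two languages mention.

    Jaccard overlap is an assumed stand-in for whatever agreement measure
    the study actually uses.
    """
    scores = []
    for sa, sb in zip(captions[a], captions[b]):
        union = sa | sb
        scores.append(len(sa & sb) / len(union) if union else 1.0)
    return mean(scores)

stats = saliency_stats(captions)
for category, (avg, var) in sorted(stats.items()):
    print(f"{category}: mean rate {avg:.2f}, variance {var:.3f}")
print("en-de agreement:", round(agreement(captions, "en", "de"), 3))
print("en-ja agreement:", round(agreement(captions, "en", "ja"), 3))
```

In this framing, a category with a high mean and low variance would count as universally salient, one with a low mean and low variance as universally non-salient, and one with high variance as language-dependent. Pairwise agreement scores could then be compared against geographic or genealogical distance between languages.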
Abstract
Do speakers of different languages talk differently about what they see? Behavioural and cognitive studies report cultural effects on perception; however, these are mostly limited in scope and hard to replicate. In this work, we conduct the first large-scale empirical study of cross-lingual variation in image descriptions. Using a multimodal dataset with 31 languages and images from diverse locations, we develop a method to accurately identify entities mentioned in captions and present in the images, then measure how they vary across languages. Our analysis reveals that pairs of languages that are geographically or genetically closer tend to mention the same entities more frequently. We also identify entity categories whose saliency is universally high (such as animate beings), low (clothing accessories) or displaying high variance across languages (landscape). In a case study, we…
Taxonomy
Topics: Language, Metaphor, and Cognition · Translation Studies and Practices
