Kiki or Bouba? Sound Symbolism in Vision-and-Language Models
Morris Alper, Hadar Averbuch-Elor

TL;DR
This paper asks whether vision-and-language models such as CLIP and Stable Diffusion exhibit sound symbolism. Using zero-shot knowledge probing, it finds strong evidence that they do, paralleling the kiki-bouba effect known from human psycholinguistics.
Contribution
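The paper introduces a novel computational method, based on zero-shot knowledge probing, for detecting sound symbolism in vision-and-language models, and uses it to show that these models carry inherent knowledge of cross-modal associations between language and the visual domain.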
Findings
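CLIP and Stable Diffusion show a pattern paralleling the kiki-bouba effect under zero-shot knowledge probing.
Sound symbolism, the non-arbitrary correlation between particular sounds and meanings, is reflected in these vision-and-language models.
The probing method offers a new way to study cross-modal associations between language and the visual domain.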
Abstract
Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown that there are non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism. Among the many dimensions of meaning, sound symbolism is particularly salient and well-demonstrated with regard to cross-modal associations between language and the visual domain. In this work, we address the question of whether sound symbolism is reflected in vision-and-language models such as CLIP and Stable Diffusion. Using zero-shot knowledge probing to investigate the inherent knowledge of these models, we find strong evidence that they do show this pattern, paralleling the well-known kiki-bouba effect in psycholinguistics. Our work provides a novel method for demonstrating sound…
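To make the idea of zero-shot probing concrete, here is a minimal sketch of how a pseudoword such as "kiki" or "bouba" might be scored against "sharp" and "round" descriptors in a shared embedding space. This is not the authors' implementation. The prompt template, the adjective lists, and the function names are all hypothetical, and `toy_embed` is a character-count stand-in that only exists so the example runs without downloading a model; its scores say nothing about sound symbolism. In a real probe, the `embed` argument would be replaced by the text encoder of a model such as CLIP.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]

# Hypothetical prompt template and adjective lists (assumptions, not from the paper).
TEMPLATE = "a picture of a {} object"
SHARP_ADJECTIVES = ["sharp", "spiky", "angular"]
ROUND_ADJECTIVES = ["round", "curvy", "soft"]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def toy_embed(text: str) -> Vector:
    """Stand-in embedder (assumption): 26-dim letter counts, NOT a real model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def sharpness_score(
    pseudoword: str,
    embed: Callable[[str], Vector],
    sharp_adjectives: Sequence[str],
    round_adjectives: Sequence[str],
    template: str = TEMPLATE,
) -> float:
    """Similarity to the 'sharp' prompts minus similarity to the 'round' prompts.

    Positive values mean the pseudoword prompt lies closer to the sharp
    descriptors in the embedding space; negative values mean closer to round.
    """
    word_vec = embed(template.format(pseudoword))
    sharp_vec = mean_vector([embed(template.format(a)) for a in sharp_adjectives])
    round_vec = mean_vector([embed(template.format(a)) for a in round_adjectives])
    return cosine(word_vec, sharp_vec) - cosine(word_vec, round_vec)


for word in ["kiki", "bouba"]:
    score = sharpness_score(word, toy_embed, SHARP_ADJECTIVES, ROUND_ADJECTIVES)
    print(word, score)
```

Passing the embedder in as a function keeps the scoring logic independent of any particular model, so the same sketch could in principle be pointed at different text encoders. No training or fine-tuning is involved, which is what makes the probe zero-shot.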
Taxonomy
Topics: Language, Metaphor, and Cognition · Language and cultural evolution · Categorization, perception, and language
Methods: Diffusion · Contrastive Language-Image Pre-training
