How direct is the link between words and images?
Hassan Shahmohammadi, Maria Heitmeier, Elnaz Shafaei-Bajestan, Hendrik P. A. Lensch, Harald Baayen

TL;DR
This study critically examines whether current word-image grounding experiments truly measure perceptual experience, finding that textual similarities largely explain participant choices and questioning the effectiveness of such experiments in assessing visual grounding.
Contribution
The paper introduces novel experiments comparing textual and visually grounded embeddings, revealing limited advantages of visual grounding in explaining human word-image associations.
Findings
Text-based embeddings largely explain participant choices.
Visually grounded embeddings offer only modest improvements.
The original experiment may not effectively measure perceptual experience.
Abstract
Current word embedding models, despite their success, still suffer from a lack of grounding in the real world. In this line of research, Günther et al. (2022) proposed a behavioral experiment to investigate the relationship between words and images. In their setup, participants were presented with a target noun and a pair of images, one chosen by their model and the other chosen at random. Participants were asked to select the image that best matched the target noun. In most cases, participants preferred the image selected by the model. Günther et al. therefore concluded that there may be a direct link between words and embodied experience. We took their experiment as a point of departure and addressed the following questions. 1. Apart from utilizing visually embodied simulation of given images, what other strategies might subjects have used to solve this task? To what extent does this…
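The text-only strategy raised in the first question can be illustrated with a minimal sketch. It assumes that each image comes with a word label and that a participant simply picks the image whose label embedding is closer to the target noun. The function names, the toy three-dimensional vectors, and the use of cosine similarity are illustrative assumptions, not the authors' actual models or data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_choice(target_vec, label_vec_a, label_vec_b):
    """Predict which of two images a text-only strategy would pick.

    Returns "a" if the label of image A is at least as similar to the
    target word as the label of image B, otherwise "b".
    """
    sim_a = cosine(target_vec, label_vec_a)
    sim_b = cosine(target_vec, label_vec_b)
    return "a" if sim_a >= sim_b else "b"

# Made-up toy embeddings, for illustration only (not real model output).
toy_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "teapot": [0.0, 0.3, 0.9],
}

# Target noun "dog"; image A is labeled "puppy", image B is labeled "teapot".
choice = predict_choice(
    toy_embeddings["dog"],
    toy_embeddings["puppy"],
    toy_embeddings["teapot"],
)
print(choice)
```

If a rule of this kind predicts most participant choices, the task can be solved without any visual simulation, which is the concern the study raises about the original experiment.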
Taxonomy
Topics: Action Observation and Synchronization · Language, Metaphor, and Cognition
