Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge
Danny Merkx, Stefan L. Frank, Mirjam Ernestus

TL;DR
This paper introduces visually grounded word embeddings, built by combining English text and images, and shows that they capture cognitive aspects of word meaning better than popular text-only embeddings.
Contribution
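The paper presents a method for creating visually grounded word embeddings and shows that their similarities are more predictive of human reaction times in a large priming experiment than those of purely text-based embeddings, and that they correlate well with human word similarity ratings.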
Findings
Similarities between visually grounded embeddings predict human reaction times in a large priming experiment better than similarities from purely text-based embeddings.
They also correlate well with human word similarity ratings.
Grounded embeddings explain unique variance beyond text-based models.
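Evaluations of this kind compare model similarities with human judgments. The following is a minimal sketch, assuming cosine similarity between word vectors and Spearman rank correlation as the agreement measure; this summary does not confirm either choice. The vectors, word pairs, and ratings are made-up toy values, not data from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical 3-dimensional embeddings (toy values for illustration only).
embeddings = {
    "cat": [1.0, 0.2, 0.0],
    "dog": [0.8, 0.4, 0.2],
    "car": [0.0, 0.1, 1.0],
    "truck": [0.1, 0.0, 0.9],
}

# Hypothetical human similarity ratings for four word pairs.
human_ratings = {
    ("cat", "dog"): 8.5,
    ("car", "truck"): 9.0,
    ("cat", "car"): 1.0,
    ("dog", "truck"): 2.0,
}

pairs = list(human_ratings)
model_sims = [cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs]
ratings = [human_ratings[p] for p in pairs]

rho = spearman(model_sims, ratings)
print(f"Spearman correlation between model and human similarity: {rho:.3f}")
```

A rank correlation is used here because human ratings and cosine similarities live on different scales; only the ordering of the word pairs is compared.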
Abstract
Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based, even though the human sensory experience is much richer. In this paper we create visually grounded word embeddings by combining English text and images and compare them to popular text-based methods, to see if visual information allows our model to better capture cognitive aspects of word meaning. Our analysis shows that visually grounded embedding similarities are more predictive of the human reaction times in a large priming experiment than the purely text-based embeddings. The visually grounded embeddings also correlate well with human word similarity ratings. Importantly, in both experiments we show that the grounded embeddings account for a unique…
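One common way to quantify what a second predictor explains beyond a first is the gain in R² between nested regression models. The sketch below illustrates that idea under stated assumptions: it uses ordinary least squares, which may differ from the authors' statistical procedure, and the similarity values and reaction times are invented toy numbers, not results from the paper.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(predictors, y):
    """R^2 of a least-squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical prime-target similarities from two embedding models (toy values).
text_sim = [0.1, 0.4, 0.5, 0.7, 0.9, 0.3]
grounded_sim = [0.2, 0.1, 0.6, 0.5, 0.8, 0.7]
# Hypothetical reaction times in milliseconds (toy values).
reaction_time = [677.0, 650.0, 603.0, 589.0, 548.0, 611.0]

r2_text = r_squared([text_sim], reaction_time)
r2_full = r_squared([text_sim, grounded_sim], reaction_time)
unique_r2 = r2_full - r2_text  # variance explained only once grounded_sim is added

print(f"R^2, text only:       {r2_text:.3f}")
print(f"R^2, text + grounded: {r2_full:.3f}")
print(f"Unique R^2 gain:      {unique_r2:.3f}")
```

Running the comparison in the other direction (adding the text-based similarity to a model that already contains the grounded one) would show whether the text-based predictor also contributes something the grounded one does not.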
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
