Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge

Danny Merkx; Stefan L. Frank; Mirjam Ernestus

arXiv:2202.10292 · cs.CL · March 31, 2022


TL;DR

This paper introduces visually grounded word embeddings that combine text and images, and shows that they capture cognitive aspects of human word meaning better than traditional text-only models.

Contribution

The paper presents a method for creating visually grounded word embeddings and shows that they predict human reaction times better than text-only models, correlate well with human similarity judgments, and account for variance that text-only models do not.

Findings

01

Visually grounded embeddings better predict human reaction times.

02

They correlate well with human word similarity ratings.

03

Grounded embeddings explain unique variance beyond text-based models.
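Finding 03 refers to unique variance. One common way to quantify it is to compare nested regression models: fit the outcome on the text-based predictor alone, then add the grounded predictor and report the increase in R². The sketch below assumes ordinary least squares and made-up per-item similarity scores and reaction times; the helper names `solve` and `r_squared` and all data are hypothetical. The paper's actual statistical models are not described on this page and may differ (for instance by including further covariates).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical per-item predictors (illustrative only, not data from the paper)
text_sim = [0.1, 0.4, 0.35, 0.8, 0.6, 0.2]
grounded_sim = [0.3, 0.2, 0.7, 0.5, 0.9, 0.1]
noise = [3.0, -2.0, 1.0, -4.0, 2.0, 0.0]
# Made-up reaction times (ms) that depend on both predictors
rt = [600 - 100 * t - 80 * g + e for t, g, e in zip(text_sim, grounded_sim, noise)]

reduced = r_squared([text_sim], rt)              # text-based predictor only
full = r_squared([text_sim, grounded_sim], rt)   # text + grounded predictor
print(f"R^2 text only:        {reduced:.3f}")
print(f"R^2 text + grounded:  {full:.3f}")
print(f"unique variance (delta R^2): {full - reduced:.3f}")
```

Because the two models are nested, the increase in R² can never be negative; in practice a significance test (for example an F-test or a likelihood-ratio test) would decide whether the increase is reliable.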

Abstract

Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based, even though the human sensory experience is much richer. In this paper we create visually grounded word embeddings by combining English text and images and compare them to popular text-based methods, to see if visual information allows our model to better capture cognitive aspects of word meaning. Our analysis shows that visually grounded embedding similarities are more predictive of the human reaction times in a large priming experiment than the purely text-based embeddings. The visually grounded embeddings also correlate well with human word similarity ratings. Importantly, in both experiments we show that the grounded embeddings account for a unique…
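The abstract compares embedding similarities with human word similarity ratings. A typical way to do this is to compute the cosine similarity of each word pair's embeddings and correlate those scores with the ratings using Spearman's rank correlation. The sketch below assumes exactly that setup, with hypothetical three-dimensional embeddings and made-up ratings; it is not taken from the authors' repository, and the paper's exact similarity measure and correlation statistic are not stated on this page.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy embeddings and ratings (illustrative, not from the paper)
embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.2, 0.8, 0.3],
}
pairs = [("dog", "cat"), ("car", "truck"), ("dog", "car"), ("cat", "truck")]
human_ratings = [8.5, 8.0, 1.5, 1.0]

model_sims = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
rho = spearman(model_sims, human_ratings)
print(round(rho, 3))  # → 0.8
```

Spearman's rho compares only rank orders, so it does not assume a linear relation between cosine similarity and the rating scale.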

Code & Models

Repositories

DannyMerkx/speech2image
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques