Language with Vision: a Study on Grounded Word and Sentence Embeddings

Hassan Shahmohammadi; Maria Heitmeier; Elnaz Shafaei-Bajestan; Hendrik P. A. Lensch; and Harald Baayen

arXiv:2206.08823 · cs.CL · November 1, 2023

TL;DR

This paper introduces a computational model that effectively grounds pre-trained word embeddings in visual information, improving representations for both concrete and abstract words by aligning textual and visual data.

Contribution

The paper proposes a simple, effective alignment method for grounding word embeddings in vision. The method enhances semantic representations for concrete and abstract words, including words unseen during training.
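To make the idea of alignment concrete, the following is a minimal sketch and not the authors' implementation. It assumes that grounding can be illustrated as a linear map from a textual embedding space to a visually informed space, and it swaps in an ordinary least-squares fit for whatever training procedure the paper actually uses. All arrays are random toy data, and the names `fit_alignment` and `ground` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): 50 "seen" words with 8-d textual vectors and
# 4-d visually informed target vectors. Real embeddings would be loaded from disk.
n_seen, d_text, d_grounded = 50, 8, 4
text_seen = rng.normal(size=(n_seen, d_text))
true_map = rng.normal(size=(d_text, d_grounded))
grounded_seen = text_seen @ true_map + 0.01 * rng.normal(size=(n_seen, d_grounded))


def fit_alignment(text_vecs, target_vecs):
    """Fit a linear map from textual space to grounded space by least squares."""
    mapping, *_ = np.linalg.lstsq(text_vecs, target_vecs, rcond=None)
    return mapping


def ground(text_vecs, mapping):
    """Project textual vectors into the grounded space with the learned map."""
    return text_vecs @ mapping


# Learn the map from seen words only.
alignment = fit_alignment(text_seen, grounded_seen)

# Words without any visual data still receive grounded vectors,
# because the map only needs their textual embedding as input.
text_unseen = rng.normal(size=(5, d_text))
grounded_unseen = ground(text_unseen, alignment)
```

Because the map takes only a textual vector as input, it can be applied to any word in the textual vocabulary, which is one way to read the claim that alignment extends to unseen words.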

Findings

01 Visual grounding benefits both concrete and abstract words.

02 The model improves embeddings for unseen words through alignment.

03 Advantages are also observed for contextualized embeddings, such as those produced by BERT.

Abstract

Grounding language in vision is an active field of research seeking to construct cognitively plausible word and sentence representations by incorporating perceptual knowledge from vision into text-based representations. Despite many attempts at language grounding, achieving an optimal equilibrium between textual representations of the language and our embodied experiences remains an open field. Some common concerns are the following. Is visual grounding advantageous for abstract words, or is its effectiveness restricted to concrete words? What is the optimal way of bridging the gap between text and vision? To what extent is perceptual knowledge from images advantageous for acquiring high-quality embeddings? Leveraging the current advances in machine learning and natural language processing, the present study addresses these questions by proposing a simple yet very effective…

Code & Models

Repositories

hazel1994/visually_grounded_word_embeddings_2 · tf · Official

Datasets

fittar/visually_grounded_embeddings · dataset · 57 downloads

Taxonomy

Topics: Language, Metaphor, and Cognition · Natural Language Processing Techniques · Multimodal Machine Learning Applications