Language with Vision: a Study on Grounded Word and Sentence Embeddings
Hassan Shahmohammadi, Maria Heitmeier, Elnaz Shafaei-Bajestan, Hendrik P. A. Lensch, and Harald Baayen

TL;DR
This paper introduces a computational model that effectively grounds pre-trained word embeddings in visual information, improving representations for both concrete and abstract words by aligning textual and visual data.
Contribution
It proposes a simple, effective alignment method for grounding word embeddings in vision, enhancing semantic representations for concrete and abstract words, including unseen ones.
Findings
Visual grounding benefits both concrete and abstract words.
The model improves embeddings for unseen words through alignment.
The advantages also hold for contextualized embeddings such as those from BERT.
Abstract
Grounding language in vision is an active field of research seeking to construct cognitively plausible word and sentence representations by incorporating perceptual knowledge from vision into text-based representations. Despite many attempts at language grounding, achieving an optimal equilibrium between textual representations of the language and our embodied experiences remains an open field. Some common concerns are the following. Is visual grounding advantageous for abstract words, or is its effectiveness restricted to concrete words? What is the optimal way of bridging the gap between text and vision? To what extent is perceptual knowledge from images advantageous for acquiring high-quality embeddings? Leveraging the current advances in machine learning and natural language processing, the present study addresses these questions by proposing a simple yet very effective…
Taxonomy
Topics: Language, Metaphor, and Cognition · Natural Language Processing Techniques · Multimodal Machine Learning Applications
