Learning Zero-Shot Multifaceted Visually Grounded Word Embeddings via Multi-Task Training

Hassan Shahmohammadi; Hendrik P. A. Lensch; R. Harald Baayen

arXiv:2104.07500 · cs.CL · September 15, 2021

Open Access · 1 Repo

TL;DR

This paper introduces a multi-task learning approach to create zero-shot, visually grounded word embeddings that effectively capture both abstract and concrete concepts, outperforming previous methods on various benchmarks.

Contribution

It proposes a novel implicit grounding method that integrates textual and visual modalities through reversible mappings, enhancing semantic representation without sacrificing linguistic abstraction.

Findings

01 The embeddings correlate strongly with human judgments.

02 They outperform previous models on multiple benchmarks.

03 They are effective for both abstract and concrete words.

Abstract

Language grounding aims at linking the symbolic representation of language (e.g., words) to the rich perceptual knowledge of the outside world. The general approach is to embed both textual and visual information into a common space (the grounded space), confined by an explicit relationship between the two modalities. We argue that this approach sacrifices the abstract knowledge obtained from linguistic co-occurrence statistics in the process of acquiring perceptual information. The focus of this paper is to solve this issue by implicitly grounding the word embeddings. Rather than learning two mappings into a joint space, our approach integrates modalities by determining a reversible grounded mapping between the textual and the grounded space by means of multi-task learning. Evaluations on intrinsic and extrinsic tasks show that our embeddings are highly beneficial for both abstract and…
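
The idea of a reversible mapping between the textual and the grounded space can be illustrated with a small sketch. This is not the paper's training procedure: the abstract says the mapping is learned through multi-task training, whereas the sketch below assumes a fixed 2-D rotation matrix purely for illustration, because a rotation's transpose is its exact inverse. The names `rotation`, `transpose`, `apply`, `text_vec`, and `grounded_vec` are hypothetical and do not come from the paper or its repository.

```python
import math

def rotation(theta):
    """Return a 2x2 rotation matrix (orthogonal, so its transpose is its inverse)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def apply(vec, m):
    """Multiply a row vector by a matrix: vec @ m."""
    return [sum(v * m[i][j] for i, v in enumerate(vec)) for j in range(len(m[0]))]

# Assumption: a fixed rotation stands in for a learned mapping.
M = rotation(0.7)

# A toy "textual" embedding for one word (made-up values).
text_vec = [1.0, 2.0]

# Map into the illustrative "grounded" space, then back with the transpose.
grounded_vec = apply(text_vec, M)
recovered = apply(grounded_vec, transpose(M))

# Reversibility check: the round trip should reproduce the textual vector.
max_error = max(abs(a - b) for a, b in zip(recovered, text_vec))
print(grounded_vec, recovered, max_error)
```

Because the same matrix is applied to any input vector, a mapping of this kind can also be applied to words that never appeared with an image, which is one plausible reading of "zero-shot" in the title.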

Code & Models

Repositories

Hazel1994/Visually_Grounded_Word_Embeddings (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques