Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case

Adam Dahlgren Lindström; Suna Bensch; Johanna Björklund; Frank Drewes

arXiv:2102.11115·cs.LG·February 23, 2021

TL;DR

This paper generalizes probing tasks to visual-semantic embeddings of image-caption pairs and uses them to examine the linguistic properties these embeddings encode. In its experiments, multimodal embeddings score up to 12% higher on the probing tasks than unimodal ones.

Contribution

The paper formalizes probing tasks for embeddings of image-caption pairs, defines three concrete tasks within that framework, and uses them to compare state-of-the-art embedding models.

Findings

01. Visual-semantic embeddings outperform unimodal embeddings by up to 12% on the probing tasks.

02. The proposed framework reveals complementary information in the text and image modalities.

03. The analysis tools help explain the inner workings of multimodal embeddings.

Abstract

Semantic embeddings have advanced the state of the art for countless natural language processing tasks, and various extensions to multimodal domains, such as visual-semantic embeddings, have been proposed. While the power of visual-semantic embeddings comes from the distillation and enrichment of information through machine learning, their inner workings are poorly understood and there is a shortage of analysis tools. To address this problem, we generalize the notion of probing tasks to the visual-semantic case. To this end, we (i) discuss the formalization of probing tasks for embeddings of image-caption pairs, (ii) define three concrete probing tasks within our general framework, (iii) train classifiers to probe for those properties, and (iv) compare various state-of-the-art embeddings under the lens of the proposed probing tasks. Our experiments reveal an up to 12% increase in…
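The probing procedure in steps (iii) and (iv) of the abstract can be sketched as follows: a simple classifier is trained on frozen embeddings to predict a property, and its held-out accuracy is compared across embedding types. This is a minimal illustration under stated assumptions, not the authors' implementation. The data are synthetic stand-ins constructed so that each modality carries independent noisy evidence of a binary property, the "fused" view is plain concatenation, and all function names (`train_probe`, `make_synthetic_pairs`, `run_demo`) are hypothetical. The numbers it prints say nothing about the paper's results.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_probe(X, y, epochs=150, lr=0.5):
    """Fit a logistic-regression probe on frozen embeddings (batch gradient descent)."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, X, y):
    """Fraction of examples whose property the probe predicts correctly."""
    hits = sum(
        int((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (t == 1))
        for x, t in zip(X, y)
    )
    return hits / len(X)


def make_synthetic_pairs(n, rng, dim=4, signal=1.0):
    """Synthetic stand-in data (an assumption, not real embeddings):
    the first coordinate of each modality is a noisy copy of the label."""
    text, image, labels = [], [], []
    for _ in range(n):
        t = rng.randint(0, 1)
        sign = 2 * t - 1
        text.append([signal * sign + rng.gauss(0, 1)]
                    + [rng.gauss(0, 1) for _ in range(dim - 1)])
        image.append([signal * sign + rng.gauss(0, 1)]
                     + [rng.gauss(0, 1) for _ in range(dim - 1)])
        labels.append(t)
    return text, image, labels


def run_demo(seed=0, n_train=800, n_test=800):
    """Train one probe per embedding view and return held-out accuracies."""
    rng = random.Random(seed)
    tr_text, tr_img, tr_y = make_synthetic_pairs(n_train, rng)
    te_text, te_img, te_y = make_synthetic_pairs(n_test, rng)
    views = {
        "text": (tr_text, te_text),
        "image": (tr_img, te_img),
        # Concatenation stands in for a multimodal embedding here.
        "fused": ([tv + iv for tv, iv in zip(tr_text, tr_img)],
                  [tv + iv for tv, iv in zip(te_text, te_img)]),
    }
    results = {}
    for name, (train_X, test_X) in views.items():
        w, b = train_probe(train_X, tr_y)
        results[name] = probe_accuracy(w, b, test_X, te_y)
    return results


if __name__ == "__main__":
    for name, acc in run_demo().items():
        print(f"{name}: probe accuracy {acc:.3f}")
```

The probe is kept deliberately simple (a linear model) so that a high accuracy can be attributed to information already present in the embedding rather than to the capacity of the classifier. With real data, the synthetic generator would be replaced by embeddings extracted from the models under comparison.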

Code & Models

Repositories

dali-does/vse-probing
PyTorch, official
