Learning Visually-Grounded Semantics from Contrastive Adversarial Samples

Haoyue Shi; Jiayuan Mao; Tete Xiao; Yuning Jiang; Jian Sun

arXiv:1806.10348·cs.CL·June 28, 2018·24 cites


Open Access · 1 Repo

TL;DR

This paper enhances visual-semantic embeddings by augmenting datasets with contrastive adversarial samples, improving grounding accuracy and robustness against attacks, thus advancing the connection between textual semantics and visual concepts.

Contribution

It introduces a novel data augmentation method using linguistically-informed contrastive adversarial samples to improve visual-semantic grounding models.

Findings

01 Significant performance improvement on downstream tasks

02 Enhanced robustness against adversarial attacks

03 Better grounding of textual semantics to visual concepts

Abstract

We study the problem of grounding distributional representations of texts on the visual domain, namely visual-semantic embeddings (VSE for short). Beginning with an insightful adversarial attack on VSE embeddings, we show the limitation of current frameworks and image-text datasets (e.g., MS-COCO) both quantitatively and qualitatively. The large gap between the number of possible constitutions of real-world semantics and the size of parallel data, to a large extent, restricts the model from establishing the link between textual semantics and visual concepts. We alleviate this problem by augmenting the MS-COCO image captioning datasets with textual contrastive adversarial samples. These samples are synthesized using linguistic rules and the WordNet knowledge base. The construction procedure is both syntax- and semantics-aware. The samples force the model to ground learned embeddings to concrete…
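The idea of a textual contrastive sample can be illustrated with a minimal sketch: take a caption and swap a single word for one with a different meaning, so the new caption no longer matches the image. The sketch below assumes a tiny hand-written replacement table in place of WordNet and plain whitespace tokenization in place of the linguistic rules. The names `CONTRASTIVE_NOUNS` and `contrastive_captions` are hypothetical and are not taken from the paper or the VSE-C repository.

```python
# Hypothetical stand-in for WordNet lookups: each noun maps to words of a
# similar kind but with a different meaning. Entries are illustrative only.
CONTRASTIVE_NOUNS = {
    "dog": ["cat", "horse"],
    "man": ["woman"],
    "frisbee": ["ball"],
}


def contrastive_captions(caption, table=CONTRASTIVE_NOUNS):
    """Return captions that differ from `caption` in exactly one word."""
    tokens = caption.split()
    results = []
    for i, token in enumerate(tokens):
        # Words missing from the table yield no replacements.
        for replacement in table.get(token.lower(), []):
            swapped = tokens[:i] + [replacement] + tokens[i + 1:]
            results.append(" ".join(swapped))
    return results


if __name__ == "__main__":
    for sample in contrastive_captions("a dog catches a frisbee"):
        print(sample)
```

Each generated caption would serve as a hard negative for the original image during training, since it differs from a true caption by only one concept.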


Code & Models

Repositories

ExplorerFreda/VSE-C (PyTorch, official)


Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling