Visually Grounded Generation of Entailments from Premises

Somaye Jafaritazehjani; Albert Gatt; Marc Tanti

arXiv:1909.09788·cs.CL·September 24, 2019

Open Access

TL;DR

This paper frames natural language inference as a generation task: given an image and/or its description as the premise, a model generates an entailed hypothesis. Automatic and human evaluation show that entailments can be generated successfully, and that multimodal models outperform unimodal ones only marginally.

Contribution

It introduces a novel generation-based approach to NLI using visual grounding and compares multimodal and unimodal neural architectures for this task.

Findings

01

Multimodal models outperform unimodal models in entailment generation, albeit marginally.

02

Automatic and human evaluation both indicate that entailments can be generated successfully.

03

Grounding textual premises in visual information benefits hypothesis generation, though the gain is small.

Abstract

Natural Language Inference (NLI) is the task of determining the semantic relationship between a premise and a hypothesis. In this paper, we focus on the generation of hypotheses from premises in a multimodal setting, where the goal is to generate a sentence (hypothesis) given an image and/or its description (premise) as the input. The main goals of this paper are (a) to investigate whether it is reasonable to frame NLI as a generation task; and (b) to consider the degree to which grounding textual premises in visual information is beneficial to generation. We compare different neural architectures, showing through automatic and human evaluation that entailments can indeed be generated successfully. We also show that multimodal models outperform unimodal models in this task, albeit marginally.

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques