Visual Entailment: A Novel Task for Fine-Grained Image Understanding

Ning Xie; Farley Lai; Derek Doran; Asim Kadav

arXiv:1901.06706 · cs.CV · January 23, 2019 · 162 citations

TL;DR

This paper introduces Visual Entailment, a new image understanding task that assesses whether an image semantically entails a given text, supported by a new dataset and a model that outperforms existing baselines.

Contribution

The paper proposes the novel task of Visual Entailment, builds the SNLI-VE dataset from existing resources, and develops the EVE model, which improves accuracy over existing VQA baselines and offers better explainability.

Findings

01. EVE achieves up to 71% accuracy on the VE task.

02. EVE outperforms several state-of-the-art VQA models.

03. The SNLI-VE dataset is publicly available.

Abstract

Existing visual reasoning datasets, such as Visual Question Answering (VQA), often suffer from biases conditioned on the question, image or answer distributions. The recently proposed CLEVR dataset addresses these limitations and requires fine-grained reasoning, but the dataset is synthetic and consists of similar objects and sentence structures across the dataset. In this paper, we introduce a new inference task, Visual Entailment (VE), consisting of image-sentence pairs whereby a premise is defined by an image, rather than a natural language sentence as in traditional Textual Entailment tasks. The goal of a trained VE model is to predict whether the image semantically entails the text. To realize this task, we build a dataset, SNLI-VE, based on the Stanford Natural Language Inference corpus and the Flickr30k dataset. We evaluate various existing VQA baselines and build a model called…
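To make the task setup in the abstract concrete, the following is a minimal sketch of how a VE example and the accuracy metric could be represented. It is not the authors' code or the SNLI-VE file format. The class name `VEExample`, its field names, the helper `accuracy`, and the sample records are all hypothetical. The three-way label set is an assumption carried over from the Stanford Natural Language Inference corpus, since the excerpt above only states that the model predicts whether the image entails the text.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumption: SNLI-VE reuses the three SNLI classes.
LABELS = ("entailment", "neutral", "contradiction")


@dataclass(frozen=True)
class VEExample:
    """One image-sentence pair: the image is the premise, the text the hypothesis."""
    image_id: str    # hypothetical identifier of the premise image
    hypothesis: str  # natural language sentence to check against the image
    label: str       # gold label, one of LABELS

    def __post_init__(self) -> None:
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def accuracy(examples: Sequence[VEExample], predictions: Sequence[str]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(ex.label == pred for ex, pred in zip(examples, predictions))
    return correct / len(examples)


# Made-up records for illustration only; they are not taken from the dataset.
examples = [
    VEExample("img_0001", "Two people are outdoors.", "entailment"),
    VEExample("img_0001", "The people are siblings.", "neutral"),
    VEExample("img_0001", "Nobody is in the picture.", "contradiction"),
    VEExample("img_0002", "A dog is running.", "entailment"),
]
predictions = ["entailment", "neutral", "neutral", "entailment"]
score = accuracy(examples, predictions)
```

Here three of the four hypothetical predictions match the gold labels, so `score` is 0.75. The accuracy figures reported for EVE would be computed the same way in principle, over the real test split rather than toy records.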

Code & Models

Repositories

necla-ml/SNLI-VE (official)

Datasets

SNUMPR/HFLB
dataset · 71 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques