e-SNLI-VE: Corrected Visual-Textual Entailment with Natural Language Explanations
Virginie Do, Oana-Maria Camburu, Zeynep Akata, Thomas Lukasiewicz

TL;DR
This paper corrects label errors in the class of SNLI-VE with the highest error rate, re-evaluates an existing model on the corrected corpus (SNLI-VE-2.0), and introduces e-SNLI-VE, which adds human-written natural language explanations to SNLI-VE-2.0.
Contribution
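It provides SNLI-VE-2.0, a corrected version of SNLI-VE; a quantitative comparison of an existing model on the corrected and non-corrected corpora; and e-SNLI-VE, which pairs SNLI-VE-2.0 with human-written natural language explanations that models learn from at training time and generate at test time.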
Findings
Correcting the class with the highest error rate reduces label errors in the dataset.
Models trained with explanations outperform baseline models.
Enhanced interpretability with human-written natural language explanations.
Abstract
The recently proposed SNLI-VE corpus for recognising visual-textual entailment is a large, real-world dataset for fine-grained multimodal reasoning. However, the automatic way in which SNLI-VE has been assembled (via combining parts of two related datasets) gives rise to a large number of errors in the labels of this corpus. In this paper, we first present a data collection effort to correct the class with the highest error rate in SNLI-VE. Secondly, we re-evaluate an existing model on the corrected corpus, which we call SNLI-VE-2.0, and provide a quantitative comparison with its performance on the non-corrected corpus. Thirdly, we introduce e-SNLI-VE, which appends human-written natural language explanations to SNLI-VE-2.0. Finally, we train models that learn from these explanations at training time, and output such explanations at testing time.
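The correction step described in the abstract targets the class with the highest error rate. A minimal sketch of how such a class could be identified, assuming each example carries both its original label and a human-corrected label, is shown below. The record layout, field names, function names, and toy examples are all hypothetical illustrations; they are not taken from the paper or from the released dataset files.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Example:
    """Hypothetical record layout; field names are assumptions."""
    image_id: str
    hypothesis: str
    original_label: str              # label from the automatically assembled corpus
    corrected_label: str             # label after human re-annotation
    explanation: Optional[str] = None  # human-written explanation, if any


def per_class_error_rate(examples):
    """Fraction of examples per original class whose label was changed."""
    totals = Counter()
    errors = Counter()
    for ex in examples:
        totals[ex.original_label] += 1
        if ex.corrected_label != ex.original_label:
            errors[ex.original_label] += 1
    return {label: errors[label] / totals[label] for label in totals}


def noisiest_class(examples):
    """Original class with the highest fraction of changed labels."""
    rates = per_class_error_rate(examples)
    return max(rates, key=rates.get)


# Toy data invented for illustration only; not drawn from SNLI-VE.
TOY = [
    Example("img1", "A dog runs.", "entailment", "entailment"),
    Example("img1", "A dog sleeps.", "contradiction", "contradiction"),
    Example("img2", "A man is outdoors.", "neutral", "entailment",
            "The man is standing on a street, so he is outdoors."),
    Example("img2", "A man waits for a bus.", "neutral", "neutral"),
]

if __name__ == "__main__":
    print(per_class_error_rate(TOY))
    print(noisiest_class(TOY))
```

In practice the corrected labels are only available after annotation, so an estimate like this would be computed on a manually checked sample before deciding which class to re-annotate in full.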
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
