Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations

Qian Yang; Yunxin Li; Baotian Hu; Lin Ma; Yuxing Ding; Min Zhang

arXiv:2207.11401·cs.CL·December 5, 2022

TL;DR

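This paper introduces CALeC, a unified method for visual entailment with natural language explanations. It targets two weaknesses of prior work: chunk-aware semantic alignment links phrases (chunks) to visual content rather than aligning single tokens only, and lexical constraints steer the explanation generator toward the decision-making points of the relation inference so that explanations are more faithful.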

Contribution

It proposes a unified framework with chunk-aware semantic alignment and lexical constraints to improve reasoning and explanation quality in visual entailment tasks.

Findings

01  CALeC outperforms existing models in inference accuracy.

02  It generates more faithful and informative explanations.

03  Experimental results on three datasets validate its effectiveness.

Abstract

Visual Entailment with natural language explanations aims to infer the relationship between a text-image pair and generate a sentence to explain the decision-making process. Previous methods rely mainly on a pre-trained vision-language model to perform the relation inference and a language model to generate the corresponding explanation. However, the pre-trained vision-language models mainly build token-level alignment between text and image yet ignore the high-level semantic alignment between the phrases (chunks) and visual contents, which is critical for vision-language reasoning. Moreover, the explanation generator based only on the encoded joint representation does not explicitly consider the critical decision-making points of relation inference. Thus the generated explanations are less faithful to visual-language reasoning. To mitigate these problems, we propose a unified…
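
The contrast the abstract draws between token-level and chunk-level alignment can be illustrated with a toy sketch. It is not CALeC's implementation: the abstract does not specify how chunks are represented or scored, so mean pooling over token vectors and cosine similarity against region vectors are assumptions made here for illustration. The function names, the 2-D embeddings, and the chunk spans are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pool(vectors):
    """Average a non-empty list of vectors component-wise (assumed chunk encoder)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def best_region(query, region_vecs):
    """Index of the image region most similar to the query vector."""
    return max(range(len(region_vecs)), key=lambda r: cosine(query, region_vecs[r]))


def token_level_alignment(token_vecs, region_vecs):
    """Align every token independently to its most similar region."""
    return [best_region(t, region_vecs) for t in token_vecs]


def chunk_level_alignment(token_vecs, chunks, region_vecs):
    """Align each chunk, given as a (start, end) token span with end exclusive,
    to one region using the pooled chunk vector."""
    return [
        best_region(mean_pool(token_vecs[start:end]), region_vecs)
        for start, end in chunks
    ]


# Made-up 2-D embeddings: three text tokens and two image regions.
tokens = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
regions = [[1.0, 0.1], [0.5, 0.9]]
# Hypothetical chunking: tokens 0-1 form one phrase, token 2 another.
chunks = [(0, 2), (2, 3)]

token_alignment = token_level_alignment(tokens, regions)
chunk_alignment = chunk_level_alignment(tokens, chunks, regions)
print(token_alignment, chunk_alignment)
```

In this toy input the two tokens of the first chunk map to different regions when aligned one by one, while the pooled chunk maps to a single region. That is the kind of phrase-level grounding the abstract says token-level alignment misses.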

Code & Models

Repositories

HITsz-TMG/ExplainableVisualEntailment
jax · Official