TL;DR
This paper introduces Rationale^VT Transformer, a model that generates natural language explanations for complex visual reasoning tasks by integrating visual understanding at multiple levels with pretrained language models.
Contribution
It is the first to generate natural language rationales across multiple complex visual reasoning tasks using an integrated multi-level visual understanding approach.
Findings
Pretrained language models benefit from visual adaptation.
Free-text rationalization enhances interpretability in visual reasoning.
The approach improves explanation quality in visual question answering.
Abstract
Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights. We present the first study focused on generating natural language rationales across several complex visual reasoning tasks: visual commonsense reasoning, visual-textual entailment, and visual question answering. The key challenge of accurate rationalization is comprehensive image understanding at all levels: not just their explicit content at the pixel level, but their contextual contents at the semantic and pragmatic levels. We present Rationale^VT Transformer, an integrated model that learns to generate free-text rationales by combining pretrained language models with object recognition, grounded visual semantic frames, and visual commonsense graphs. Our…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsLinear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Interpretability · Adam · Byte Pair Encoding · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention
