Towards Transparent AI Systems: Interpreting Visual Question Answering Models
Yash Goyal, Akrit Mohapatra, Devi Parikh, Dhruv Batra

TL;DR
This paper investigates interpretability in Visual Question Answering models by visualizing input focus areas using guided backpropagation and occlusion, revealing implicit attention mechanisms.
Contribution
It applies two existing visualization techniques, guided backpropagation and occlusion, to VQA models that have no explicit attention module, and analyzes which image regions and question words those models rely on.
Findings
VQA models sometimes implicitly attend to relevant image regions.
Models often focus on appropriate words in questions.
Guided backpropagation and occlusion can surface this implicit attention even in models without explicit attention mechanisms.
Abstract
Deep neural networks have shown striking progress and obtained state-of-the-art results in many AI research fields in recent years. However, it is often unsatisfying to not know why they predict what they do. In this paper, we address the problem of interpreting Visual Question Answering (VQA) models. Specifically, we are interested in finding what part of the input (pixels in images or words in questions) the VQA model focuses on while answering the question. To tackle this problem, we use two visualization techniques -- guided backpropagation and occlusion -- to find important words in the question and important regions in the image. We then present qualitative and quantitative analyses of these importance maps. We found that even without explicit attention mechanisms, VQA models may sometimes be implicitly attending to relevant regions in the image, and often to appropriate words…
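To make the occlusion idea from the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: `score_fn` is a hypothetical stand-in for a VQA model that returns the score of its predicted answer, the patch size and fill value are arbitrary, and words are occluded by removing them from the question, which may differ from what the paper does. Guided backpropagation is not sketched because it needs gradient access to a real network.

```python
from typing import Callable, List

Image = List[List[float]]          # toy grayscale image as rows of pixel values
Question = List[str]               # tokenized question
ScoreFn = Callable[[Image, Question], float]  # hypothetical model interface


def occlusion_image_importance(image: Image, question: Question,
                               score_fn: ScoreFn, patch: int = 2,
                               fill: float = 0.0) -> Image:
    """Importance of a patch = drop in answer score when that patch is masked."""
    base = score_fn(image, question)
    h, w = len(image), len(image[0])
    heat = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            rows = range(top, min(top + patch, h))
            cols = range(left, min(left + patch, w))
            occluded = [row[:] for row in image]   # copy, then mask one patch
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base - score_fn(occluded, question)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat


def occlusion_word_importance(image: Image, question: Question,
                              score_fn: ScoreFn) -> List[float]:
    """Importance of a word = drop in answer score when that word is removed."""
    base = score_fn(image, question)
    return [base - score_fn(image, question[:i] + question[i + 1:])
            for i in range(len(question))]


def toy_score(image: Image, question: Question) -> float:
    """Hypothetical model: looks only at the top-left 2x2 pixels and 'color'."""
    region = sum(image[r][c] for r in range(2) for c in range(2))
    return region + (1.0 if "color" in question else 0.0)


toy_image = [[1.0] * 4 for _ in range(4)]
toy_question = ["what", "color", "is", "it"]
heat = occlusion_image_importance(toy_image, toy_question, toy_score)
word_scores = occlusion_word_importance(toy_image, toy_question, toy_score)
```

With this toy scorer, only the top-left patch and the word "color" receive non-zero importance, which is the kind of importance map the abstract describes comparing against relevant image regions and question words.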
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
