Attention in Reasoning: Dataset, Analysis, and Modeling
Shi Chen, Ming Jiang, Jinhui Yang, Qi Zhao

TL;DR
This paper introduces the Attention with Reasoning capability (AiR) framework, which evaluates attention in neural networks against the steps of the reasoning process, compares machine attention with human eye-tracking data, and supervises attention learning to improve visual question answering (VQA) performance.
Contribution
It proposes an evaluation metric for attention that is defined over a sequence of atomic reasoning operations, and it introduces attention supervision to improve the attention and reasoning ability of VQA models.
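To make the idea of scoring attention per reasoning step concrete, the following is a minimal sketch and not the paper's own metric definition. It assumes each reasoning step comes with one or more bounding boxes marking the regions relevant to that step, standardizes the attention map, and reports the mean standardized attention inside the best-matching box. The function names, the box format `(x0, y0, x1, y1)`, and the choice to take the maximum over boxes are all assumptions made for illustration.

```python
import numpy as np

def step_attention_score(attention, boxes):
    """Score one reasoning step (hypothetical helper, not from the paper).

    attention: 2D array of non-negative attention weights.
    boxes: list of (x0, y0, x1, y1) regions assumed relevant to this step.
    """
    if not boxes:
        raise ValueError("each step needs at least one region of interest")
    att = np.asarray(attention, dtype=float)
    std = att.std()
    if std == 0:
        return 0.0  # a flat map favours no region
    z = (att - att.mean()) / std  # standardize so scores are comparable across maps
    # Mean standardized attention inside each box; keep the best-covered box.
    scores = [z[y0:y1, x0:x1].mean() for (x0, y0, x1, y1) in boxes]
    return float(max(scores))

def score_reasoning_sequence(attention_maps, step_boxes):
    """Return one score per reasoning step, in order."""
    return [step_attention_score(a, b) for a, b in zip(attention_maps, step_boxes)]

# Toy example: attention concentrated in the top-left quadrant of a 4x4 map.
toy_map = np.zeros((4, 4))
toy_map[0:2, 0:2] = 1.0
on_target = step_attention_score(toy_map, [(0, 0, 2, 2)])
off_target = step_attention_score(toy_map, [(2, 2, 4, 4)])
```

Under these assumptions, a positive score means the attention is concentrated on the regions relevant to that step, and a negative score means it falls elsewhere.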
Findings
Attention with reasoning improves task performance.
Supervised attention learning enhances reasoning capability.
Analysis of human and machine attention mechanisms reveals key differences.
Abstract
While attention has been an increasingly popular component in deep neural networks to both interpret and boost the performance of models, little work has examined how attention progresses to accomplish a task and whether it is reasonable. In this work, we propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes. We first define an evaluation metric based on a sequence of atomic reasoning operations, enabling a quantitative measurement of attention that considers the reasoning process. We then collect human eye-tracking and answer correctness data, and analyze various machine and human attention mechanisms on their reasoning capability and how they impact task performance. To improve the attention and reasoning ability of visual question answering models, we propose to supervise the learning of…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Topic Modeling
