Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension
Ekta Sood, Simon Tannert, Diego Frassinelli, Andreas Bulling, Ngoc Thang Vu

TL;DR
This study investigates how neural attention mechanisms in NLP models relate to human visual attention by using eye-tracking data, revealing that different architectures learn distinct attention strategies.
Contribution
The paper introduces a new eye-tracking dataset and compares neural attention with human attention across various models in machine reading comprehension.
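The comparison between human and neural attention can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. Several choices are assumptions. Human attention for a token is taken to be its total fixation duration averaged across participants and normalized to sum to 1. Eye-tracking tokens are assumed to be already aligned one-to-one with the model's attention positions; XLNet subword tokens would first need to be mapped back to words. Spearman's rank correlation stands in as the similarity measure, and this summary does not state which metric the paper uses. `human_attention`, `ranks`, and `spearman` are hypothetical helper names.

```python
import math
from typing import Sequence


def human_attention(fixation_ms: Sequence[Sequence[float]]) -> list[float]:
    """Assumed aggregation: average each token's total fixation duration
    across participants (rows), then normalize to a probability distribution."""
    n_tokens = len(fixation_ms[0])
    means = [sum(p[i] for p in fixation_ms) / len(fixation_ms) for i in range(n_tokens)]
    total = sum(means)
    return [m / total for m in means]


def ranks(xs: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    if sa == 0 or sb == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (sa * sb)


# Toy input: 2 participants x 4 tokens (fixation durations in ms) and
# made-up model attention weights over the same 4 tokens.
h = human_attention([[120, 0, 340, 80], [100, 20, 300, 60]])
model_attention = [0.1, 0.05, 0.6, 0.25]
print(round(spearman(h, model_attention), 3))  # → 0.8
```

A rank correlation ignores the absolute scale of the two maps and compares only which tokens receive more attention. A divergence measure such as KL divergence between the two normalized distributions would be a reasonable alternative.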
Findings
For LSTM and CNN models, higher similarity to human attention correlates significantly with task performance.
XLNet performs best overall but does not align closely with human attention; for XLNet, the correlation between similarity to human attention and performance does not hold.
Different neural architectures develop distinct attention strategies.
Abstract
While neural networks with attention mechanisms have achieved superior performance on many natural language processing tasks, it remains unclear to what extent learned attention resembles human visual attention. In this paper, we propose a new method that leverages eye-tracking data to investigate the relationship between human visual attention and neural attention in machine reading comprehension. To this end, we introduce MQA-RC, a novel eye-tracking dataset of 23 participants who read movie plots and answered pre-defined questions. We compare state-of-the-art networks based on long short-term memory (LSTM), convolutional neural network (CNN), and XLNet Transformer architectures. We find that, for the LSTM and CNN models, higher similarity to human attention correlates significantly with performance. However, we show this relationship does not hold true for the XLNet models…
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention · Label Smoothing
