TL;DR
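This paper compares patterns of relative token importance in English between human readers, measured by eye-tracking fixations, and neural language models. Human fixation patterns correlate strongly with gradient-based saliency but not with attention, which suggests saliency is the more cognitively plausible interpretability metric.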
Contribution
It shows that gradient-based saliency in neural language models aligns closely with human reading patterns, whereas attention-based importance does not.
Findings
Human reading fixations correlate with saliency-based importance
Saliency aligns better with human processing than attention
Saliency may be a more cognitively plausible interpretability metric
Abstract
Determining the relative importance of the elements in a sentence is a key factor for effortless natural language understanding. For human language processing, we can approximate patterns of relative importance by measuring reading fixations using eye-tracking technology. In neural language models, gradient-based saliency methods indicate the relative importance of a token for the target objective. In this work, we compare patterns of relative importance in English language processing by humans and models and analyze the underlying linguistic patterns. We find that human processing patterns in English correlate strongly with saliency-based importance in language models and not with attention-based importance. Our results indicate that saliency could be a cognitively more plausible metric for interpreting neural language models. The code is available on GitHub:…
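To make the comparison described in the abstract concrete, here is a minimal sketch of the two steps involved: computing a gradient-based saliency score per token, then correlating those scores with human fixation values. This is not the paper's code or setup. The toy scoring function, the embeddings, the weights, and the fixation numbers are all invented for illustration; the gradient is approximated by finite differences rather than backpropagation; and Spearman rank correlation is used here as one reasonable choice of correlation measure.

```python
import math

def score(embeddings, weights):
    """Toy scalar objective (hypothetical): sum over tokens of tanh(w . e_i)."""
    return sum(math.tanh(sum(w * x for w, x in zip(weights, e))) for e in embeddings)

def saliency(embeddings, weights, eps=1e-5):
    """Per-token saliency: L1 norm of the (numerical) gradient of the score
    with respect to that token's embedding, normalized to sum to 1."""
    raw = []
    for i, e in enumerate(embeddings):
        total = 0.0
        for d in range(len(e)):
            plus = [list(v) for v in embeddings]
            minus = [list(v) for v in embeddings]
            plus[i][d] += eps
            minus[i][d] -= eps
            # Central finite difference for one embedding dimension
            total += abs(score(plus, weights) - score(minus, weights)) / (2 * eps)
        raw.append(total)
    norm = sum(raw)
    return [r / norm for r in raw]

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up inputs for illustration only (not data from the paper).
tokens = ["the", "cat", "sat", "quietly"]
embeddings = [[2.0, 1.5], [0.3, -0.2], [0.8, 0.1], [0.1, 0.05]]
weights = [1.0, 0.5]
relative_fixation = [0.05, 0.30, 0.25, 0.40]  # hypothetical share of fixation time

sal = saliency(embeddings, weights)
rho = spearman(sal, relative_fixation)
for tok, s, f in zip(tokens, sal, relative_fixation):
    print(f"{tok:>8}  saliency={s:.3f}  fixation={f:.3f}")
print(f"Spearman correlation: {rho:.3f}")
```

With a real model, the finite-difference loop would be replaced by a backward pass through the network, and the fixation values would come from an eye-tracking corpus; the normalization and correlation steps would stay the same in this sketch.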
