Testing the effectiveness of saliency-based explainability in NLP using randomized survey-based experiments
Adel Rahimi, Shaurya Jain

TL;DR
This paper investigates how effective saliency-based explanations are in NLP through randomized survey experiments, finding that people tend to accept explanations readily rather than evaluate them critically.
Contribution
It introduces a novel survey-based experimental approach to assess human understanding of saliency explanations in NLP models.
Findings
Humans tend to accept saliency explanations with little critical scrutiny.
Saliency explanations may lead to overtrust in NLP model predictions.
Understanding human biases is crucial for improving explainability methods.
Abstract
As applications of Natural Language Processing (NLP) proliferate in sensitive areas such as political profiling and essay review in education, there is a growing need for transparency in NLP models to build trust with stakeholders and identify biases. Much work in Explainable AI has aimed to devise explanation methods that give humans insight into the workings and predictions of NLP models. While these methods distill predictions from complex models such as neural networks into consumable explanations, how humans understand these explanations remains largely unexplored. Innate human tendencies and biases can hinder people's understanding of these explanations and can lead them to misjudge models and predictions as a result. We designed a randomized survey-based experiment to understand the effectiveness of saliency-based post-hoc explainability methods…
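For readers unfamiliar with the term, a saliency-based post-hoc explanation assigns each input token a score for how much it influenced a prediction. The sketch below is illustrative only and is not the method used in the paper: it uses leave-one-out (occlusion) saliency over a hypothetical toy lexicon scorer. The names `TOY_WEIGHTS`, `toy_score`, and `occlusion_saliency` are all assumptions introduced here.

```python
from typing import Callable, List, Tuple

# Hypothetical toy lexicon standing in for a trained model; not from the paper.
TOY_WEIGHTS = {"great": 2.0, "good": 1.0, "boring": -1.5, "terrible": -2.0}


def toy_score(tokens: List[str]) -> float:
    """Stand-in 'model': sums per-word weights; unknown words count as 0."""
    return sum(TOY_WEIGHTS.get(t.lower(), 0.0) for t in tokens)


def occlusion_saliency(
    tokens: List[str],
    score_fn: Callable[[List[str]], float] = toy_score,
) -> List[Tuple[str, float]]:
    """Leave-one-out saliency: how much the score drops when a token is removed."""
    base = score_fn(tokens)
    return [
        (tok, base - score_fn(tokens[:i] + tokens[i + 1:]))
        for i, tok in enumerate(tokens)
    ]


# Each pair is (token, saliency); positive values pushed the score up.
saliency = occlusion_saliency("The plot was boring but the acting was great".split())
```

In a survey setting, scores like these are typically rendered as a highlighted heatmap over the text, which is the kind of artifact participants would be asked to interpret.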
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling
