A Fine-grained Interpretability Evaluation Benchmark for Neural NLP
Lijie Wang, Yaozong Shen, Shuyuan Peng, Shuai Zhang, Xinyan Xiao, Hao Liu, Hongxuan Tang, Ying Chen, Hua Wu, Haifeng Wang

TL;DR
This paper introduces a comprehensive benchmark with annotated datasets and a new metric for evaluating the interpretability of neural NLP models across multiple tasks and languages.
Contribution
It provides a novel interpretability benchmark with token-level rationales and a consistency metric, enabling systematic evaluation of neural models and saliency methods.
Findings
Models show varying interpretability strengths and weaknesses.
The benchmark facilitates fair comparison of interpretability methods.
The new metric effectively measures rationale consistency.
Abstract
While there is increasing concern about the interpretability of neural models, the evaluation of interpretability remains an open problem, due to the lack of proper evaluation datasets and metrics. In this paper, we present a novel benchmark to evaluate the interpretability of both neural models and saliency methods. This benchmark covers three representative NLP tasks: sentiment analysis, textual similarity and reading comprehension, each provided with both English and Chinese annotated data. In order to precisely evaluate the interpretability, we provide token-level rationales that are carefully annotated to be sufficient, compact and comprehensive. We also design a new metric, i.e., the consistency between the rationales before and after perturbations, to uniformly evaluate the interpretability on different types of tasks. Based on this benchmark, we conduct experiments on three…
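The consistency idea in the abstract (compare the rationale a model yields before and after a perturbation) can be illustrated with a small sketch. This is not the paper's metric definition, which the truncated abstract does not give. It assumes, purely for illustration, that a rationale is the top-k tokens by saliency score and that consistency is the Jaccard overlap of the two top-k sets. The function names, the parameter `k`, and the example scores are all hypothetical.

```python
def top_k_tokens(saliency, k):
    """Return the set of the k tokens with the highest saliency scores.

    `saliency` maps token -> score. Using tokens as dict keys assumes each
    token occurs once in the input, a simplification made for this sketch.
    """
    ranked = sorted(saliency, key=saliency.get, reverse=True)
    return set(ranked[:k])


def rationale_consistency(saliency_original, saliency_perturbed, k=3):
    """Jaccard overlap of the top-k rationale tokens before and after a
    perturbation. This is an assumed stand-in, not the paper's formula."""
    before = top_k_tokens(saliency_original, k)
    after = top_k_tokens(saliency_perturbed, k)
    union = before | after
    if not union:
        # Two empty rationales are treated as trivially consistent.
        return 1.0
    return len(before & after) / len(union)


# Hypothetical saliency scores for a sentiment example and a perturbed
# version in which "movie" is replaced by the synonym "film".
original = {"the": 0.05, "movie": 0.20, "was": 0.05, "wonderful": 0.70}
perturbed = {"the": 0.04, "film": 0.25, "was": 0.06, "wonderful": 0.65}

score = rationale_consistency(original, perturbed, k=2)
print(f"consistency@2 = {score:.3f}")
```

In this toy case only "wonderful" survives in both top-2 sets, so the overlap is low. A synonym-aware alignment between the original and perturbed tokens would score it higher, which is one reason to treat this set-overlap version as illustrative only.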
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling
