An Interpretability Evaluation Benchmark for Pre-trained Language Models
Yaozong Shen, Lijie Wang, Ying Chen, Xinyan Xiao, Jing Liu, Hua Wu

TL;DR
This paper introduces a benchmark for evaluating both the masked-word prediction performance and the interpretability of pre-trained language models. It covers five capability dimensions (grammar, semantics, knowledge, reasoning, and computation) in English and Chinese, and includes annotated token-level rationales and perturbation-based faithfulness metrics.
Contribution
It provides the first multi-dimensional interpretability evaluation benchmark with token-level rationales and perturbation-based faithfulness metrics for pre-trained language models.
Findings
Pre-trained LMs perform poorly on knowledge and computation dimensions.
Model rationales show low plausibility (agreement with the human-annotated rationales) across all dimensions.
Models lack robustness on syntax-aware data.
Abstract
While pre-trained language models (LMs) have brought great improvements to many NLP tasks, there is increasing interest in exploring the capabilities of LMs and interpreting their predictions. However, existing work usually focuses on a single capability, evaluated through a few downstream tasks. Datasets for directly evaluating the masked-word prediction performance and the interpretability of pre-trained LMs are lacking. To fill this gap, we propose a novel evaluation benchmark with annotated data in both English and Chinese. It tests LMs' abilities along multiple dimensions, i.e., grammar, semantics, knowledge, reasoning, and computation. In addition, it provides carefully annotated token-level rationales that satisfy sufficiency and compactness. It contains perturbed instances for each original instance, so as to use the rationale consistency under perturbations as the metric for…
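To make the two rationale-based ideas in the abstract concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the paper's actual metrics: the abstract is truncated before it defines them. The sketch assumes plausibility can be approximated by token-level F1 between a model's rationale and the annotated rationale, and that consistency under perturbation can be approximated by Jaccard overlap between the rationale tokens of an original instance and its perturbed counterpart. The function names `token_f1` and `rationale_consistency` are hypothetical.

```python
from typing import Set


def token_f1(predicted: Set[str], gold: Set[str]) -> float:
    """Token-level F1 between a predicted and an annotated rationale.

    Assumption: used here as a stand-in for a plausibility score.
    """
    if not predicted and not gold:
        return 1.0  # both empty: treat as perfect agreement
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def rationale_consistency(original: Set[str], perturbed: Set[str]) -> float:
    """Jaccard overlap between rationales of an original and a perturbed instance.

    Assumption: a simple proxy for 'rationale consistency under perturbations';
    the benchmark's own faithfulness metric may be defined differently.
    """
    union = original | perturbed
    if not union:
        return 1.0  # both empty: nothing changed
    return len(original & perturbed) / len(union)


if __name__ == "__main__":
    # Hypothetical rationale tokens for one masked-word instance.
    gold = {"capital", "France"}
    model = {"capital", "of", "France"}
    model_perturbed = {"capital", "France", "city"}

    print("plausibility (F1):", token_f1(model, gold))
    print("consistency (Jaccard):", rationale_consistency(model, model_perturbed))
```

Comparing rationales as sets of token strings, rather than positions, keeps the consistency score meaningful when a perturbation shifts token indices. It ignores repeated tokens, which a fuller implementation would need to handle.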
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
