An Investigation of Language Model Interpretability via Sentence Editing

Samuel Stevens; Yu Su

arXiv:2011.14039 · cs.CL · September 28, 2021


TL;DR

This paper explores the interpretability of pre-trained language models using a sentence editing dataset, revealing that attention weights correlate well with human rationales and outperform gradient-based methods.

Contribution

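The paper re-purposes a sentence editing dataset as a new interpretability testbed. In this testbed, human rationales can be extracted automatically and compared with model rationales, which enables a systematic analysis of PLM interpretability and of rationale extraction methods.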
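In a sentence editing dataset, the tokens an editor changed can serve as a human rationale for why the sentence needed editing. The sketch below illustrates that idea under stated assumptions and is not the authors' extraction code: it aligns the original and edited token sequences with Python's standard `difflib.SequenceMatcher` and marks every original token that was replaced or deleted. The function name `human_rationale` is hypothetical, and pure insertions are ignored here because they touch no token of the original sentence.

```python
from difflib import SequenceMatcher


def human_rationale(original: list[str], edited: list[str]) -> list[int]:
    """Return a 0/1 mask over `original`: 1 where the token was replaced or deleted.

    Assumption: edited tokens are the human rationale; insertions are ignored.
    """
    mask = [0] * len(original)
    matcher = SequenceMatcher(a=original, b=edited, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            for i in range(i1, i2):
                mask[i] = 1
    return mask


# Toy grammatical-error-correction pair (whitespace tokenization for simplicity).
original = "She go to school every days .".split()
edited = "She goes to school every day .".split()
print(human_rationale(original, edited))
```

For this pair the mask flags "go" and "days", the two tokens the edit rewrote.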

Findings

1. Attention weights correlate well with human rationales.

2. Attention outperforms gradient-based saliency in rationale extraction.

3. The dataset and code are publicly available for future research.
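One way to score the attention-versus-saliency comparison is to treat each method's per-token importance scores as a ranking and measure how often a human-rationale token outranks a non-rationale token (a ROC AUC). The sketch below assumes that metric; the function name `rationale_auc` is hypothetical, the paper may use a different agreement measure, and the scores are made-up toy values for illustration, not results from the paper.

```python
def rationale_auc(scores: list[float], mask: list[int]) -> float:
    """Probability that a rationale token outscores a non-rationale token (ties count 0.5)."""
    pos = [s for s, m in zip(scores, mask) if m]
    neg = [s for s, m in zip(scores, mask) if not m]
    if not pos or not neg:
        raise ValueError("need at least one rationale and one non-rationale token")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up importance scores for the 7 tokens of "She go to school every days ."
mask = [0, 1, 0, 0, 0, 1, 0]  # hypothetical human rationale: "go" and "days" were edited
attention = [0.05, 0.40, 0.05, 0.10, 0.05, 0.30, 0.05]
saliency = [0.20, 0.15, 0.10, 0.25, 0.05, 0.15, 0.10]

print("attention AUC:", rationale_auc(attention, mask))
print("saliency AUC:", rationale_auc(saliency, mask))
```

A rank-based score like this ignores the scale of each method's scores, which matters because attention weights and gradient magnitudes live on very different ranges.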

Abstract

Pre-trained language models (PLMs) like BERT are being used for almost all language-related tasks, but interpreting their behavior remains a significant challenge and many important questions remain largely unanswered. In this work, we re-purpose a sentence editing dataset, where faithful high-quality human rationales can be automatically extracted and compared with extracted model rationales, as a new testbed for interpretability. This enables us to conduct a systematic investigation on an array of questions regarding PLMs' interpretability, including the role of pre-training procedure, comparison of rationale extraction methods, and different layers in the PLM. The investigation generates new insights; for example, contrary to the common understanding, we find that attention weights correlate well with human rationales and work better than gradient-based saliency in extracting…
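The layer-wise question in the abstract can be sketched by turning each layer's attention into per-token scores and checking them against a human rationale. This is a minimal illustration, not the paper's procedure: reading the attention row of query position 0 (the `[CLS]` token), averaging over heads, and scoring with precision@k (k = number of rationale tokens) are all assumptions, and `cls_attention_per_layer` and `precision_at_k` are hypothetical names. With Hugging Face `transformers`, passing `output_attentions=True` returns one `(batch, heads, seq, seq)` tensor per layer, which could be converted to the nested-list layout used here.

```python
def cls_attention_per_layer(attn: list) -> list[list[float]]:
    """Collapse attn[layer][head][query][key] into one score per key token per layer.

    Assumption: a token's importance is the attention it receives from
    query position 0 (the [CLS] token), averaged over heads.
    """
    per_layer = []
    for layer in attn:
        n_heads = len(layer)
        seq_len = len(layer[0][0])
        per_layer.append(
            [sum(head[0][k] for head in layer) / n_heads for k in range(seq_len)]
        )
    return per_layer


def precision_at_k(scores: list[float], mask: list[int]) -> float:
    """Fraction of the k top-scored tokens that are rationale tokens, with k = sum(mask)."""
    k = sum(mask)
    if k == 0:
        raise ValueError("mask has no rationale tokens")
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(mask[i] for i in top_k) / k


# Toy attention for 2 layers x 2 heads over 4 tokens ([CLS], t1, t2, t3).
# Only the query-0 row of each head is given; the sketch reads nothing else.
attn = [
    [[[0.7, 0.1, 0.1, 0.1]], [[0.5, 0.2, 0.1, 0.2]]],  # layer 0
    [[[0.1, 0.1, 0.7, 0.1]], [[0.2, 0.1, 0.6, 0.1]]],  # layer 1
]
mask = [0, 0, 1, 0]  # hypothetical human rationale: t2 was edited

for layer_idx, scores in enumerate(cls_attention_per_layer(attn)):
    print(layer_idx, precision_at_k(scores, mask))
```

Swapping in another aggregation (for example, the maximum over heads instead of the mean) only changes the list comprehension in `cls_attention_per_layer`.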

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)

Methods: Linear Layer · Interpretability · Layer Normalization · Dense Connections · Residual Connection · Attention Dropout · Weight Decay · Attention Is All You Need · Multi-Head Attention