ERASER: A Benchmark to Evaluate Rationalized NLP Models

Jay DeYoung; Sarthak Jain; Nazneen Fatema Rajani; Eric Lehman; Caiming Xiong; Richard Socher; Byron C. Wallace

arXiv:1911.03429·cs.CL·April 27, 2020

TL;DR

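ERASER is a benchmark that collects multiple NLP datasets with human-annotated rationales and pairs them with metrics for two questions: how well model rationales agree with human ones, and how faithful they are to the model's predictions. Its aim is to make progress on interpretable NLP easier to track.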

Contribution

The paper introduces ERASER, a unified benchmark with datasets, metrics, and annotations for assessing rationales in NLP models, facilitating consistent evaluation of interpretability methods.

Findings

01. Multiple datasets with human-annotated rationales

02. Metrics for rationale alignment and faithfulness

03. Benchmark implementation available online
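
To make the second finding concrete, here is a minimal sketch of one plausible way to score alignment between a model rationale and a human rationale: token-level precision, recall, and F1 over the selected token positions. The function name `rationale_prf`, the example sentence, and the chosen indices are all hypothetical and for illustration only; this is not taken from the benchmark's released code, and the paper's own alignment metrics may differ in detail.

```python
def rationale_prf(predicted, human):
    """Token-level precision, recall, and F1 between two sets of token indices.

    Illustrative helper (hypothetical name), not the benchmark's implementation.
    """
    predicted, human = set(predicted), set(human)
    overlap = len(predicted & human)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(human) if human else 0.0
    # F1 is defined as 0 when there is no overlap, which also avoids 0/0.
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Made-up example: indices point into this token list.
tokens = ["the", "film", "was", "dull", "and", "far", "too", "long"]
human_rationale = {3, 6, 7}   # "dull", "too", "long"
model_rationale = {0, 3, 7}   # "the", "dull", "long"

precision, recall, f1 = rationale_prf(model_rationale, human_rationale)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Working with index sets rather than strings keeps repeated words distinct, which matters when the same token appears both inside and outside a rationale.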

Abstract

State-of-the-art models in NLP are now predominantly based on deep neural networks that are opaque in terms of how they come to make predictions. This limitation has increased interest in designing more interpretable deep models for NLP that reveal the 'reasoning' behind model outputs. But work in this direction has been conducted on different datasets and tasks with correspondingly unique aims and metrics; this makes it difficult to track progress. We propose the Evaluating Rationales And Simple English Reasoning (ERASER) benchmark to advance research on interpretable models in NLP. This benchmark comprises multiple datasets and tasks for which human annotations of "rationales" (supporting evidence) have been collected. We propose several metrics that aim to capture how well the rationales provided by models align with human rationales, and also how faithful these rationales are (i.e.,…
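
The abstract is cut off before it explains how faithfulness is measured, so the following is only a sketch of one erasure-style check, under stated assumptions. It compares the model's confidence on the full input with its confidence when the rationale tokens are removed (often called comprehensiveness) and when only the rationale tokens are kept (often called sufficiency). The classifier `toy_negative_prob`, its word weights, and the helper `erasure_scores` are hypothetical stand-ins, not part of any released ERASER code.

```python
import math


def toy_negative_prob(tokens):
    """Hypothetical stand-in classifier: P(negative) from a tiny word lexicon."""
    weights = {"great": 2.0, "dull": -2.0, "long": -1.0}  # made-up weights
    score = sum(weights.get(t, 0.0) for t in tokens)      # > 0 means positive
    return 1.0 / (1.0 + math.exp(score))


def erasure_scores(predict, tokens, rationale_idx):
    """Return (comprehensiveness, sufficiency) for the class `predict` scores.

    comprehensiveness: confidence drop when the rationale is erased.
    sufficiency: confidence drop when only the rationale is kept.
    """
    rationale_idx = set(rationale_idx)
    only_rationale = [t for i, t in enumerate(tokens) if i in rationale_idx]
    without_rationale = [t for i, t in enumerate(tokens) if i not in rationale_idx]
    full = predict(tokens)
    comprehensiveness = full - predict(without_rationale)
    sufficiency = full - predict(only_rationale)
    return comprehensiveness, sufficiency


tokens = ["the", "film", "was", "dull", "and", "far", "too", "long"]
comp, suff = erasure_scores(toy_negative_prob, tokens, {3, 7})  # "dull", "long"
print(f"comprehensiveness={comp:.3f} sufficiency={suff:.3f}")
```

Under this reading, a rationale looks faithful when erasing it hurts the prediction noticeably (high comprehensiveness) while keeping it alone costs little (sufficiency near zero).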

Code & Models

Datasets

niurl/eraser_esnli · dataset · 165 downloads
