Rationalization through Concepts

Diego Antognini; Boi Faltings

arXiv:2105.04837·cs.CL·May 12, 2021

TL;DR

ConRAT is a novel self-interpretable model that extracts human-aligned concepts from text and explains predictions through a linear combination of these concepts, improving interpretability and performance.

Contribution

The paper introduces ConRAT, a model trained only on overall labels that generates concept-based explanations aligned with human rationalizations.

Findings

01

ConRAT produces concepts that align with human rationalizations.

02

It outperforms state-of-the-art methods on sentiment classification.

03

ConRAT enhances interpretability without requiring aspect-specific labels.

Abstract

Automated predictions require explanations to be interpretable by humans. One type of explanation is a rationale, i.e., a selection of input features such as relevant text snippets from which the model computes the outcome. However, a single overall selection does not provide a complete explanation, e.g., weighing several aspects for decisions. To this end, we present a novel self-interpretable model called ConRAT. Inspired by how human explanations for high-level decisions are often based on key concepts, ConRAT extracts a set of text snippets as concepts and infers which ones are described in the document. Then, it explains the outcome with a linear aggregation of concepts. Two regularizers drive ConRAT to build interpretable concepts. In addition, we propose two techniques to boost the rationale and predictive performance further. Experiments on both single- and multi-aspect…
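
The "linear aggregation of concepts" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture, whose details are not given in this excerpt. It only assumes that each concept yields a scalar score, that a binary indicator marks whether the concept is described in the document, and that the prediction is a weighted sum passed through a sigmoid. The function name `aggregate_concepts` and all numeric values are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_concepts(concept_scores, presence, weights, bias=0.0):
    """Combine per-concept scores linearly (illustrative assumption only).

    concept_scores: one scalar score per concept
    presence:       1 if the concept is described in the document, else 0
    weights:        one aggregation weight per concept
    Returns the predicted probability and each concept's contribution.
    """
    # A concept that is absent (presence == 0) contributes nothing.
    contributions = [
        w * s * p for w, s, p in zip(weights, concept_scores, presence)
    ]
    logit = sum(contributions) + bias
    return sigmoid(logit), contributions


# Hypothetical values for three concepts; the third one is absent.
concept_scores = [0.9, -0.4, 0.2]
presence = [1, 1, 0]
weights = [1.0, 0.5, 2.0]

prob, contributions = aggregate_concepts(concept_scores, presence, weights)
```

Because the output is a plain sum, each entry of `contributions` can be read directly as that concept's share of the decision, which is the property that makes a linear aggregation easy to inspect.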
