TL;DR
This paper introduces a differentiable framework for training models to produce faithful sentence-level rationales without supervision on the rationales, improving interpretability and maintaining competitive accuracy.
Contribution
It presents a novel training method that enables models to generate faithful rationales solely supervised on the target task, avoiding costly rationale annotations.
Findings
Achieves competitive performance with standard BERT models.
Outperforms pipeline-based rationale extraction in two datasets.
Enhances rationale quality through direct supervision.
Abstract
Evaluating the trustworthiness of a model's prediction is essential for differentiating between `right for the right reasons' and `right for the wrong reasons'. Identifying textual spans that determine the target label, known as faithful rationales, usually relies on pipeline approaches or reinforcement learning. However, such methods either require supervision and thus costly annotation of the rationales or employ non-differentiable models. We propose a differentiable training-framework to create models which output faithful rationales on a sentence level, by solely applying supervision on the target task. To achieve this, our model solves the task based on each rationale individually and learns to assign high scores to those which solved the task best. Our evaluation on three different datasets shows competitive results compared to a standard BERT blackbox while exceeding a pipeline…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsLinear Layer · Dense Connections · Layer Normalization · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Attention Is All You Need
