Rationalizing Transformer Predictions via End-To-End Differentiable Self-Training

Marc Brinner; Sina Zarrieß

arXiv:2508.11393·cs.CL·August 18, 2025

TL;DR

This paper introduces an end-to-end differentiable training method for rationalized transformer classifiers. The resulting single model classifies a sample and scores each input token by its relevance to that classification, and its rationales align better with human annotations.

Contribution

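The method simplifies the three-player rationalization setup by having a single model act as rationale selector, classifier, and complement classifier, which makes training more stable and efficient. It also extends to class-wise rationales, for which the authors report state-of-the-art alignment with human annotations.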

Findings

01. Achieves stable end-to-end training of rationalized transformers.

02. Produces class-wise rationales with improved human annotation alignment.

03. Outperforms previous methods in rationale quality and training stability.

Abstract

We propose an end-to-end differentiable training paradigm for stable training of a rationalized transformer classifier. Our approach results in a single model that simultaneously classifies a sample and scores input tokens based on their relevance to the classification. To this end, we build on the widely used three-player game for training rationalized models, which typically relies on training a rationale selector, a classifier, and a complement classifier. We simplify this approach by making a single model fulfill all three roles, leading to a more efficient training paradigm that is not susceptible to the common training instabilities that plague existing approaches. Further, we extend this paradigm to produce class-wise rationales while incorporating recent advances in parameterizing and regularizing the resulting rationales, thus leading to substantially improved and…
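To make the single-model, three-role idea concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: `toy_model`, `mask_embeddings`, and `three_role_loss` are hypothetical names, the linear "model" stands in for a transformer, and the soft sigmoid masks and the KL-to-uniform complement term are illustrative assumptions, since the abstract does not specify the actual parameterization or regularizers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_model(embeddings, class_weights, relevance_weight):
    """Hypothetical stand-in for one shared model.

    Returns (class_logits, token_scores) from the same parameters.
    """
    dim = len(embeddings[0])
    # Mean-pool token embeddings, then apply a linear classification head.
    pooled = [sum(tok[d] for tok in embeddings) / len(embeddings) for d in range(dim)]
    logits = [sum(w[d] * pooled[d] for d in range(dim)) for w in class_weights]
    # Per-token relevance score in (0, 1); a soft score keeps the pipeline differentiable.
    scores = [sigmoid(sum(relevance_weight[d] * tok[d] for d in range(dim)))
              for tok in embeddings]
    return logits, scores


def mask_embeddings(embeddings, mask):
    """Scale each token embedding by its mask value (assumed soft masking)."""
    return [[m * v for v in tok] for tok, m in zip(embeddings, mask)]


def three_role_loss(embeddings, label, class_weights, relevance_weight):
    # Role 1 (selector): score tokens on the full input.
    _, scores = toy_model(embeddings, class_weights, relevance_weight)
    # Role 2 (classifier): the same model classifies the rationale.
    rationale = mask_embeddings(embeddings, scores)
    rat_logits, _ = toy_model(rationale, class_weights, relevance_weight)
    # Role 3 (complement classifier): the same model classifies the complement.
    complement = mask_embeddings(embeddings, [1.0 - s for s in scores])
    comp_logits, _ = toy_model(complement, class_weights, relevance_weight)

    rat_probs = softmax(rat_logits)
    comp_probs = softmax(comp_logits)
    n = len(comp_probs)
    # The rationale should predict the label (cross-entropy).
    rationale_loss = -math.log(rat_probs[label])
    # Assumption: the complement should be uninformative, KL(uniform || p).
    complement_loss = -math.log(n) - sum(math.log(p) for p in comp_probs) / n
    return rationale_loss + complement_loss, scores


if __name__ == "__main__":
    tokens = [[1.0, 0.5], [-0.3, 2.0], [0.0, -1.0]]  # three toy token embeddings
    loss, scores = three_role_loss(tokens, 1, [[0.2, -0.1], [-0.4, 0.3]], [0.5, -0.5])
    print(loss, scores)
```

Because all three forward passes share one set of parameters and the masks are soft, a single loss covers selection and classification. In a real setup an autodiff framework would backpropagate through all three passes at once.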
