Explanation Regularisation through the Lens of Attributions

Pedro Ferreira; Ivan Titov; Wilker Aziz

arXiv:2407.16693·cs.CL·February 6, 2025

TL;DR

This paper investigates explanation regularisation (ER) in text classifiers, revealing that increased reliance on plausible tokens does not necessarily improve out-of-domain performance, challenging previous assumptions about ER's benefits.

Contribution

The study critically examines the relationship between ER, reliance on plausible features, and out-of-domain (OOD) performance, showing that stronger reliance on plausible tokens is not the main driver of OOD improvements.

Findings

1. Stronger reliance on plausible tokens does not correlate with better OOD performance.

2. The connection between ER guidance and reliance on plausible features has been overstated.

3. ER's benefits in OOD settings may not stem from increased reliance on human-annotated rationales.

Abstract

Explanation regularisation (ER) has been introduced as a way to guide text classifiers to form their predictions relying on input tokens that humans consider plausible. This is achieved by introducing an auxiliary explanation loss that measures how well the output of an input attribution technique for the model agrees with human-annotated rationales. The guidance appears to benefit performance in out-of-domain (OOD) settings, presumably due to an increased reliance on "plausible" tokens. However, previous work has under-explored the impact of guidance on that reliance, particularly when reliance is measured using attribution techniques different from those used to guide the model. In this work, we seek to close this gap, and also explore the relationship between reliance on plausible features and OOD performance. We find that the connection between ER and the ability of a classifier to…
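The objective described in the abstract, a task loss plus an auxiliary explanation loss, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the function names, the use of a softmax to normalise attribution scores, the choice of a KL divergence as the agreement measure, and the weight `lam`. Real ER setups obtain the attribution scores from an attribution technique applied to the model; here they are simply passed in as numbers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def task_loss(class_logits, gold_label):
    """Cross-entropy of the gold label under the classifier's softmax."""
    probs = softmax(class_logits)
    return -math.log(probs[gold_label])


def explanation_loss(attributions, rationale_mask, eps=1e-12):
    """KL(target || predicted) over tokens (an assumed agreement measure).

    attributions: one raw attribution score per input token.
    rationale_mask: 1 for tokens a human marked as rationale, else 0.
    """
    n_marked = sum(rationale_mask)
    if n_marked == 0:
        # No human rationale for this example: nothing to regularise towards.
        return 0.0
    predicted = softmax(attributions)
    target = [m / n_marked for m in rationale_mask]
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


def er_objective(class_logits, gold_label, attributions, rationale_mask, lam=1.0):
    """Joint objective: task loss plus lam times the explanation loss."""
    return task_loss(class_logits, gold_label) + lam * explanation_loss(
        attributions, rationale_mask
    )


# Same prediction, different attributions: tokens 0 and 2 are the rationale.
aligned = er_objective([2.0, 0.0], 0, [3.0, 0.0, 3.0], [1, 0, 1])
misaligned = er_objective([2.0, 0.0], 0, [0.0, 3.0, 0.0], [1, 0, 1])
print(aligned < misaligned)
```

In this toy setting the two examples share the same task loss, so the difference in the objective comes entirely from how much attribution mass falls on the annotated tokens. Note that a low explanation loss under one attribution technique says nothing by itself about reliance measured with a different technique, which is the gap the abstract points to.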

Code & Models

Repositories

PedroMLF/ER_through_the_lens_of_attributions
PyTorch · Official
