ER-Test: Evaluating Explanation Regularization Methods for Language Models
Brihi Joshi, Aaron Chan, Ziyi Liu, Shaoliang Nie, Maziar Sanjabi, Hamed Firooz, Xiang Ren

TL;DR
ER-Test is a framework for evaluating how explanation regularization (ER) affects language models' out-of-distribution (OOD) generalization, showing that ER's benefits extend beyond in-distribution performance.
Contribution
This work introduces ER-Test, a new evaluation framework for assessing ER methods' impact on OOD generalization in language models, with extensive analysis across multiple datasets and tasks.
Findings
ER improves OOD performance significantly
ER has minimal effect on in-distribution accuracy
Limited rationale supervision still enhances OOD generalization
Abstract
By explaining how humans would solve a given task, human rationales can provide a strong learning signal for neural language models (LMs). Explanation regularization (ER) aims to improve LM generalization by pushing the LM's machine rationales (Which input tokens did the LM focus on?) to align with human rationales (Which input tokens would humans focus on?). Prior works primarily study ER via in-distribution (ID) evaluation, yet out-of-distribution (OOD) generalization is often more critical in real-world scenarios, and ER's effect on OOD generalization has been underexplored. In this paper, we introduce ER-Test, a framework for evaluating ER models' OOD generalization along three dimensions: unseen dataset tests, contrast set tests, and functional tests. Using ER-Test, we extensively analyze how ER models' OOD generalization varies with different ER design choices. Across two tasks…
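The abstract describes ER only at a high level. As a minimal sketch, assume the model's token attribution scores (the machine rationale, e.g. from a gradient- or attention-based attribution method) are already computed, and each human rationale is a binary per-token mask. An ER objective then adds a weighted rationale-alignment term to the task loss. The binary cross-entropy alignment criterion, the weight `lam`, and the function names `rationale_alignment_loss` and `er_objective` are hypothetical illustrations, not ER-Test's exact implementation. Examples whose mask is `None` contribute only the task loss, which is one simple way to model training with rationales for only part of the data.

```python
import math
from typing import List, Optional, Sequence


def rationale_alignment_loss(attr_scores: Sequence[float], human_mask: Sequence[int]) -> float:
    """Mean binary cross-entropy between machine rationale logits and a human rationale mask.

    Assumption (illustrative): each attribution score is treated as a logit, and
    the human rationale marks each token as important (1) or not (0).
    """
    if len(attr_scores) != len(human_mask):
        raise ValueError("attribution scores and human mask must have the same length")
    total = 0.0
    for s, h in zip(attr_scores, human_mask):
        # Numerically stable BCE-with-logits: max(s, 0) - h * s + log(1 + exp(-|s|))
        total += max(s, 0.0) - h * s + math.log1p(math.exp(-abs(s)))
    return total / len(attr_scores)


def er_objective(
    task_losses: Sequence[float],
    attr_scores_batch: Sequence[Sequence[float]],
    human_masks_batch: Sequence[Optional[Sequence[int]]],
    lam: float = 1.0,
) -> float:
    """Hypothetical ER objective: mean task loss + lam * mean alignment loss.

    Examples whose human mask is None have no rationale annotation and
    contribute only to the task-loss term.
    """
    task = sum(task_losses) / len(task_losses)
    align_terms: List[float] = [
        rationale_alignment_loss(scores, mask)
        for scores, mask in zip(attr_scores_batch, human_masks_batch)
        if mask is not None
    ]
    align = sum(align_terms) / len(align_terms) if align_terms else 0.0
    return task + lam * align


if __name__ == "__main__":
    # Toy batch: the first example has a human rationale, the second does not.
    task_losses = [0.4, 0.6]
    attr_scores = [[2.0, -1.0, 0.5], [0.3, 0.1]]
    human_masks = [[1, 0, 1], None]
    print(er_objective(task_losses, attr_scores, human_masks, lam=0.5))
```

Setting `lam=0` recovers plain task training, so the weight controls how strongly machine rationales are pushed toward human rationales.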
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
Methods: ALIGN
