Invariant Rationalization
Shiyu Chang, Yang Zhang, Mo Yu, Tommi S. Jaakkola

TL;DR
This paper introduces an invariant rationalization method that uses a game-theoretic criterion to identify input features supporting predictions across diverse environments, reducing spurious correlations and improving interpretability.
Contribution
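The paper proposes a game-theoretic invariant rationalization criterion that constrains rationales so that the same predictor is optimal in every environment, making them more robust to spurious correlations than rationales selected by maximum mutual information (MMI).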
Findings
Reduces reliance on spurious correlations
Improves generalization across environments
Aligns better with human judgments
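The first finding can be illustrated with a small toy calculation. This is a sketch under stated assumptions, not the paper's experiment: the two binary features, the two environments, and all agreement rates below are hypothetical, and the label is assumed to be a fair coin. Each feature equals the label with some probability that may depend on the environment. Pooled mutual information prefers the feature with the higher average agreement, even if its relationship to the label shifts between environments, whereas an invariance check rejects it.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Hypothetical setup: Y ~ Bernoulli(0.5); each feature equals Y with the
# given probability in each environment (environments equally sized).
AGREE = {
    "x1_invariant": {"env_a": 0.80, "env_b": 0.80},
    "x2_spurious": {"env_a": 0.95, "env_b": 0.85},
}

def pooled_mi(feature):
    """I(Y; X) in bits on the pooled data.

    With a uniform label, mixing symmetric channels gives a symmetric
    channel with the averaged agreement rate, so I = 1 - H(agreement).
    """
    rates = list(AGREE[feature].values())
    pooled_agreement = sum(rates) / len(rates)
    return 1.0 - binary_entropy(pooled_agreement)

def is_invariant(feature, tol=1e-9):
    """True if P(Y | X) is the same in every environment.

    In this symmetric setup P(Y = x | X = x, E = e) equals the agreement
    rate, so invariance holds exactly when the rates coincide.
    """
    rates = list(AGREE[feature].values())
    return max(rates) - min(rates) <= tol

# MMI-style choice: most informative feature on pooled data.
mmi_choice = max(AGREE, key=pooled_mi)

# Invariance-constrained choice: most informative among invariant features.
invariant_choice = max((f for f in AGREE if is_invariant(f)), key=pooled_mi)

if __name__ == "__main__":
    print("MMI picks:", mmi_choice)
    print("Invariant criterion picks:", invariant_choice)
```

In this toy setting the pooled criterion selects the environment-dependent feature because it is more predictive on average, while the invariance-constrained selection keeps the feature whose predictive relationship is stable.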
Abstract
Selective rationalization improves neural network interpretability by identifying a small subset of input features -- the rationale -- that best explains or supports the prediction. A typical rationalization criterion, i.e. maximum mutual information (MMI), finds the rationale that maximizes the prediction performance based only on the rationale. However, MMI can be problematic because it picks up spurious correlations between the input features and the output. Instead, we introduce a game-theoretic invariant rationalization criterion where the rationales are constrained to enable the same predictor to be optimal across different environments. We show both theoretically and empirically that the proposed rationales can rule out spurious correlations, generalize better to different test scenarios, and align better with human judgments. Our data and code are available.
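The contrast between the two criteria can be written compactly. The notation here is an assumption for illustration and is not taken from the abstract: X is the input, Y the label, E an environment indicator, m a binary selection mask, and Z the resulting rationale. Sparsity and continuity constraints on m, which rationalization methods typically also impose, are omitted.

```latex
\begin{align}
\text{MMI:}\quad
  & \max_{m}\; I(Y; Z)
  \quad \text{s.t. } Z = m \odot X, \\
\text{Invariant:}\quad
  & \max_{m}\; I(Y; Z)
  \quad \text{s.t. } Z = m \odot X,\;\; Y \perp E \mid Z .
\end{align}
```

Read this way, the added conditional independence constraint says that once the rationale Z is known, the environment E carries no further information about Y. This is what allows a single predictor on Z to be optimal in every environment, and it excludes features whose correlation with Y changes from one environment to another.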
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · AI-based Problem Solving and Planning
Methods: Interpretability
