Out-of-Distribution Generalization via Risk Extrapolation (REx)
David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, Aaron Courville

TL;DR
This paper introduces Risk Extrapolation (REx), a method that improves out-of-distribution generalization by reducing differences in risk across training domains. This makes models less sensitive to extreme distributional shifts, including settings where the input contains both causal and anti-causal elements.
Contribution
The paper proposes REx and its two variants, MM-REx and V-REx, as approaches for robustness to several kinds of distributional shift, including causally induced shifts and covariate shift. It reports that REx outperforms existing methods such as Invariant Risk Minimization.
Findings
REx reduces sensitivity to extreme distributional shifts.
V-REx effectively penalizes risk variance across domains.
REx can recover causal mechanisms and improve out-of-distribution robustness.
Abstract
Distributional shift is one of the major obstacles when transferring machine learning prediction systems from the lab to the real world. To tackle this problem, we assume that variation across training domains is representative of the variation we might encounter at test time, but also that shifts at test time may be more extreme in magnitude. In particular, we show that reducing differences in risk across training domains can reduce a model's sensitivity to a wide range of extreme distributional shifts, including the challenging setting where the input contains both causal and anti-causal elements. We motivate this approach, Risk Extrapolation (REx), as a form of robust optimization over a perturbation set of extrapolated domains (MM-REx), and propose a penalty on the variance of training risks (V-REx) as a simpler variant. We prove that variants of REx can recover the causal…
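The two objectives named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that the per-domain training risks have already been computed as plain floats, that V-REx adds a weighted variance of those risks to their sum, and that MM-REx takes the worst case over affine combinations of the risks whose weights sum to one and are each bounded below by a possibly negative value. The function names `v_rex_objective` and `mm_rex_objective`, the parameters `beta` and `lam_min`, and the use of the population variance are illustrative choices made here.

```python
from statistics import pvariance


def v_rex_objective(risks, beta=1.0):
    """Sum of per-domain risks plus beta times their variance.

    Assumption: population variance is used; `beta` is a hypothetical
    name for the penalty weight.
    """
    risks = list(risks)
    if len(risks) < 2:
        raise ValueError("need at least two training domains")
    return sum(risks) + beta * pvariance(risks)


def mm_rex_objective(risks, lam_min=-1.0):
    """Worst case of sum(lam_e * risk_e) over weights with
    sum(lam_e) == 1 and lam_e >= lam_min for every domain.

    The maximum puts lam_min on every domain and the remaining mass
    on the highest-risk domain, which gives the closed form below.
    """
    risks = list(risks)
    m = len(risks)
    if m < 2:
        raise ValueError("need at least two training domains")
    if lam_min * m > 1.0:
        raise ValueError("lam_min is too large for the weights to sum to one")
    return (1.0 - m * lam_min) * max(risks) + lam_min * sum(risks)


if __name__ == "__main__":
    domain_risks = [0.2, 0.6]  # made-up risks for two training domains
    print(v_rex_objective(domain_risks, beta=10.0))
    print(mm_rex_objective(domain_risks, lam_min=-1.0))
```

In this sketch, equal risks across domains make the variance penalty vanish, so the V-REx value reduces to the plain sum of risks. For MM-REx, setting `lam_min` to zero recovers the maximum risk over the training domains, while a negative `lam_min` extrapolates beyond them.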
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Gaussian Processes and Bayesian Inference
