Multi-environment Invariance Learning with Missing Data
Yiran Jia, Jelena Bradic

TL;DR
This paper develops an invariance-based estimator for domain generalization that handles missing outcome data, with theoretical guarantees and improved prediction accuracy on real data.
Contribution
The paper introduces an invariance learning estimator designed for settings with missing outcomes, supported by theoretical analysis and empirical validation of its robustness.
Findings
The estimator achieves lower prediction error even when the imputation is biased.
Theoretical guarantees on variable selection and error rates are established.
Empirical results on the UCI Bike Sharing dataset validate the approach.
Abstract
Learning models that can handle distribution shifts is a key challenge in domain generalization. Invariance learning, an approach that focuses on identifying features invariant across environments, improves model generalization by capturing stable relationships, which may represent causal effects when the data distribution is encoded within a structural equation model (SEM) and satisfies modularity conditions. This has led to a growing body of work that builds on invariance learning, leveraging the inherent heterogeneity across environments to develop methods that provide causal explanations while enhancing robust prediction. However, in many practical scenarios, obtaining complete outcome data from each environment is challenging due to the high cost or complexity of data collection. This limitation in available data hinders the development of models that fully leverage environmental…
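The idea of a relationship that stays stable across environments can be illustrated with a small simulation. The sketch below is not the paper's estimator and does not involve missing outcomes; it assumes a hypothetical linear SEM (X1 → Y → X2) in which only the noise on X2 changes between two environments, and compares per-environment least-squares slopes. The variable names, coefficients, and noise scales are all assumptions made for illustration.

```python
import numpy as np

def simulate_env(n, x2_noise_scale, rng):
    # Hypothetical SEM (an assumption, not taken from the paper):
    #   X1 -> Y -> X2, where only the X2 mechanism differs by environment.
    x1 = rng.normal(0.0, 1.0, n)
    y = 2.0 * x1 + rng.normal(0.0, 1.0, n)        # same mechanism in every environment
    x2 = y + rng.normal(0.0, x2_noise_scale, n)   # environment-specific noise
    return x1, x2, y

def ols_slope(x, y):
    # Slope of the least-squares fit of y on a single feature x (with intercept).
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(0)
envs = [simulate_env(20000, scale, rng) for scale in (0.5, 2.0)]

# Fit Y on each candidate feature separately within each environment.
slopes_x1 = [ols_slope(x1, y) for x1, _, y in envs]
slopes_x2 = [ols_slope(x2, y) for _, x2, y in envs]

# A small gap suggests an invariant (here, causal) relationship;
# a large gap flags a feature whose relationship with Y shifts.
gap_x1 = abs(slopes_x1[0] - slopes_x1[1])
gap_x2 = abs(slopes_x2[0] - slopes_x2[1])
print(f"slope gap for X1: {gap_x1:.3f}, slope gap for X2: {gap_x2:.3f}")
```

In this toy setup the slope on X1 is nearly identical in both environments, while the slope on X2 changes noticeably, so a predictor built on X1 alone would be expected to transfer better to a new environment.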
