Out of Distribution Generalization in Machine Learning
Martin Arjovsky

TL;DR
This thesis addresses out-of-distribution generalization in machine learning, proposing formal definitions, assumptions, and simple algorithms that exploit causal structure to keep models robust when test data differs from the training data.
Contribution
It introduces a formal framework for out-of-distribution problems, explores assumptions for reliable generalization, and links causal discovery to robust feature selection.
Findings
Proposes simple algorithms for better out-of-distribution generalization.
Establishes a connection between causal structure discovery and model robustness.
Provides formal definitions and assumptions for out-of-distribution problems.
Abstract
Machine learning has achieved tremendous success in a variety of domains in recent years. However, many of these success stories come from settings where the training and test distributions are extremely similar. In everyday situations where models are tested on data slightly different from what they were trained on, ML algorithms can fail spectacularly. This research attempts to formally define this problem, which sets of assumptions about our data are reasonable to make, and what kinds of guarantees we can hope to obtain from them. We then focus on a certain class of out-of-distribution problems and their assumptions, and introduce simple algorithms that follow from these assumptions and provide more reliable generalization. A central topic in the thesis is the strong link between discovering the causal structure of the data, finding features that are reliable (when…
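The failure mode the abstract describes can be illustrated with a small synthetic experiment. The sketch below is not an algorithm from the thesis; the data-generating process, the function names (`make_env`, `fit_least_squares`, `mse`), and the noise levels are all assumptions chosen for illustration. It fits ordinary least squares on one environment where a non-causal feature happens to track the label, then evaluates on an environment where that relationship is reversed, and compares against a model restricted to the causal feature.

```python
import numpy as np

def make_env(n, spurious_sign, rng):
    # Hypothetical data-generating process (an assumption, not from the thesis):
    # x_causal causes y, while x_spurious is an effect of y whose sign
    # changes from one environment to another.
    x_causal = rng.normal(size=n)
    y = x_causal + rng.normal(scale=1.0, size=n)
    x_spurious = spurious_sign * y + rng.normal(scale=0.1, size=n)
    X = np.column_stack([x_causal, x_spurious])
    return X, y

def fit_least_squares(X, y):
    # Ordinary least squares without an intercept (all variables are zero-mean).
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(0)
X_train, y_train = make_env(5000, +1.0, rng)  # training environment
X_test, y_test = make_env(5000, -1.0, rng)    # shifted test environment

# Model that may use both features vs. model restricted to the causal feature.
w_all = fit_least_squares(X_train, y_train)
w_causal = fit_least_squares(X_train[:, :1], y_train)

train_err_all = mse(X_train, y_train, w_all)
test_err_all = mse(X_test, y_test, w_all)
test_err_causal = mse(X_test[:, :1], y_test, w_causal)

print("all features, train MSE:", train_err_all)
print("all features, shifted test MSE:", test_err_all)
print("causal feature only, shifted test MSE:", test_err_causal)
```

In this toy setting the unrestricted model leans almost entirely on the non-causal feature, so it fits the training environment very well and degrades sharply under the shift, while the causal-only model keeps roughly the same error in both environments. Here the causal feature is picked by hand; identifying such reliable features from data is the harder problem the abstract refers to.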
Taxonomy
Topics: Machine Learning and Algorithms · Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference
