Learning Invariant Representations with Missing Data

Mark Goldstein; Jörn-Henrik Jacobsen; Olina Chau; Adriel Saporta; Aahlad Puli; Rajesh Ranganath; Andrew C. Miller

arXiv:2112.00881·cs.LG·June 10, 2022

TL;DR

This paper develops methods for learning invariant representations when some nuisance variables are missing during training, with the goal of generalizing to related test distributions.

Contribution

It introduces MMD estimators for invariance objectives that work with missing nuisance data, enabling robust model training without complete nuisance observations.

Findings

01

Achieves test performance comparable to full-data methods in simulations.

02

Effectively enforces invariance despite missing nuisance variables.

03

Demonstrates applicability on clinical data.

Abstract

Spurious correlations allow flexible models to predict well during training but poorly on related test distributions. Recent work has shown that models that satisfy particular independencies involving correlation-inducing nuisance variables have guarantees on their test performance. Enforcing such independencies requires nuisances to be observed during training. However, nuisances, such as demographics or image background labels, are often missing. Enforcing independence on just the observed data does not imply independence on the entire population. Here we derive MMD estimators used for invariance objectives under missing nuisances. On simulations and clinical data, optimizing through these estimates achieves test performance similar to using estimators that make use of the full data.
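For readers unfamiliar with MMD as a dependence measure, the following is a minimal sketch of the standard unbiased squared-MMD estimate between the representations of two nuisance groups. It assumes a binary nuisance that is fully observed, so it is the full-data baseline rather than the paper's missing-data estimators, which are not reproduced here. The names `rbf_kernel`, `mmd2_unbiased`, the Gaussian kernel, the bandwidth, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    # Exclude diagonal terms so within-sample averages run over distinct pairs.
    mean_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    mean_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return mean_xx + mean_yy - 2.0 * k_xy.mean()


# Hypothetical simulated data: z is a fully observed binary nuisance.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=2000)

# A representation whose mean shifts with z (dependent on the nuisance) ...
r_dependent = rng.normal(loc=z[:, None] * 1.0, scale=1.0, size=(2000, 2))
# ... and one drawn independently of z (invariant to the nuisance).
r_invariant = rng.normal(size=(2000, 2))

mmd_dependent = mmd2_unbiased(r_dependent[z == 0], r_dependent[z == 1])
mmd_invariant = mmd2_unbiased(r_invariant[z == 0], r_invariant[z == 1])
print(f"dependent representation: MMD^2 estimate = {mmd_dependent:.4f}")
print(f"invariant representation: MMD^2 estimate = {mmd_invariant:.4f}")
```

In this sketch the estimate is clearly positive for the dependent representation and close to zero for the invariant one, which is what makes MMD usable as a penalty in an invariance objective. When some values of `z` are missing, computing the same quantity on only the rows where `z` is observed can give a different answer from the population quantity, which is the gap the abstract describes.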

Code & Models

Repositories

marikgoldstein/missing-mmd (Official)

Taxonomy

Topics: Machine Learning in Healthcare · Statistical Methods and Inference · Machine Learning and Data Classification