Maximum Mean Discrepancy for Generalization in the Presence of Distribution and Missingness Shift
Liwen Ouyang, Aaron Key

TL;DR
This paper introduces methods to improve model generalization under covariate shifts by minimizing Maximum Mean Discrepancy (MMD) between training and test data, addressing distribution and missingness shifts.
Contribution
It proposes three techniques (MMD Representation, MMD Mask, and MMD Hybrid) for the cases where only a distribution shift exists, only a missingness shift exists, or both exist, respectively.
Findings
Models with MMD loss perform better on shifted test data.
MMD methods improve calibration and extrapolation.
Techniques adapt features to reduce distribution mismatch.
Abstract
Covariate shift is a common problem in predictive modeling on real-world data. This paper proposes addressing the covariate shift problem by minimizing Maximum Mean Discrepancy (MMD) statistics between the training and test sets in the feature input space, the feature representation space, or both. We designed three techniques that we call MMD Representation, MMD Mask, and MMD Hybrid to deal with the scenarios where only a distribution shift exists, only a missingness shift exists, or both types of shift exist, respectively. We find that integrating an MMD loss component helps models use the best features for generalization and avoid dangerous extrapolation as much as possible for each test sample. Models treated with this MMD approach show better performance, calibration, and extrapolation on the test set.
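To make the central quantity concrete, here is a minimal sketch of a squared-MMD estimate between a training sample and a test sample. It is an illustration under stated assumptions, not the authors' implementation: the RBF kernel, the fixed bandwidth, the biased (V-statistic) estimator, the synthetic Gaussian data, and the function names `rbf_kernel` and `mmd2_biased` are all choices made here and do not come from the paper.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b.

    The kernel family and bandwidth are assumptions for this sketch.
    """
    # Pairwise squared Euclidean distances via the expansion of |a - b|^2.
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD: mean k(x,x) + mean k(y,y) - 2 mean k(x,y)."""
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Synthetic data (hypothetical): one test set matches the training
# distribution, the other has a shifted mean (a simple covariate shift).
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(200, 3))
test_same = rng.normal(0.0, 1.0, size=(200, 3))
test_shifted = rng.normal(1.5, 1.0, size=(200, 3))

mmd_same = mmd2_biased(train, test_same)
mmd_shifted = mmd2_biased(train, test_shifted)
print(f"MMD^2 (no shift): {mmd_same:.4f}")
print(f"MMD^2 (shifted):  {mmd_shifted:.4f}")
```

The estimate is close to zero when the two samples come from the same distribution and grows when the test features are shifted. In a training setup like the one the abstract describes, a differentiable version of this statistic would be added to the loss, computed on raw inputs, on learned representations, or on both; how the paper weights or applies that term is not specified in this summary.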
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Machine Learning and Data Classification
