TL;DR
This paper proposes a distributionally robust learning framework that combines a labeled dataset with an unlabeled dataset carrying auxiliary features, accounting for potential distribution shift between the datasets and the test distribution to improve predictor robustness.
Contribution
It introduces a method based on distributionally robust optimization (DRO) that handles two data sources with different distributions, one of which carries auxiliary features, extending traditional single-dataset DRO approaches.
Findings
The method effectively accounts for distributional differences between datasets.
It provides a principled way to incorporate unlabeled data with auxiliary features.
The approach improves robustness of predictors under distribution shifts.
Abstract
Suppose we are given two datasets: a labeled dataset and an unlabeled dataset which also has additional auxiliary features not present in the first dataset. What is the most principled way to use these datasets together to construct a predictor? The answer should depend upon whether these datasets are generated by the same or different distributions over their mutual feature sets, and how similar the test distribution will be to either of those distributions. In many applications, the two datasets will likely follow different distributions, but both may be close to the test distribution. We introduce the problem of building a predictor which minimizes the maximum loss over all probability distributions over the original features, auxiliary features, and binary labels, whose Wasserstein distance is a given radius away from the empirical distribution over the labeled dataset and a given radius away from…
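The minimax problem described in the abstract can be written out as a rough sketch. All notation here is assumed for illustration and is not taken from the paper: x denotes the original features, a the auxiliary features, y the binary label, \theta the predictor parameters, \ell a loss function, W a Wasserstein distance, and r_1, r_2 the two radii. Comparing the marginals of P with each empirical distribution is also an assumption, made because each dataset observes only a subset of the variables.

```latex
% Sketch of the two-constraint DRO problem (notation assumed, not the paper's).
% \hat{P}_{\mathrm{lab}}: empirical distribution of the labeled dataset over (x, y)
% \hat{P}_{\mathrm{unl}}: empirical distribution of the unlabeled dataset over (x, a)
\[
  \min_{\theta} \;
  \max_{P \in \mathcal{U}(r_1, r_2)} \;
  \mathbb{E}_{(x, a, y) \sim P} \bigl[ \ell(\theta; x, a, y) \bigr]
\]
\[
  \mathcal{U}(r_1, r_2) =
  \Bigl\{ P \;:\;
    W\bigl(P_{x, y}, \hat{P}_{\mathrm{lab}}\bigr) \le r_1, \;
    W\bigl(P_{x, a}, \hat{P}_{\mathrm{unl}}\bigr) \le r_2
  \Bigr\}
\]
```

Under this reading, each radius expresses how far the test distribution is allowed to lie from the corresponding dataset. If the second constraint is dropped and the auxiliary features are ignored, the problem reduces to a standard single-dataset Wasserstein DRO problem.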
