Distributionally Robust Data Join

Pranjal Awasthi; Christopher Jung; Jamie Morgenstern

arXiv:2202.05797 · cs.LG · June 16, 2023

TL;DR

This paper proposes a distributionally robust learning framework that combines a labeled dataset with an unlabeled dataset carrying auxiliary features, accounting for potential distribution shift between the two sources and the test distribution in order to make the resulting predictor more robust.

Contribution

It introduces a distributionally robust optimization (DRO) method that handles two data sources with potentially different distributions, one of which carries auxiliary features, extending traditional DRO approaches.
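For orientation, a traditional Wasserstein DRO objective with a single reference distribution can be written as follows. This is a generic textbook form, not a formula taken from the paper; the symbols $\theta$ (model parameters), $\ell$ (loss), $\hat{P}_n$ (empirical distribution), $W$ (Wasserstein distance), and $r$ (radius) are notation assumed here for illustration.

```latex
\min_{\theta} \; \sup_{Q \,:\, W(Q, \hat{P}_n) \le r} \;
  \mathbb{E}_{(x, y) \sim Q} \big[ \ell(\theta; x, y) \big]
```

The inner supremum ranges over every distribution $Q$ within distance $r$ of the empirical distribution, so the chosen $\theta$ is the one that performs best against the worst such $Q$.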

Findings

1. The method effectively accounts for distributional differences between datasets.

2. It provides a principled way to incorporate unlabeled data with auxiliary features.

3. The approach improves robustness of predictors under distribution shifts.

Abstract

Suppose we are given two datasets: a labeled dataset and an unlabeled dataset which also has additional auxiliary features not present in the first dataset. What is the most principled way to use these datasets together to construct a predictor? The answer should depend upon whether these datasets are generated by the same or different distributions over their mutual feature sets, and how similar the test distribution will be to either of those distributions. In many applications, the two datasets will likely follow different distributions, but both may be close to the test distribution. We introduce the problem of building a predictor which minimizes the maximum loss over all probability distributions over the original features, auxiliary features, and binary labels, whose Wasserstein distance is $r_{1}$ away from the empirical distribution over the labeled dataset and $r_{2}$ away from…
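To make the notion of a distribution lying within a Wasserstein radius of an empirical distribution concrete, here is a minimal sketch for the simplest possible case: two equal-size samples of a single real-valued feature. It is an illustration only and not the paper's method, which works with distributions over original features, auxiliary features, and labels. The function names `wasserstein_1d` and `within_radius` are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1D empirical distributions."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension the optimal transport plan matches sorted samples in order,
    # so the distance is the mean absolute gap between order statistics.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


def within_radius(xs, ys, r):
    """Check whether the two empirical distributions are at distance at most r."""
    return wasserstein_1d(xs, ys) <= r


# Toy data (made up for illustration): the second sample is the first shifted by 0.5.
labeled = [0.0, 1.0, 2.0]
shifted = [0.5, 1.5, 2.5]
print(wasserstein_1d(labeled, shifted))      # 0.5
print(within_radius(labeled, shifted, 0.4))  # False
```

A radius such as $r_{1}$ plays the role of `r` here: larger values admit distributions further from the observed sample, which makes the resulting worst-case guarantee broader but more conservative.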

Code & Models

Repositories

chrisjung/distributionally-robust-data-join (JAX, official)
