Lower Bounds for Public-Private Learning under Distribution Shift
Amrith Setlur, Pratiksha Thaker, Jonathan Ullman

TL;DR
This paper establishes lower bounds for public-private learning under distribution shift, showing for Gaussian mean estimation and Gaussian linear regression that the benefit of combining the two data sources shrinks as the shift between them grows.
Contribution
It extends lower bounds for public-private learning to scenarios with significant distribution shift, covering Gaussian mean estimation and linear regression.
Findings
When the shift is small relative to the desired accuracy, either the public or the private data must be abundant enough on its own.
Large shift renders public data ineffective.
Public-private data combination offers limited advantage under high distribution divergence.
Abstract
The most effective differentially private machine learning algorithms in practice rely on an additional source of purportedly public data. This paradigm is most interesting when the two sources combine to be more than the sum of their parts. However, there are settings such as mean estimation where we have strong lower bounds, showing that when the two data sources have the same distribution, there is no complementary value to combining the two data sources. In this work we extend the known lower bounds for public-private learning to settings where the two data sources exhibit significant distribution shift. Our results apply both to Gaussian mean estimation, where the two distributions have different means, and to Gaussian linear regression, where the two distributions exhibit parameter shift. We find that when the shift is small (relative to the desired accuracy), either public or…
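The mean-estimation setting described in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the paper's lower-bound argument: it only shows why an estimate built from public data alone cannot be more accurate than the size of the shift. The function names, the dimension, the sample size, and the shift values are all hypothetical choices made for illustration.

```python
import math
import random


def mean_vector(rows):
    """Coordinate-wise empirical mean of a list of equal-length vectors."""
    d = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(d)]


def l2_distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sample_gaussian(mean, n, rng):
    """Draw n samples from a Gaussian with the given mean and identity covariance."""
    return [[rng.gauss(m, 1.0) for m in mean] for _ in range(n)]


def public_only_error(mu_private, shift, n_public, rng):
    """Error of estimating the private mean using only shifted public samples.

    Assumption (illustrative): the public mean is mu_private + shift.
    """
    mu_public = [m + s for m, s in zip(mu_private, shift)]
    estimate = mean_vector(sample_gaussian(mu_public, n_public, rng))
    return l2_distance(estimate, mu_private)


rng = random.Random(0)   # fixed seed so the run is repeatable
d = 5                    # hypothetical dimension
mu = [0.0] * d           # hypothetical private mean
small_shift = [0.01] * d # shift norm about 0.02
large_shift = [1.0] * d  # shift norm about 2.24

err_small = public_only_error(mu, small_shift, 2000, rng)
err_large = public_only_error(mu, large_shift, 2000, rng)

print("public-only error, small shift:", round(err_small, 3))
print("public-only error, large shift:", round(err_large, 3))
```

With a small shift the public-only error is dominated by sampling noise, so more public data helps. With a large shift the error stays close to the norm of the shift no matter how many public samples are drawn, which matches the intuition behind the large-shift regime.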
Taxonomy
Topics: Global Educational Reforms and Inequalities
