What You See is What You Get: Principled Deep Learning via Distributional Generalization
Bogdan Kulynych, Yao-Yuan Yang, Yaodong Yu, Jarosław Błasiok, Preetum Nakkiran

TL;DR
This paper demonstrates that Differentially-Private training guarantees a desirable property called distributional generalization, enabling the design of deep learning methods that are robust, fair, and privacy-preserving, with provable guarantees and improved trade-offs.
Contribution
It introduces a novel connection between differential privacy and distributional generalization, providing new tools for designing deep learning algorithms with provable robustness and fairness.
Findings
DP training ensures distributional generalization.
New algorithms achieve state-of-the-art robustness and fairness.
Improved privacy-utility trade-offs in DP-SGD.
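For reference, the two notions these findings connect can be written as follows. The first is the standard definition of (ε, δ)-differential privacy. The second is one common way to express a distributional generalization gap. The symbols T (training algorithm), S (training set of size n), D (data distribution), and φ (test function) are notation chosen here for illustration and may differ from the paper's.

```latex
% (\varepsilon, \delta)-differential privacy of a randomized training algorithm T:
% for all datasets S, S' that differ in a single example
% and all measurable sets K of models,
\Pr[T(S) \in K] \;\le\; e^{\varepsilon} \, \Pr[T(S') \in K] + \delta .

% Distributional generalization gap for a bounded test function
% \phi(z, \theta) \in [0, 1]: the difference between the expected value of
% \phi on a training example and on a fresh example from the distribution.
\left|
  \mathbb{E}_{S \sim \mathcal{D}^n,\; \theta \sim T(S),\; z \sim S} \, \phi(z, \theta)
  \;-\;
  \mathbb{E}_{S \sim \mathcal{D}^n,\; \theta \sim T(S),\; z \sim \mathcal{D}} \, \phi(z, \theta)
\right|
```

Read this way, "DP training ensures distributional generalization" means that a DP guarantee on T keeps this gap small for every such φ at once, not only for the loss or the error rate.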
Abstract
Having similar behavior at training time and test time, what we call a "What You See Is What You Get" (WYSIWYG) property, is desirable in machine learning. Models trained with standard stochastic gradient descent (SGD), however, do not necessarily have this property, as their complex behaviors such as robustness or subgroup performance can differ drastically between training and test time. In contrast, we show that Differentially-Private (DP) training provably ensures the high-level WYSIWYG property, which we quantify using a notion of distributional generalization. Applying this connection, we introduce new conceptual tools for designing deep-learning methods by reducing generalization concerns to optimization ones: to mitigate unwanted behavior at test time, it is provably sufficient to mitigate this behavior on the training data. By applying this novel design principle, which…
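To make "DP training" concrete, here is a minimal sketch of a single DP-SGD update (per-example gradient clipping followed by Gaussian noise) for logistic regression in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name `dp_sgd_step`, its parameters, and the choice of logistic regression are hypothetical, and the sketch does not perform the privacy accounting needed to report an (ε, δ) guarantee.

```python
import numpy as np


def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One illustrative DP-SGD update for logistic regression (hypothetical helper).

    w: weights, shape (dim,); X: batch, shape (batch, dim); y: labels in {0, 1}.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Per-example gradients of the logistic loss, shape (batch, dim).
    probs = 1.0 / (1.0 + np.exp(-(X @ w)))
    per_example_grads = (probs - y)[:, None] * X

    # Clip each example's gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale[:, None]

    # Add Gaussian noise calibrated to the clipping norm, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean_grad = (clipped.sum(axis=0) + noise) / len(X)

    return w - lr * noisy_mean_grad
```

Clipping bounds how much any single training example can move the weights, and the noise masks what influence remains. That per-example insensitivity is what the DP guarantee formalizes, and it is the property the abstract ties to similar behavior at training time and test time.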
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data
Methods: Stochastic Gradient Descent · Distributional Generalization
