TL;DR
This paper investigates optimal hyperparameter choices for differentially private transfer learning, revealing mismatches between theory and practice and analyzing how clipping bounds and batch sizes affect privacy and performance.
Contribution
It uncovers the mismatch between theoretical guidelines and empirical results for hyperparameter tuning in DP transfer learning and offers insights into better selection strategies.
Findings
Larger clipping bounds perform better under strong privacy, contrary to theoretical expectations.
Existing heuristics for batch size tuning are ineffective under fixed compute budgets.
Using a single hyperparameter setting across tasks can lead to suboptimal performance.
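For readers unfamiliar with where the clipping bound enters, the following is a minimal sketch of a standard DP-SGD gradient estimate, not the paper's code. The function names, the noise_multiplier parameter, and the toy gradients are illustrative assumptions. Each per-example gradient is rescaled so its L2 norm is at most clip_bound, the clipped gradients are summed, Gaussian noise with standard deviation noise_multiplier * clip_bound is added, and the result is divided by the batch size.

```python
import math
import random


def clip_gradient(grad, clip_bound):
    """Rescale grad so that its L2 norm is at most clip_bound."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_bound / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, clip_bound, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    noise_multiplier is treated as a given input here (an assumption);
    in practice a privacy accountant derives it from the privacy target.
    """
    batch_size = len(per_example_grads)
    dim = len(per_example_grads[0])
    clipped = [clip_gradient(g, clip_bound) for g in per_example_grads]
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    noise_std = noise_multiplier * clip_bound
    noisy = [s + rng.gauss(0.0, noise_std) for s in summed]
    return [v / batch_size for v in noisy]


# Toy usage with made-up gradients: the first exceeds the bound, the second does not.
rng = random.Random(0)
toy_grads = [[3.0, 4.0], [0.3, 0.4]]
estimate = dp_sgd_gradient(toy_grads, clip_bound=1.0, noise_multiplier=1.0, rng=rng)
```

Because the noise scale is tied to clip_bound, changing the bound changes both how much of each gradient survives clipping and how much noise is added, which is why the choice interacts with the gradient distribution.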
Abstract
Differentially private (DP) transfer learning, i.e., fine-tuning a pretrained model on private data, is the current state-of-the-art approach for training large models under privacy constraints. We focus on two key hyperparameters in this setting: the clipping bound C and batch size B. We show a clear mismatch between the current theoretical understanding of how to choose an optimal C (stronger privacy requires smaller C) and empirical outcomes (larger C performs better under strong privacy), caused by changes in the gradient distributions. Assuming a limited compute budget (fixed epochs), we demonstrate that the existing heuristics for tuning B do not work, while cumulative DP noise better explains whether smaller or larger batches perform better. We also highlight how the common practice of using a single (C, B) setting across tasks can lead to suboptimal performance. We…
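The fixed-epoch trade-off mentioned in the abstract can be made concrete with a little arithmetic. The sketch below is an illustration under stated assumptions and is not the paper's definition of cumulative DP noise: the helper names, the dataset size, and the epoch count are hypothetical, and the noise multiplier is held constant across batch sizes, whereas in practice a privacy accountant would set it from the privacy target, the sampling rate, and the number of steps.

```python
def steps_for_fixed_epochs(dataset_size, batch_size, epochs):
    """Number of full optimizer steps when the epoch budget is fixed."""
    return epochs * dataset_size // batch_size


def noise_std_on_mean_gradient(clip_bound, noise_multiplier, batch_size):
    """Per-step noise standard deviation after dividing the noisy sum by the batch size."""
    return noise_multiplier * clip_bound / batch_size


# Hypothetical budget: 50,000 examples, 10 epochs.
dataset_size, epochs = 50_000, 10
for batch_size in (256, 4096):
    steps = steps_for_fixed_epochs(dataset_size, batch_size, epochs)
    std = noise_std_on_mean_gradient(1.0, 1.0, batch_size)
    print(f"B={batch_size}: {steps} steps, per-step noise std {std:.6f}")
```

Under a fixed epoch budget, a larger batch lowers the noise on each averaged gradient but leaves far fewer steps, so neither direction is automatically better; the comparison has to account for noise accumulated over the whole run.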