Differential Privacy and Byzantine Resilience in SGD: Do They Add Up?
Rachid Guerraoui, Nirupam Gupta, Rafaël Pinot, Sébastien Rouault, John Stephan

TL;DR
This paper investigates whether it is feasible to combine differential privacy and Byzantine resilience in distributed SGD, revealing fundamental incompatibilities that hinder practical implementation for large models.
Contribution
It provides the first theoretical analysis showing the incompatibility of differential privacy and Byzantine resilience in distributed SGD, highlighting limitations for large-scale models.
Findings
Classical approaches to DP and Byzantine resilience are incompatible.
Directly composing the two techniques yields guarantees that degrade as the number of model parameters grows.
Numerical experiments confirm the practical infeasibility for large models.
Abstract
This paper addresses the problem of combining Byzantine resilience with privacy in machine learning (ML). Specifically, we study if a distributed implementation of the renowned Stochastic Gradient Descent (SGD) learning algorithm is feasible with both differential privacy (DP) and (α, f)-Byzantine resilience. To the best of our knowledge, this is the first work to tackle this problem from a theoretical point of view. A key finding of our analyses is that the classical approaches to these two (seemingly) orthogonal issues are incompatible. More precisely, we show that a direct composition of these techniques makes the guarantees of the resulting SGD algorithm depend unfavourably upon the number of parameters of the ML model, making the training of large models practically infeasible. We validate our theoretical results through numerical experiments on publicly-available datasets;…
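To make the "direct composition" concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes one common instantiation of each technique: honest workers clip their gradient and add Gaussian noise (a standard DP mechanism), and the server aggregates with a coordinate-wise median (one well-known Byzantine-robust rule). The function names, the attack, and all parameter values are hypothetical and chosen for illustration only; the aggregation rules and noise calibration analysed in the paper are not specified in this summary.

```python
import numpy as np

def clip(g, c):
    """Rescale g so that its L2 norm is at most c."""
    norm = np.linalg.norm(g)
    return g * min(1.0, c / norm) if norm > 0 else g

def dp_worker_gradient(g, c, sigma, rng):
    """Clip the gradient, then add i.i.d. Gaussian noise to every coordinate."""
    return clip(g, c) + rng.normal(0.0, sigma, size=g.shape)

def coordinate_median(grads):
    """Robust aggregation (assumed rule): median of each coordinate."""
    return np.median(np.stack(grads), axis=0)

def aggregation_error(d, n_honest=10, n_byz=3, c=1.0, sigma=0.5, seed=0):
    """L2 distance between the aggregate and the true (unit-norm) gradient."""
    rng = np.random.default_rng(seed)
    true_grad = np.ones(d) / np.sqrt(d)  # toy gradient with norm 1
    honest = [dp_worker_gradient(true_grad, c, sigma, rng)
              for _ in range(n_honest)]
    # Hypothetical attack: Byzantine workers send a scaled, sign-flipped gradient.
    byzantine = [-10.0 * true_grad for _ in range(n_byz)]
    aggregate = coordinate_median(honest + byzantine)
    return np.linalg.norm(aggregate - true_grad)

if __name__ == "__main__":
    for d in (10, 1_000, 100_000):
        print(d, aggregation_error(d))
```

The clipping bound and the per-coordinate noise level are held fixed while the dimension `d` grows, so the signal keeps norm at most `c` while the injected noise has expected squared norm `d * sigma**2`. In this toy setting the error of the aggregate therefore grows with `d`, which is the flavour of dimension dependence the abstract describes.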
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Age of Information Optimization
Methods: Stochastic Gradient Descent
