Differential Privacy and Byzantine Resilience in SGD: Do They Add Up?

Rachid Guerraoui; Nirupam Gupta; Rafaël Pinot; Sébastien Rouault; John Stephan

arXiv:2102.08166·cs.LG·June 25, 2021

TL;DR

This paper investigates whether it is feasible to combine differential privacy and Byzantine resilience in distributed SGD, revealing fundamental incompatibilities that hinder practical implementation for large models.

Contribution

The paper gives what the authors believe to be the first theoretical analysis of combining differential privacy with Byzantine resilience in distributed SGD. It shows that a direct composition of the classical techniques yields guarantees that degrade as the model grows.

Findings

01. The classical approaches to DP and Byzantine resilience are incompatible when composed directly.

02. The guarantees of the combined algorithm depend unfavourably on the number of model parameters.

03. Numerical experiments validate that training large models this way is practically infeasible.
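
A back-of-the-envelope way to see why model size can matter, offered as intuition rather than as the paper's formal result: assume each worker clips its gradient to norm at most $C$ and adds Gaussian noise $y \sim \mathcal{N}(0, \sigma^2 I_d)$ for privacy, where $d$ is the number of parameters. The symbols $C$, $\sigma$, $y$ and $d$ are illustrative assumptions, not notation taken from this page.

```latex
\mathbb{E}\,\|y\|^2 = \sum_{i=1}^{d} \mathbb{E}\big[y_i^2\big] = d\,\sigma^2,
\qquad
\frac{\mathbb{E}\,\|y\|^2}{C^2} = \frac{d\,\sigma^2}{C^2}.
```

For fixed $\sigma$ and $C$, the expected noise energy relative to the largest admissible gradient grows linearly in $d$, so the vectors sent by honest workers become increasingly dominated by noise as the model gets larger.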

Abstract

This paper addresses the problem of combining Byzantine resilience with privacy in machine learning (ML). Specifically, we study if a distributed implementation of the renowned Stochastic Gradient Descent (SGD) learning algorithm is feasible with both differential privacy (DP) and $(\alpha, f)$-Byzantine resilience. To the best of our knowledge, this is the first work to tackle this problem from a theoretical point of view. A key finding of our analyses is that the classical approaches to these two (seemingly) orthogonal issues are incompatible. More precisely, we show that a direct composition of these techniques makes the guarantees of the resulting SGD algorithm depend unfavourably upon the number of parameters of the ML model, making the training of large models practically infeasible. We validate our theoretical results through numerical experiments on publicly available datasets;…
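
The "direct composition" described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation or the code in their repository: workers privatize gradients with clipping plus Gaussian noise, and the server uses the coordinate-wise median, chosen here only as one familiar robust aggregation rule. All function names and parameters (`clip`, `privatize`, `coordinate_median`, `demo`, `max_norm`, `sigma`) are hypothetical.

```python
import math
import random
import statistics


def clip(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in grad]


def privatize(grad, max_norm, sigma, rng):
    """Clip, then add i.i.d. Gaussian noise N(0, sigma^2) to each coordinate."""
    return [x + rng.gauss(0.0, sigma) for x in clip(grad, max_norm)]


def coordinate_median(grads):
    """Aggregate worker vectors by taking the median of each coordinate."""
    return [statistics.median(column) for column in zip(*grads)]


def demo(dim, n_honest=7, n_byzantine=2, max_norm=1.0, sigma=0.5, seed=0):
    """Return the L2 distance between the aggregate and the true gradient."""
    rng = random.Random(seed)
    # Assumed setup: every honest worker holds the same true gradient,
    # whose L2 norm equals max_norm.
    true_grad = [max_norm / math.sqrt(dim)] * dim
    honest = [privatize(true_grad, max_norm, sigma, rng) for _ in range(n_honest)]
    # Byzantine workers send an arbitrary vector (here a large constant).
    byzantine = [[-100.0] * dim for _ in range(n_byzantine)]
    aggregate = coordinate_median(honest + byzantine)
    return math.sqrt(sum((a - t) ** 2 for a, t in zip(aggregate, true_grad)))


if __name__ == "__main__":
    for dim in (10, 100, 1000):
        print(dim, round(demo(dim), 3))
```

With the noise level and gradient norm held fixed, the aggregation error in this toy setup tends to grow with `dim`, even though the median keeps the Byzantine vectors from dominating. That matches the flavour of the abstract's claim, but the sketch makes no statement about the paper's actual bounds.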

Code & Models

Repositories

LPD-EPFL/DifferentialByzantine
PyTorch · Official

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Age of Information Optimization

Methods: Stochastic Gradient Descent