FERRET: Private Deep Learning Faster And Better Than DPSGD
David Zagardo

TL;DR
FERRET is a privacy-preserving deep learning method that uses 1-bit gradient compression with Bernoulli masking. It provides formal privacy guarantees without additive noise, and it trains faster and reaches better utility than DPSGD.
Contribution
FERRET presents a mutual-information differential privacy (MI-DP) framework built on 1-bit gradient compression with Bernoulli masking, enabling faster training with formal privacy guarantees and better utility than DPSGD.
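The masked 1-bit release described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `masked_sign_update`, the choice to aggregate a group by summing its gradients, and the use of 0 to mean "nothing transmitted" are all assumptions made for illustration.

```python
import random


def masked_sign_update(group_grads, p, rng):
    """Release at most one sign bit per parameter group.

    group_grads: list of lists of floats, one inner list per group.
    p: firing probability of the Bernoulli mask.
    rng: a random.Random instance, passed in for reproducibility.
    Returns +1 or -1 for each group that fires and 0 for each silent group.
    """
    released = []
    for grads in group_grads:
        if rng.random() < p:
            # The group fires: only the sign of its aggregate is released.
            # Summing the group's gradients is an assumption of this sketch.
            total = sum(grads)
            released.append(1 if total >= 0 else -1)
        else:
            # The group stays silent: no bit is transmitted.
            released.append(0)
    return released


if __name__ == "__main__":
    rng = random.Random(0)
    toy_grads = [[0.3, -0.1], [-2.0, 0.5], [0.0, 0.7]]  # hypothetical values
    print(masked_sign_update(toy_grads, p=0.5, rng=rng))
```

Lowering `p` makes groups fire less often, which is the knob the abstract's privacy formula ties to the total privacy loss.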
Findings
FERRET outperforms DPSGD in perplexity across multiple models and epochs.
FERRET achieves formal MI-DP guarantees without additive noise.
FERRET trains up to 5 times faster than DPSGD while maintaining privacy and utility.
Abstract
We revisit 1-bit gradient compression through the lens of mutual-information differential privacy (MI-DP). Building on signSGD, we propose FERRET (Fast and Effective Restricted Release for Ethical Training), which transmits at most one sign bit per parameter group with Bernoulli masking. Theory: We prove each fired group leaks at most ln 2 nats; after subsampling with rate s, the total privacy loss of G groups trained for T steps with firing probability p is epsilon = G * T * s * p * ln 2. Thus FERRET achieves MI-DP for epsilon in [0.1, 2] without additive noise. Practice: We evaluate three granularities, FERRET-MAX (finest), FERRET-EIGHTH (medium), and FERRET-2 (coarsest), on five LLMs (137M-1.8B parameters) against DPSGD and Non-DP baselines. All methods are trained for 1, 3, and 5 epochs. Utility: Across all settings, FERRET-MAX/EIGHTH beat DPSGD's perplexity. At epsilon=0.5, 5…
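The privacy accounting stated in the abstract, epsilon = G * T * s * p * ln 2, can be worked through with a short sketch. The function names and the example values of G, T, and s below are hypothetical and chosen only to show the arithmetic; they are not settings reported in the paper.

```python
import math


def ferret_epsilon(G, T, s, p):
    """Total privacy loss in nats, per the formula quoted in the abstract.

    G: number of parameter groups
    T: number of training steps
    s: subsampling rate
    p: firing probability of the Bernoulli mask
    """
    return G * T * s * p * math.log(2)


def firing_probability(target_eps, G, T, s):
    """Invert the same formula to find the p that meets a target epsilon."""
    return target_eps / (G * T * s * math.log(2))


if __name__ == "__main__":
    # Hypothetical configuration: 2 groups, 1000 steps, 1% subsampling.
    G, T, s = 2, 1000, 0.01
    p = firing_probability(0.5, G, T, s)
    print(f"p = {p:.4f}")  # about 0.0361
    print(f"epsilon = {ferret_epsilon(G, T, s, p):.3f}")  # recovers 0.500
```

Because the bound is linear in every factor, doubling the number of groups or steps at a fixed epsilon halves the allowed firing probability.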
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Stochastic Gradient Optimization Techniques
