Differentially Private Distributed Learning for Language Modeling Tasks
Vadim Popov, Mikhail Kudinov, Irina Piontkovskaya, Petr Vytovtov, and Alex Nevidomsky

TL;DR
This paper introduces a novel differentially private distributed fine-tuning method for language models that enhances prediction accuracy on user data, reduces communication costs, and maintains strong privacy guarantees.
Contribution
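The paper presents a technique for distributed fine-tuning of a general language model on private user data. It improves prediction quality on users' language compared to the general model and outperforms gradient compression methods in communication efficiency.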
Findings
Achieves an almost 70% perplexity reduction on user language data
Improves keystroke saving rate by 8.7 percentage points
Provides a framework for evaluating differential privacy in distributed training
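
The first two findings use different kinds of measurement: the perplexity figure is a relative reduction, while the keystroke saving rate (KSS) gain is an absolute difference in percentage points. The sketch below illustrates the arithmetic. It assumes the commonly used definitions (perplexity as the exponential of the mean negative log-probability per token, KSS as the share of characters the user did not have to type). The function names and all numbers are hypothetical and are not taken from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity: exp of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction_pct(baseline, improved):
    """Relative reduction of `improved` against `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


def keystroke_saving_rate(total_chars, typed_chars):
    """KSS in percent: fraction of characters saved by accepted predictions."""
    return 100.0 * (total_chars - typed_chars) / total_chars


# Hypothetical per-token log-probabilities for a general and a fine-tuned model.
general_ppl = perplexity([math.log(0.01)] * 4)      # about 100
tuned_ppl = perplexity([math.log(1.0 / 30.0)] * 4)  # about 30

# Relative change: (100 - 30) / 100, i.e. about 70 percent.
ppl_reduction = relative_reduction_pct(general_ppl, tuned_ppl)

# Hypothetical typing logs: 1000 characters of text in both cases.
general_kss = keystroke_saving_rate(1000, 700)  # 30 percent saved
tuned_kss = keystroke_saving_rate(1000, 613)    # 38.7 percent saved

# Absolute change in percentage points: about 8.7.
kss_gain_points = tuned_kss - general_kss

print(f"perplexity reduction: {ppl_reduction:.1f}%")
print(f"KSS gain: {kss_gain_points:.1f} percentage points")
```

Note that a gain of 8.7 percentage points from a hypothetical 30% baseline would be a 29% relative improvement, which is why the two findings should not be compared directly.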
Abstract
One of the big challenges in machine learning applications is that training data can be different from the real-world data faced by the algorithm. In language modeling, users' language (e.g., in private messaging) could change in a year and be completely different from what we observe in publicly available data. At the same time, public data can be used for obtaining general knowledge (i.e., a general model of English). We study approaches to distributed fine-tuning of a general model on user private data with the additional requirements of maintaining the quality on the general data and minimizing communication costs. We propose a novel technique that significantly improves prediction quality on users' language compared to a general model and outperforms gradient compression methods in terms of communication efficiency. The proposed procedure is fast and leads to an almost 70%…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Stochastic Gradient Optimization Techniques
