Privacy-Preserving Transformers: SwiftKey's Differential Privacy Implementation
Abdelrahman Abouelenin, Mohamed Abdelrehim, Raffy Fahim, Amr Hendy, Mohamed Afify

TL;DR
This paper demonstrates how a scaled-down GPT2 transformer trained with differential privacy can improve next-word prediction accuracy in SwiftKey, balancing model size, speed, and privacy.
Contribution
It introduces a method for training a privacy-preserving transformer language model: a two-stage process that builds a seed model on general data and then fine-tunes it with DP on typing data, with the resulting model integrated through ONNX.
Findings
Small, consistent gains in next-word prediction accuracy over the production GRU.
Graceful trade-offs between model size, speed, and accuracy.
Effective use of differential privacy with a scaled-down GPT2.
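To make the size side of this trade-off concrete, the sketch below counts the parameters of a GPT2-style decoder as a function of its width and depth. It assumes the standard GPT2 layout (learned position embeddings, a feed-forward width of 4x the model width, and an output layer tied to the token embeddings). The function name gpt2_param_count and the "scaled-down" configuration are hypothetical and chosen only for illustration; they are not the configuration used in the paper.

```python
def gpt2_param_count(vocab_size, n_ctx, d_model, n_layers):
    """Count parameters of a GPT2-style decoder with tied output embeddings."""
    token_emb = vocab_size * d_model
    pos_emb = n_ctx * d_model
    # Attention: fused QKV projection plus output projection, with biases.
    attention = (3 * d_model * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward block with hidden width 4 * d_model, with biases.
    d_ff = 4 * d_model
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    per_layer = attention + feed_forward + layer_norms
    final_norm = 2 * d_model
    return token_emb + pos_emb + n_layers * per_layer + final_norm


# Sanity check against the published GPT2 small configuration.
print(gpt2_param_count(50257, 1024, 768, 12))  # 124439808

# Hypothetical scaled-down configuration (illustrative only, not from the paper).
small = gpt2_param_count(vocab_size=20000, n_ctx=64, d_model=128, n_layers=2)
print(small, "parameters,", small * 4 / 1e6, "MB at 4 bytes per parameter")
```

With a small width and few layers, the token embedding table dominates the total, so vocabulary size becomes one of the main levers for fitting an on-device memory budget.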
Abstract
In this paper we train a transformer with differential privacy (DP) for language modeling in SwiftKey. We run multiple experiments to balance the trade-off between model size, run-time speed, and accuracy. We show small and consistent gains in next-word-prediction accuracy, with a graceful increase in memory and run-time cost, compared to the production GRU. This is obtained by scaling down a GPT2 architecture to fit the required size and by a two-stage training process that builds a seed model on general data and then fine-tunes it with DP on typing data. The transformer is integrated using ONNX, offering both flexibility and efficiency.
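The abstract does not say which DP training algorithm is used. A common choice for DP fine-tuning is DP-SGD, which clips each example's gradient and adds Gaussian noise before the update. The sketch below illustrates one such step on a toy linear least-squares model in plain Python, assuming DP-SGD. The names dp_sgd_step and clip_gradient and all parameter values are hypothetical, and the sketch does no privacy accounting.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale a gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(weights, batch, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step for a linear model with loss 0.5 * (w.x - y)^2."""
    summed = [0.0] * len(weights)
    for x, y in batch:
        # Per-example gradient, clipped before it is aggregated.
        residual = sum(w * xi for w, xi in zip(weights, x)) - y
        grad = [residual * xi for xi in x]
        for i, g in enumerate(clip_gradient(grad, clip_norm)):
            summed[i] += g
    # Gaussian noise scaled to the clipping norm, added to the summed gradient.
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / len(batch) for s in summed]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


# Stage 2 of a two-stage setup: start from seed weights, fine-tune with DP.
rng = random.Random(0)
seed_weights = [0.0, 0.0]  # stand-in for a seed model trained on general data
typing_batch = [([1.0, 0.0], 1.0), ([0.0, 2.0], 2.0)]  # toy "typing" data
weights = dp_sgd_step(seed_weights, typing_batch, lr=0.1,
                      clip_norm=1.0, noise_multiplier=1.0, rng=rng)
print(weights)
```

Clipping bounds how much any single example can move the weights, and the noise scale is tied to that bound. Turning these two settings into a formal privacy guarantee requires a separate privacy accountant, which is omitted here.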
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Advanced Malware Detection Techniques · Security and Verification in Computing
Methods: Gated Recurrent Unit · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
