Privacy-Preserving Transformers: SwiftKey's Differential Privacy Implementation

Abdelrahman Abouelenin; Mohamed Abdelrehim; Raffy Fahim; Amr Hendy; Mohamed Afify

arXiv:2505.05648 · cs.CL · May 12, 2025

TL;DR

This paper demonstrates how a scaled-down GPT2 transformer trained with differential privacy can improve next-word prediction accuracy in SwiftKey, balancing model size, speed, and privacy.

Contribution

It introduces a two-stage method for training a privacy-preserving transformer language model: a seed model is built on general data and then fine-tuned with differential privacy on typing data. The model is integrated through ONNX.

Findings

01. Small, consistent accuracy gains in next-word prediction.

02. Graceful trade-offs between model size, speed, and accuracy.

03. Effective use of differential privacy with a scaled-down GPT2.
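
To make the size side of the trade-off concrete, the sketch below counts parameters for a GPT2-style decoder with tied input and output embeddings. The formula follows the standard GPT2 layer layout. The scaled-down configuration is purely hypothetical and is not the configuration used in the paper, which this page does not report.

```python
def gpt2_param_count(vocab_size, n_ctx, d_model, n_layer):
    """Parameter count of a GPT2-style decoder with tied embeddings."""
    token_emb = vocab_size * d_model
    pos_emb = n_ctx * d_model
    # Per layer: QKV projection, attention output projection,
    # two MLP matrices (4x expansion), and two LayerNorms.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    layer_norms = 2 * (2 * d_model)
    per_layer = attn + mlp + layer_norms
    final_ln = 2 * d_model
    return token_emb + pos_emb + n_layer * per_layer + final_ln


# Reference point: the public GPT2 "small" configuration.
gpt2_small = gpt2_param_count(vocab_size=50257, n_ctx=1024, d_model=768, n_layer=12)

# Hypothetical scaled-down configuration (assumed values, not from the paper).
tiny = gpt2_param_count(vocab_size=10000, n_ctx=64, d_model=128, n_layer=2)

print(f"GPT2 small: {gpt2_small:,} parameters")
print(f"hypothetical tiny model: {tiny:,} parameters")
```

At small widths the token embedding dominates the total, so vocabulary size matters as much as depth when fitting an on-device size budget.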

Abstract

In this paper we train a transformer using differential privacy (DP) for language modeling in SwiftKey. We run multiple experiments to balance the trade-off between model size, run-time speed, and accuracy. We show small and consistent gains in next-word prediction accuracy, with a graceful increase in memory and run-time cost, compared to the production GRU. This is obtained by scaling down a GPT2 architecture to fit the required size and by a two-stage training process that builds a seed model on general data and then DP fine-tunes it on typing data. The transformer is integrated using ONNX, offering both flexibility and efficiency.
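
The abstract does not say which DP training algorithm is used for the fine-tuning stage. A common choice is DP-SGD, which clips each example's gradient to a fixed L2 norm and adds Gaussian noise before averaging. The following is a minimal pure-Python sketch of one such update under that assumption. The function names and all hyperparameter values are illustrative, and privacy accounting (tracking the epsilon budget) is omitted.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per example, sum, add Gaussian noise, average."""
    batch_size = len(per_example_grads)
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # Noise standard deviation is tied to the clipping norm (the sensitivity).
    sigma = noise_multiplier * max_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


# Toy usage with made-up gradients for a two-parameter model.
rng = random.Random(0)
params = [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]  # the first gradient has norm 5 and gets clipped
params = dp_sgd_step(params, grads, lr=0.1, max_norm=1.0,
                     noise_multiplier=1.1, rng=rng)
print(params)
```

In the two-stage setup described above, a step like this would apply only to the fine-tuning on typing data. The seed model trained on general data could use ordinary, non-private optimization.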

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Advanced Malware Detection Techniques · Security and Verification in Computing

Methods: Gated Recurrent Unit