DP-XGBoost: Private Machine Learning at Scale
Nicolas Grislain, Joan Gonzalvez

TL;DR
This paper introduces DP-XGBoost, a scalable and distributed differentially private implementation of the XGBoost machine learning model that significantly improves accuracy under privacy constraints.
Contribution
The paper presents the first scalable, distributed DP implementation of XGBoost, outperforming previous methods in accuracy for large datasets.
Findings
Outperforms previous DP models in accuracy for given privacy budgets
Scales to big data and runs in distributed environments such as Kubernetes, Dask, and Spark
Enables practical privacy-preserving machine learning at scale
Abstract
The big-data revolution announced ten years ago does not seem to have fully happened at the expected scale. One of the main obstacles to this has been the lack of data circulation, and one of the many reasons people and organizations did not share as much as expected is the privacy risk associated with data-sharing operations. There have been many works on practical systems to compute statistical queries with Differential Privacy (DP). There have also been practical implementations of systems to train Neural Networks with DP, but relatively little effort has been dedicated to designing scalable classical Machine Learning (ML) models providing DP guarantees. In this work we describe and implement a DP fork of a battle-tested ML model: XGBoost. Our approach beats previous attempts at the task by a large margin, in terms of accuracy achieved for a given privacy budget. It is also the only…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Neural Networks and Applications
