DP-XGBoost: Private Machine Learning at Scale

Nicolas Grislain; Joan Gonzalvez

arXiv:2110.12770 · cs.LG · October 26, 2021

TL;DR

This paper introduces DP-XGBoost, a scalable and distributed differentially private implementation of the XGBoost machine learning model that significantly improves accuracy under privacy constraints.

Contribution

The paper presents the first scalable, distributed DP implementation of XGBoost, outperforming previous methods in accuracy for large datasets.

Findings

01. Outperforms previous DP models in accuracy for given privacy budgets

02. Scales to big data and runs in distributed environments like Kubernetes, Dask, Spark

03. Enables practical privacy-preserving machine learning at scale

Abstract

The big-data revolution announced ten years ago does not seem to have fully happened at the expected scale. One of the main obstacles has been the lack of data circulation, and one of the many reasons people and organizations did not share as much as expected is the privacy risk associated with data-sharing operations. There have been many works on practical systems to compute statistical queries with Differential Privacy (DP). There have also been practical implementations of systems to train Neural Networks with DP, but relatively little effort has been dedicated to designing scalable classical Machine Learning (ML) models that provide DP guarantees. In this work we describe and implement a DP fork of a battle-tested ML model: XGBoost. Our approach beats previous attempts at the task by a large margin, in terms of accuracy achieved for a given privacy budget. It is also the only…
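
To make "accuracy achieved for a given privacy budget" concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a bounded mean. It is not the paper's algorithm and does not use XGBoost. The function names `laplace_noise` and `dp_mean`, the clipping bounds, and the assumption that the record count is public (so neighbouring datasets differ by replacing one record) are all illustrative choices introduced here.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values, lower, upper, epsilon, rng):
    """Epsilon-DP estimate of the mean (hypothetical helper).

    Assumes the number of records is public and that neighbouring
    datasets differ by replacing a single record.
    """
    n = len(values)
    # Clip every record so that one record can move the sum
    # by at most (upper - lower).
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / n
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.random() for _ in range(1000)]
    for eps in (0.01, 0.1, 1.0):
        print(eps, dp_mean(data, 0.0, 1.0, eps, rng))
```

A smaller `epsilon` (a tighter privacy budget) gives a larger noise scale and therefore a less accurate estimate. This is the trade-off against which the abstract's accuracy claim is stated.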

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Neural Networks and Applications