Guard-GBDT: Efficient Privacy-Preserving Approximated GBDT Training on Vertical Dataset

Anxiao Song; Shujie Cui; Jianli Bai; Ke Cheng; Yulong Shen; Giovanni Russello

arXiv:2507.20688 · cs.CR · December 23, 2025

TL;DR

Guard-GBDT is an MPC-based framework for privacy-preserving gradient boosting decision tree (GBDT) training on vertical datasets. It cuts computation by approximating division and sigmoid, and cuts communication by compressing the messages exchanged during gradient aggregation.

Contribution

This work introduces Guard-GBDT, a framework that makes privacy-preserving GBDT training on vertical datasets more efficient by replacing MPC-unfriendly non-linear functions (division and sigmoid) with streamlined approximations and by compressing gradient-aggregation messages.

Findings

01. Outperforms state-of-the-art methods by up to 12.21x in communication efficiency.

02. Achieves comparable accuracy to plaintext XGBoost with only 1-2% deviation.

03. Reduces communication overhead significantly on LAN and WAN networks.

Abstract

In light of increasing privacy concerns and stringent legal regulations, using secure multiparty computation (MPC) to enable collaborative GBDT model training among multiple data owners has garnered significant attention. Despite this, existing MPC-based GBDT frameworks face efficiency challenges due to high communication costs and the computation burden of non-linear operations, such as division and sigmoid calculations. In this work, we introduce Guard-GBDT, an innovative framework tailored for efficient and privacy-preserving GBDT training on vertical datasets. Guard-GBDT bypasses MPC-unfriendly division and sigmoid functions by using more streamlined approximations and reduces communication overhead by compressing the messages exchanged during gradient aggregation. We implement a prototype of Guard-GBDT and extensively evaluate its performance and accuracy on various real-world…
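
The abstract names division and sigmoid as the MPC-unfriendly operations, but this excerpt does not state which approximations Guard-GBDT actually uses. As a rough illustration only, the plaintext Python sketch below shows where those two operations typically arise in XGBoost-style training with logistic loss, and swaps in a generic clamped-linear sigmoid. The names `sigmoid_pwl`, `logistic_grad_hess`, and `leaf_weight` are hypothetical, the clamped-linear form is a common stand-in rather than the paper's method, and nothing here is secret-shared.

```python
import math


def sigmoid(x: float) -> float:
    """Exact sigmoid: needs exponentiation and division, both costly under MPC."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_pwl(x: float) -> float:
    """Generic clamped-linear stand-in: clamp(0.25 * x + 0.5, 0, 1).

    Assumption for illustration only; NOT taken from the Guard-GBDT paper.
    It needs only a multiplication by a constant, an addition, and comparisons.
    """
    return min(1.0, max(0.0, 0.25 * x + 0.5))


def logistic_grad_hess(scores, labels, sig=sigmoid):
    """First- and second-order gradients of logistic loss for each sample."""
    probs = [sig(s) for s in scores]
    grads = [p - y for p, y in zip(probs, labels)]
    hess = [p * (1.0 - p) for p in probs]
    return grads, hess


def leaf_weight(grads, hess, lam=1.0):
    """XGBoost-style leaf weight -G / (H + lambda); this is where division appears."""
    return -sum(grads) / (sum(hess) + lam)


if __name__ == "__main__":
    scores = [-3.0, -0.5, 0.0, 0.5, 3.0]
    labels = [0, 0, 1, 1, 1]
    for name, sig in (("exact", sigmoid), ("clamped-linear", sigmoid_pwl)):
        g, h = logistic_grad_hess(scores, labels, sig)
        print(name, leaf_weight(g, h))
```

Running the script prints the leaf weight under both sigmoid variants, which gives a feel for how much a coarse approximation shifts the result on a toy input. How the real framework bounds that error is a question for the full paper.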
