Communication-Efficient Weighted Sampling and Quantile Summary for GBDT

Ziyue Huang; Ke Yi

arXiv:1909.07633 · cs.DS · September 18, 2019

TL;DR

This paper introduces two communication-efficient methods for distributed GBDT training: a weighted sampling approach that estimates the information gain from a small subset of the data, and distributed protocols for weighted quantile computation. Both reduce communication overhead and improve scalability.

Contribution

The paper presents novel distributed protocols for weighted sampling and weighted quantile estimation tailored to GBDT, reducing communication cost in large-scale distributed learning.
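
As a rough illustration of why weighted sampling can be cheap to distribute, here is a minimal Python sketch built on the standard Efraimidis-Spirakis key trick: each item with weight w gets the key u^(1/w), with u uniform in [0, 1), and the k largest keys form a weighted sample without replacement. This is a swapped-in textbook technique, not necessarily the protocol from the paper, and the names `assign_keys`, `local_top_k`, and `merge_top_k` are hypothetical. Because the global top-k keys must lie in the union of every worker's local top-k, each worker only has to send k (key, id) pairs.

```python
import heapq
import random
from typing import List, Sequence, Tuple

KeyedItem = Tuple[float, int]  # (random key, item id)


def assign_keys(ids: Sequence[int], weights: Sequence[float],
                rng: random.Random) -> List[KeyedItem]:
    """Worker side: give each positively weighted item the key u ** (1 / w)."""
    return [(rng.random() ** (1.0 / w), i) for i, w in zip(ids, weights) if w > 0]


def local_top_k(keyed: List[KeyedItem], k: int) -> List[KeyedItem]:
    """Worker side: keep only the k largest keys; this is all that gets sent."""
    return heapq.nlargest(k, keyed)


def merge_top_k(summaries: List[List[KeyedItem]], k: int) -> List[KeyedItem]:
    """Coordinator side: the global top-k lies in the union of the local top-k lists."""
    return heapq.nlargest(k, (pair for s in summaries for pair in s))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: 3 workers with 1000 items each, weight = id % 7 + 1.
    shards = [list(range(w * 1000, (w + 1) * 1000)) for w in range(3)]
    keyed = [assign_keys(ids, [i % 7 + 1 for i in ids], rng) for ids in shards]
    sample = merge_top_k([local_top_k(kl, 10) for kl in keyed], 10)
    print([item_id for _, item_id in sample])
```

The merged list is exactly what a single machine holding all the keys would select, so the communication saving costs nothing in sample quality under this scheme.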

Findings

01 Reduced communication overhead in distributed GBDT training

02 Efficient estimation of information gain with small data subsets

03 Improved scalability for large datasets
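
Finding 02 can be made concrete with a small sketch. Assuming the XGBoost-style second-order gain with gradients g_i, Hessians h_i, and regularizer lambda (the gamma penalty omitted), a hypothetical estimator draws m indices with replacement, with probability p_i proportional to |g_i| + h_i, and forms Hansen-Hurwitz estimates of the left and right gradient and Hessian sums by scaling each sampled term by 1/(m p_i). The sampling distribution and the names `split_gain`, `draw_sample`, and `estimate_sums` are assumptions for illustration; the paper's own sampling scheme and guarantees may differ.

```python
import random
from typing import List, Sequence, Tuple


def split_gain(gl: float, hl: float, gr: float, hr: float, lam: float = 1.0) -> float:
    """XGBoost-style second-order split gain (the gamma penalty is omitted)."""
    return 0.5 * (gl * gl / (hl + lam) + gr * gr / (hr + lam)
                  - (gl + gr) ** 2 / (hl + hr + lam))


def draw_sample(g: Sequence[float], h: Sequence[float], m: int,
                rng: random.Random) -> Tuple[List[int], List[float], float]:
    """Draw m indices with replacement, P(i) proportional to |g_i| + h_i."""
    weights = [abs(gi) + hi for gi, hi in zip(g, h)]
    idx = rng.choices(range(len(g)), weights=weights, k=m)
    return idx, weights, sum(weights)


def estimate_sums(x: Sequence[float], g: Sequence[float], h: Sequence[float],
                  idx: List[int], weights: List[float], total: float,
                  threshold: float) -> Tuple[float, float, float, float]:
    """Unbiased (Hansen-Hurwitz) estimates of G_L, H_L, G_R, H_R for x < threshold."""
    m = len(idx)
    gl = hl = gr = hr = 0.0
    for j in idx:
        scale = total / (weights[j] * m)  # equals 1 / (m * p_j)
        if x[j] < threshold:
            gl += g[j] * scale
            hl += h[j] * scale
        else:
            gr += g[j] * scale
            hr += h[j] * scale
    return gl, hl, gr, hr


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100
    x = list(range(n))
    # Hypothetical gradients and Hessians for one feature column.
    g = [(1 + i % 3) * (1 if i % 2 == 0 else -1) for i in range(n)]
    h = [1.0] * n
    idx, w, total = draw_sample(g, h, 20000, rng)
    est = estimate_sums(x, g, h, idx, w, total, threshold=50)
    exact = (sum(g[:50]), sum(h[:50]), sum(g[50:]), sum(h[50:]))
    print("estimated gain:", split_gain(*est))
    print("exact gain:    ", split_gain(*exact))
```

The four sums are unbiased, but the gain is a nonlinear function of them, so the plugged-in gain is only approximately unbiased; its error shrinks as the sample size m grows.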

Abstract

Gradient boosting decision tree (GBDT) is a powerful and widely used machine learning model that has achieved state-of-the-art performance in many academic areas and production environments. However, communication overhead is the main bottleneck in distributed training, which is needed to handle today's massive datasets. In this paper, we propose two novel communication-efficient methods over distributed datasets to mitigate this problem: a weighted sampling approach that efficiently estimates the information gain over a small subset, and distributed protocols for the weighted quantile problem used in approximate tree learning.
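
For reference, the weighted quantile problem mentioned in the abstract can be stated on a single machine: sort a feature's values, attach a weight to each instance (XGBoost, for example, uses the Hessian h_i), and take as candidate split points the values at which the cumulative weight first reaches k * W / q for k = 1, ..., q - 1, where W is the total weight. The sketch below, with the hypothetical function `weighted_candidates`, only pins down this definition; it is not a distributed or communication-efficient protocol, which is what the paper contributes.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Sequence


def weighted_candidates(values: Sequence[float], weights: Sequence[float],
                        q: int) -> List[float]:
    """Return the value at weighted rank k * W / q for k = 1, ..., q - 1.

    W is the total weight; each returned value is the smallest one whose
    cumulative weight (in sorted order) is at least the target rank.
    """
    pairs = sorted(zip(values, weights))
    xs = [v for v, _ in pairs]
    cum = list(accumulate(w for _, w in pairs))
    total = cum[-1]
    candidates = []
    for k in range(1, q):
        target = k * total / q
        # First position whose cumulative weight reaches the target rank;
        # min() guards against floating-point overshoot past the last element.
        pos = min(bisect_left(cum, target), len(xs) - 1)
        candidates.append(xs[pos])
    return candidates


if __name__ == "__main__":
    # Hypothetical feature column with Hessian weights.
    feature = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6]
    hessian = [0.5, 1.0, 2.0, 0.5, 1.0, 1.0]
    print(weighted_candidates(feature, hessian, q=3))
```

Because ranks are measured in weight rather than in count, heavily weighted regions of the feature receive more candidate split points.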

Taxonomy

Topics: Imbalanced Data Classification Techniques · Bayesian Modeling and Causal Inference · Statistical Methods and Inference