A Communication-Efficient Parallel Algorithm for Decision Tree
Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, and Tie-Yan Liu

TL;DR
This paper introduces PV-Tree, a parallel decision tree algorithm that minimizes communication costs by using local and global voting, enabling scalable and efficient training with high accuracy.
Contribution
The paper presents PV-Tree, a novel parallel decision tree algorithm that reduces communication overhead and maintains near-optimal learning performance.
Findings
PV-Tree significantly reduces communication costs compared to existing methods.
PV-Tree achieves comparable or better accuracy on real-world datasets.
Theoretical analysis confirms PV-Tree's ability to find the best attribute with high probability.
Abstract
Decision tree (and its extensions such as Gradient Boosting Decision Trees and Random Forest) is a widely used machine learning algorithm, due to its practical effectiveness and model interpretability. With the emergence of big data, there is an increasing need to parallelize the training process of decision tree. However, most existing attempts along this line suffer from high communication costs. In this paper, we propose a new algorithm, called Parallel Voting Decision Tree (PV-Tree), to tackle this challenge. After partitioning the training data onto a number of (e.g., M) machines, this algorithm performs both local voting and global voting in each iteration. For local voting, the top-k attributes are selected from each machine according to its local data. Then, globally top-2k attributes are determined by a majority voting among these local candidates. Finally, the…
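The two voting steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `local_vote` and `global_vote`, the toy gain values, and the tie-breaking rule (lower attribute index wins) are all assumptions made for the example. Each machine is assumed to have already computed a gain score per attribute from its local data. The step that follows voting is cut off in the abstract above, so it is not sketched here.

```python
from collections import Counter


def local_vote(local_gains, k):
    """Return the indices of the k attributes with the largest local gain.

    local_gains[a] is a hypothetical gain score for attribute a,
    computed by one machine from its own data partition.
    """
    ranked = sorted(range(len(local_gains)),
                    key=lambda a: local_gains[a], reverse=True)
    return ranked[:k]


def global_vote(local_candidates, k):
    """Majority vote over local candidates; keep at most 2k attributes.

    Ties are broken by the smaller attribute index (an assumption;
    the abstract does not specify a tie-breaking rule).
    """
    votes = Counter(a for cands in local_candidates for a in cands)
    ranked = sorted(votes, key=lambda a: (-votes[a], a))
    return ranked[:2 * k]


# Toy data: 4 machines, 6 attributes, made-up gain values.
gains_per_machine = [
    [0.9, 0.1, 0.5, 0.3, 0.2, 0.0],
    [0.8, 0.2, 0.1, 0.6, 0.3, 0.0],
    [0.7, 0.4, 0.6, 0.1, 0.2, 0.0],
    [0.1, 0.2, 0.3, 0.4, 0.9, 0.8],
]
k = 2

# Each machine sends only k attribute indices, not its histograms.
candidates = [local_vote(g, k) for g in gains_per_machine]
selected = global_vote(candidates, k)

print(candidates)  # [[0, 2], [0, 3], [0, 2], [4, 5]]
print(selected)    # [0, 2, 3, 4]
```

In this sketch the voting round exchanges only k attribute indices per machine, so the amount of data sent does not grow with the total number of attributes, which is the intuition behind the reduced communication cost.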
Taxonomy
Topics: Data Mining Algorithms and Applications · Imbalanced Data Classification Techniques · Machine Learning and Data Classification
