A Review and Analysis of a Parallel Approach for Decision Tree Learning from Large Data Streams

Zeinab Shiralizadeh

arXiv:2505.11780 · cs.AI · May 20, 2025

Open Access

TL;DR

This paper reviews the pdsCART parallel decision tree algorithm, emphasizing its real-time, scalable, and distributed processing capabilities for large data streams, and analyzes its performance within the MapReduce framework.

Contribution

It provides a comprehensive analysis of pdsCART, a parallel decision tree learning algorithm optimized for large-scale, streaming data in distributed environments.

Findings

01. Demonstrates scalability and efficiency of pdsCART in processing high-volume data streams.

02. Shows compatibility of the algorithm with MapReduce for distributed computing.

03. Highlights performance improvements over traditional decision tree methods.

Abstract

This work studies pdsCART, a parallel decision tree learning algorithm designed for scalable and efficient data analysis. The method incorporates three core capabilities. First, it supports real-time learning from data streams, allowing trees to be constructed incrementally. Second, it enables parallel processing of high-volume streaming data, making it well suited for large-scale applications. Third, the algorithm fits into the MapReduce framework, ensuring compatibility with distributed computing environments. In what follows, we present the algorithm's key components along with results highlighting its performance and scalability.
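
To make the combination of incremental learning and MapReduce-style parallelism more concrete, here is a minimal sketch of how a CART-style split could be chosen from partitioned data. It is not pdsCART's actual implementation, which the abstract does not specify. The function names (`map_partition`, `merge_stats`, `best_split`), the single numeric feature, the fixed candidate thresholds, and the use of Gini impurity are all assumptions made for illustration. Each partition produces class counts on either side of every threshold (map), the counts are summed (reduce), and the split with the lowest weighted impurity is selected.

```python
from collections import Counter
from functools import reduce


def map_partition(rows, thresholds):
    """Map step (hypothetical): for one data partition, count class labels
    on the left (x <= t) and right (x > t) of each candidate threshold.
    rows is an iterable of (feature_value, label) pairs."""
    stats = {t: (Counter(), Counter()) for t in thresholds}
    for x, y in rows:
        for t in thresholds:
            left, right = stats[t]
            (left if x <= t else right)[y] += 1
    return stats


def merge_stats(a, b):
    """Reduce step (hypothetical): add the per-threshold counts of two
    partitions. The same merge can fold in a newly arrived stream batch."""
    return {t: (a[t][0] + b[t][0], a[t][1] + b[t][1]) for t in a}


def gini(counts):
    """Gini impurity of a class-count table; 0.0 for an empty table."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(stats):
    """Return the threshold with the lowest weighted Gini impurity."""
    def weighted(t):
        left, right = stats[t]
        nl, nr = sum(left.values()), sum(right.values())
        total = nl + nr
        if total == 0:
            return float("inf")  # no data seen for this threshold
        return (nl * gini(left) + nr * gini(right)) / total
    return min(stats, key=weighted)


# Two partitions, as if held by two workers.
partitions = [
    [(1.0, "a"), (2.0, "a")],
    [(3.0, "b"), (4.0, "b")],
]
thresholds = [1.5, 2.5, 3.5]
merged = reduce(merge_stats, (map_partition(p, thresholds) for p in partitions))
print(best_split(merged))  # 2.5 separates the two classes perfectly
```

Because the merged statistics are plain counts, a later batch from the stream can be mapped on its own and merged into the running totals without revisiting earlier data, which is the property an incremental tree learner relies on.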

Taxonomy

Topics: Data Stream Mining Techniques · Data Mining Algorithms and Applications · Time Series Analysis and Forecasting