Improving the performance of bagging ensembles for data streams through mini-batching
Guilherme Cassales, Heitor Gomes, Albert Bifet, Bernhard Pfahringer, Hermes Senger

TL;DR
This paper introduces a mini-batching approach to enhance the computational efficiency of ensemble algorithms in data stream mining, achieving up to 5X speedup with minimal impact on accuracy.
Contribution
It proposes a novel mini-batching strategy that improves cache locality and performance of ensemble methods in streaming data environments.
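The idea can be illustrated with a minimal sketch. This is not the authors' implementation: the names MajorityClassLearner, poisson1, and train_minibatch are hypothetical, the toy learner stands in for a real incremental classifier, and the Poisson(1) instance weighting is the standard online-bagging scheme, assumed here for illustration. The point is the loop order: instances are buffered into a small batch, and each ensemble member then processes the whole batch before the next member is touched, so a member's model is more likely to stay in cache while it is being updated.

```python
import math
import random
from collections import defaultdict


class MajorityClassLearner:
    """Toy incremental learner (hypothetical stand-in for a real base model)."""

    def __init__(self):
        self.counts = defaultdict(float)

    def learn_one(self, x, y, weight=1.0):
        # Accumulate weighted class counts; x is ignored by this toy model.
        self.counts[y] += weight

    def predict_one(self, x):
        # Predict the class with the largest accumulated weight, if any.
        return max(self.counts, key=self.counts.get) if self.counts else None


def poisson1(rng):
    """Draw from Poisson(lambda=1) with Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def _train_on_batch(learners, batch, weight_fn):
    # Outer loop over learners, inner loop over instances: each model is
    # visited once per batch rather than once per instance.
    for learner in learners:
        for x, y in batch:
            k = weight_fn()
            if k > 0:
                learner.learn_one(x, y, weight=k)


def train_minibatch(learners, stream, batch_size, weight_fn=None, seed=0):
    """Train an online-bagging ensemble, buffering batch_size instances."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if weight_fn is None:
        rng = random.Random(seed)
        weight_fn = lambda: poisson1(rng)
    batch = []
    for instance in stream:
        batch.append(instance)
        if len(batch) == batch_size:
            _train_on_batch(learners, batch, weight_fn)
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        _train_on_batch(learners, batch, weight_fn)
```

With batch_size=1 this reduces to ordinary instance-by-instance online bagging. In a parallel setting, the outer loop over learners is the natural unit of work to hand to separate threads, since each learner updates only its own state.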
Findings
Up to 5X speedup on 8-core processors.
Significant reduction in cache misses.
Minor decrease in predictive accuracy.
Abstract
Machine learning applications often have to cope with dynamic environments where data are collected as continuous data streams with potentially infinite length and transient behavior. Compared to traditional (batch) data mining, stream processing algorithms have additional requirements regarding computational resources and adaptability to data evolution. They must process instances incrementally because the continuous flow of data prohibits storing it for multiple passes. Ensemble learning has achieved remarkable predictive performance in this scenario. Implemented as a set of individual classifiers, ensembles are naturally amenable to task parallelism. However, the incremental learning and dynamic data structures used to capture concept drift increase cache misses and hinder the benefit of parallelism. This paper proposes a mini-batching strategy that can…
