Improving the performance of bagging ensembles for data streams through mini-batching

Guilherme Cassales; Heitor Gomes; Albert Bifet; Bernhard Pfahringer; Hermes Senger

arXiv:2112.09834·cs.LG·December 21, 2021

TL;DR

This paper introduces a mini-batching approach to enhance the computational efficiency of ensemble algorithms in data stream mining, achieving up to 5X speedup with minimal impact on accuracy.

Contribution

The paper proposes a mini-batching strategy that improves the cache locality, and therefore the performance, of ensemble methods in data stream environments.

Findings

1. Up to 5X speedup on 8-core processors.
2. Significant reduction in cache misses.
3. Minor decrease in predictive accuracy.

Abstract

Machine learning applications often have to cope with dynamic environments where data are collected as continuous data streams of potentially infinite length and transient behavior. Compared to traditional (batch) data mining, stream processing algorithms face additional requirements regarding computational resources and adaptability to data evolution. They must process instances incrementally because the continuous flow of data prohibits storing it for multiple passes. Ensemble learning has achieved remarkable predictive performance in this scenario. Implemented as a set of individual classifiers, ensembles are naturally amenable to task parallelism. However, the incremental learning and the dynamic data structures used to capture concept drift increase cache misses and hinder the benefit of parallelism. This paper proposes a mini-batching strategy that can…
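The general idea of mini-batching a bagging ensemble can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `MajorityClassLearner` and `MiniBatchBagging`, the `batch_size` parameter, and the toy base learner are all hypothetical, and the Poisson(1) weighting is the standard online bagging scheme, assumed here as the ensemble method. The sketch only shows the loop order (each member consumes a whole buffered mini-batch before the next member runs, so one model's state is reused across several instances); it does not reproduce the reported speedup or cache behavior.

```python
import math
import random
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner (hypothetical stand-in for a real stream classifier)."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x, y, weight=1):
        self.counts[y] += weight

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


def poisson1(rng):
    """Draw k ~ Poisson(1) with Knuth's multiplication method."""
    limit, k, p = math.exp(-1.0), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


class MiniBatchBagging:
    """Online bagging that buffers instances and trains member by member."""

    def __init__(self, n_members=4, batch_size=3, seed=0):
        self.members = [MajorityClassLearner() for _ in range(n_members)]
        # One generator per member keeps the resampling independent.
        self.rngs = [random.Random(seed + i) for i in range(n_members)]
        self.batch_size = batch_size
        self.buffer = []

    def learn_one(self, x, y):
        # Instances are only buffered until a full mini-batch is available.
        self.buffer.append((x, y))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Member-major order: the outer loop walks the ensemble, the inner
        # loop walks the mini-batch, so each model is touched once per batch.
        for member, rng in zip(self.members, self.rngs):
            for x, y in self.buffer:
                k = poisson1(rng)  # online bagging weight for this instance
                if k > 0:
                    member.learn_one(x, y, weight=k)
        self.buffer.clear()

    def predict_one(self, x):
        votes = Counter(m.predict_one(x) for m in self.members)
        votes.pop(None, None)  # ignore members that have seen no data yet
        return votes.most_common(1)[0][0] if votes else None


ensemble = MiniBatchBagging(n_members=4, batch_size=3, seed=0)
for i in range(30):
    ensemble.learn_one({"f": i}, "a")
```

In this sketch the outer loop over members is also the natural unit to hand to a thread pool, since members do not share state during `flush`. Predictions made between flushes do not reflect the buffered instances, which is one plausible source of the small accuracy change that mini-batching introduces.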
