Dynamic Ensemble Size Adjustment for Memory Constrained Mondrian Forest
Martin Khannouz, Tristan Glatard

TL;DR
This paper introduces a method for dynamically adjusting the ensemble size of a memory-constrained Mondrian forest to improve its performance on data streams processed in limited-memory environments.
Contribution
It presents an algorithm that estimates overfitting and uses this estimate to steer the forest toward the optimal ensemble size under a memory constraint, improving performance on streaming data.
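The summary does not say how overfitting is estimated or how the estimate changes the ensemble size. The sketch below shows one hypothetical way to turn such an estimate into size adjustments. It treats the gap between training accuracy and accuracy on not-yet-seen stream samples as the overfitting signal. The class name `EnsembleSizeController`, the threshold, the smoothing factor, and the rule "a large gap adds a tree, a small gap removes one" are all assumptions made for illustration. None of them is the authors' algorithm.

```python
class EnsembleSizeController:
    """Hypothetical controller that nudges the number of trees up or down.

    Assumptions (not taken from the paper): overfitting is estimated as
    training accuracy minus accuracy on unseen stream samples, smoothed with
    an exponential moving average; a large gap adds a tree, a small gap
    removes one.
    """

    def __init__(self, n_trees=1, min_trees=1, max_trees=50,
                 threshold=0.05, alpha=0.1):
        self.n_trees = n_trees
        self.min_trees = min_trees
        self.max_trees = max_trees
        self.threshold = threshold  # hypothetical tolerance on the gap
        self.alpha = alpha          # hypothetical EMA smoothing factor
        self.gap = 0.0              # smoothed overfitting estimate

    def update(self, train_acc, test_acc):
        # Smooth the train/test gap so one noisy batch does not resize the forest.
        self.gap = (1 - self.alpha) * self.gap + self.alpha * (train_acc - test_acc)
        if self.gap > self.threshold:
            # Large gap: treat as overfitting and add a tree. Under a shared
            # memory budget, this also leaves less memory for each tree.
            self.n_trees = min(self.n_trees + 1, self.max_trees)
        elif self.gap < self.threshold / 2:
            # Small gap: remove a tree so the remaining trees get more memory.
            self.n_trees = max(self.n_trees - 1, self.min_trees)
        # Between threshold / 2 and threshold: dead band, keep the current size.
        return self.n_trees


# Usage with smoothing disabled (alpha=1.0) so each call reacts immediately.
ctrl = EnsembleSizeController(n_trees=5, alpha=1.0)
after_large_gap = ctrl.update(0.95, 0.70)  # gap 0.25 -> 6 trees
after_small_gap = ctrl.update(0.80, 0.79)  # gap 0.01 -> 5 trees
```

The dead band between `threshold / 2` and `threshold` is a design choice meant to keep the ensemble size from oscillating by one tree on every update. Real implementations would also need to decide which tree to drop and how to reallocate the freed memory.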
Findings
Achieves up to 95% of optimal performance on stable datasets
Outperforms fixed-size forests on datasets with concept drift
Demonstrates effectiveness in real and simulated data streams
Abstract
Supervised learning algorithms generally assume that enough memory is available to store data models during the training and test phases. However, this assumption is unrealistic when data comes in the form of infinite data streams, or when learning algorithms are deployed on devices with reduced amounts of memory. Such memory constraints impact the model behavior and assumptions. In this paper, we show that under memory constraints, increasing the size of a tree-based ensemble classifier can worsen its performance. In particular, we experimentally show the existence of an optimal ensemble size for a memory-bounded Mondrian forest on data streams, and we design an algorithm to guide the forest toward that optimal number by using an estimation of overfitting. We tested several variations of this algorithm on a variety of real and simulated datasets, and we conclude that our method can…
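The abstract states that adding trees can hurt under a memory bound. Simple arithmetic makes this concrete if we assume, for illustration only, that the bound is a fixed total node count shared evenly by all trees. The paper's actual memory model may differ, and the helpers `nodes_per_tree` and `max_leaves` are hypothetical. Under this assumption, each extra tree shrinks the capacity of every other tree:

```python
def nodes_per_tree(total_node_budget, n_trees):
    """Split a global node budget evenly across trees (hypothetical memory model)."""
    if n_trees < 1:
        raise ValueError("n_trees must be at least 1")
    return total_node_budget // n_trees


def max_leaves(n_nodes):
    """Maximum leaves of a binary tree in which every split creates two children.

    Such a tree with L leaves has 2L - 1 nodes, so n nodes allow at most
    (n + 1) // 2 leaves.
    """
    return (n_nodes + 1) // 2


budget = 1000  # hypothetical total number of nodes the device can store
for n_trees in (1, 10, 100):
    per_tree = nodes_per_tree(budget, n_trees)
    print(f"{n_trees:>3} trees: {per_tree:>4} nodes/tree, "
          f"at most {max_leaves(per_tree):>3} leaves/tree")
```

With a 1,000-node budget, a single tree can hold up to 500 leaves. Each of 10 trees can hold up to 50, and each of 100 trees can hold at most 5, which leaves every tree with only a very coarse partition of the input space. This loss of per-tree resolution is what can offset the variance reduction of a larger ensemble, and it is why an intermediate ensemble size can perform best.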
Taxonomy
Topics: Data Stream Mining Techniques · Time Series Analysis and Forecasting · Machine Learning and Data Classification
Methods: Lib · Test
