Extremely Simple Streaming Forest
Haoyin Xu, Jayanta Dey, Sambit Panda, Joshua T. Vogelstein

TL;DR
The paper introduces XForest, a straightforward streaming decision forest method that incrementally updates trees with new data, often matching or surpassing batch algorithms in accuracy and efficiency across diverse datasets.
Contribution
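Proposes a simple extension of decision trees to streaming data: existing trees continue to grow as new data arrive, and some old trees are replaced with new ones to control the total number of trees. The approach targets the low accuracy and high memory usage the authors observed in existing, more complex streaming methods.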
Findings
XForest performs comparably to or better than batch algorithms on the 72 classification problems of the OpenML-CC18 suite.
The zero-added-node extension allows efficient transfer to new tasks using only inference.
XForest offers a simple, effective standard for streaming decision forests.
Abstract
Decision forests, including random forests and gradient boosting trees, remain the leading machine learning methods for many real-world data problems, especially on tabular data. However, most of the current implementations only operate in batch mode, and therefore cannot incrementally update when more data arrive. Several previous works developed streaming trees and ensembles to overcome this limitation. Nonetheless, we found that those state-of-the-art algorithms suffer from a number of drawbacks, including low accuracy on some problems and high memory usage on others. We therefore developed an extremely simple extension of decision trees: given new data, simply update existing trees by continuing to grow them, and replace some old trees with new ones to control the total number of trees. In a benchmark suite containing 72 classification problems (the OpenML-CC18 data suite), we…
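The update rule described in the abstract (keep growing existing trees on new data, then replace some old trees to cap the ensemble size) can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the authors' implementation: the names `GrowingTree`, `StreamingForest`, `min_samples_split`, and `max_trees` are hypothetical, leaves store raw samples for simplicity, each new tree is fit on a bootstrap resample of the batch, and the oldest tree is the one dropped. The abstract does not say which trees are replaced, so that policy is purely illustrative.

```python
import random
from collections import Counter

class Node:
    """A tree node; it is a leaf until a split sets `feature`."""
    def __init__(self):
        self.samples = []      # (x, y) pairs held while the node is a leaf
        self.feature = None    # split feature index (None for a leaf)
        self.threshold = None
        self.left = None
        self.right = None

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

class GrowingTree:
    """Toy tree that keeps growing as batches arrive (hypothetical helper)."""
    def __init__(self, min_samples_split=4):
        self.root = Node()
        self.min_samples_split = min_samples_split

    def _leaf(self, x):
        node = self.root
        while node.feature is not None:
            node = node.left if x[node.feature] <= node.threshold else node.right
        return node

    def partial_fit(self, X, y):
        # Route every new sample to its current leaf, then try to split those leaves.
        touched = []
        for xi, yi in zip(X, y):
            leaf = self._leaf(xi)
            leaf.samples.append((xi, yi))
            touched.append(leaf)
        for leaf in touched:
            self._try_split(leaf)

    def _try_split(self, node):
        if node.feature is not None or len(node.samples) < self.min_samples_split:
            return
        if len({y for _, y in node.samples}) < 2:
            return  # already pure
        best = None  # (weighted Gini impurity, feature, threshold)
        for f in range(len(node.samples[0][0])):
            values = sorted({x[f] for x, _ in node.samples})
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2
                left = [y for x, y in node.samples if x[f] <= t]
                right = [y for x, y in node.samples if x[f] > t]
                score = len(left) * gini(left) + len(right) * gini(right)
                if best is None or score < best[0]:
                    best = (score, f, t)
        if best is None:
            return  # all samples share the same feature values
        _, node.feature, node.threshold = best
        node.left, node.right = Node(), Node()
        for x, y in node.samples:
            child = node.left if x[node.feature] <= node.threshold else node.right
            child.samples.append((x, y))
        node.samples = []
        self._try_split(node.left)
        self._try_split(node.right)

    def predict_one(self, x):
        labels = [y for _, y in self._leaf(x).samples]
        return Counter(labels).most_common(1)[0][0] if labels else None

class StreamingForest:
    def __init__(self, max_trees=5, seed=0):
        self.max_trees = max_trees
        self.trees = []
        self.rng = random.Random(seed)

    def partial_fit(self, X, y):
        # 1) Continue growing every existing tree with the new batch.
        for tree in self.trees:
            tree.partial_fit(X, y)
        # 2) Start a new tree on a bootstrap resample of the batch (assumption).
        idx = [self.rng.randrange(len(X)) for _ in X]
        new_tree = GrowingTree()
        new_tree.partial_fit([X[i] for i in idx], [y[i] for i in idx])
        self.trees.append(new_tree)
        # 3) Cap the ensemble size (assumption: drop the oldest tree).
        if len(self.trees) > self.max_trees:
            self.trees.pop(0)

    def predict(self, X):
        # Majority vote over the per-tree predictions.
        return [Counter(t.predict_one(x) for t in self.trees).most_common(1)[0][0]
                for x in X]

def demo():
    """Stream eight batches of a 1-D threshold problem through the forest."""
    data_rng = random.Random(1)
    forest = StreamingForest(max_trees=3, seed=0)
    for _ in range(8):
        X = [[data_rng.random()] for _ in range(20)]
        y = [int(x[0] > 0.5) for x in X]
        forest.partial_fit(X, y)
    return forest
```

Calling `demo()` streams batches one at a time, so no batch ever needs to be revisited. Storing samples in leaves keeps the sketch short but lets memory grow with the stream; a practical implementation would keep summary statistics instead.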
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Data Mining Algorithms and Applications
