Forgetful Forests: high performance learning data structures for streaming data under concept drift
Zhehu Yuan, Yinqi Sun, Dennis Shasha

TL;DR
This paper introduces 'Forgetful Forests', a novel data structure for streaming data that efficiently adapts to concept drift, achieving high speed and accuracy in real-time machine learning applications.
Contribution
It presents "forgetful" tree-based learning algorithms that combine incremental computation with sequential and probabilistic filtering to cope with concept drift.
Findings
Up to 24 times faster than state-of-the-art incremental algorithms, with at most a 2% loss of accuracy
At least twice as fast with no loss of accuracy
Suitable for high-volume streaming data applications
Abstract
Database research can help machine learning performance in many ways. One way is to design better data structures. This paper combines incremental computation with sequential and probabilistic filtering to enable "forgetful" tree-based learning algorithms to cope with concept drift (i.e., data whose function from input to classification changes over time). The forgetful algorithms described in this paper achieve high time performance while maintaining high-quality predictions on streaming data. Specifically, the algorithms are up to 24 times faster than state-of-the-art incremental algorithms with at most a 2% loss of accuracy, or at least twice as fast without any loss of accuracy. This makes such structures suitable for high-volume streaming applications.
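To make the idea of incremental computation with forgetting concrete, here is a minimal Python sketch. It is not the paper's algorithm: it only illustrates one ingredient, keeping class counts over a bounded window of recent labels so that a split-quality statistic (Gini impurity here) can be refreshed in constant time per arriving item while old items are discarded. The names `ForgetfulCounts` and `capacity` are hypothetical, and the fixed-size window is an assumption made for illustration.

```python
from collections import Counter, deque


class ForgetfulCounts:
    """Class counts over the most recent `capacity` labels (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity   # assumed fixed window size
        self.window = deque()      # retained labels, oldest first
        self.counts = Counter()    # label -> occurrences inside the window

    def add(self, label):
        # Incremental update: constant work per arriving label.
        self.window.append(label)
        self.counts[label] += 1
        # Forgetting: drop the oldest label once the window overflows.
        if len(self.window) > self.capacity:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def gini(self):
        # Gini impurity of the retained labels, computed from the counts.
        n = len(self.window)
        if n == 0:
            return 0.0
        return 1.0 - sum((c / n) ** 2 for c in self.counts.values())

    def majority(self):
        # Most frequent retained label, or None if nothing is retained.
        return self.counts.most_common(1)[0][0] if self.counts else None
```

When the label distribution shifts, the retained counts follow the new concept after at most `capacity` items, because labels from the old concept fall out of the window. A tree-based learner would keep statistics of this kind per node and per candidate split rather than one global counter.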
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Air Quality Monitoring and Forecasting
