On Imbalanced Regression with Hoeffding Trees
Pantia-Marina Alchirch, Dimitrios I. Diochnos

TL;DR
This paper extends kernel density estimation (KDE) and hierarchical shrinkage (HS) to streaming regression with Hoeffding trees. On standard online regression benchmarks, KDE consistently improves early-stream performance, whereas HS provides limited gains.
Contribution
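The paper introduces a telescoping formulation of KDE for streaming data and integrates HS into incremental decision trees, bringing two batch-learning techniques (density-based smoothing for imbalanced regression and post-hoc tree regularization) to the streaming setting.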
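To make the idea of a streaming density estimate concrete, the sketch below maintains a Gaussian KDE of the target values on a fixed grid and updates it one sample at a time with the standard running-mean recursion f_n = f_{n-1} + (K_h(g - y_n) - f_{n-1}) / n. This is only an illustration under stated assumptions: the summary does not spell out the paper's telescoping formulation, so the recursion, the fixed grid, the fixed bandwidth, and the class name `StreamingKDE` are all hypothetical choices and should not be read as the authors' method.

```python
import math


class StreamingKDE:
    """Running Gaussian KDE of target values on a fixed grid (illustrative only).

    Assumptions not taken from the paper: fixed grid, fixed bandwidth,
    and a running-mean update of the density at each grid point.
    """

    def __init__(self, lo, hi, num_points=101, bandwidth=1.0):
        step = (hi - lo) / (num_points - 1)
        self.grid = [lo + i * step for i in range(num_points)]
        self.bandwidth = bandwidth
        self.n = 0  # number of targets seen so far
        self.density = [0.0] * num_points

    @staticmethod
    def _gaussian(u):
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

    def update(self, y):
        """Fold one new target value into the estimate in O(grid size)."""
        self.n += 1
        h = self.bandwidth
        for i, g in enumerate(self.grid):
            k = self._gaussian((g - y) / h) / h
            # Running mean: after n updates this equals the batch KDE at g.
            self.density[i] += (k - self.density[i]) / self.n

    def density_at(self, y):
        """Density at the grid point nearest to y (clamped to the grid)."""
        step = self.grid[1] - self.grid[0]
        i = round((y - self.grid[0]) / step)
        i = max(0, min(len(self.grid) - 1, i))
        return self.density[i]


kde = StreamingKDE(lo=-5.0, hi=10.0, num_points=151, bandwidth=1.0)
for target in [0.0, 0.2, -0.1, 5.0]:
    kde.update(target)
# Targets near 0 are common in this toy stream; the target 5.0 is rare.
print(kde.density_at(0.0) > kde.density_at(5.0))
```

No past samples are stored: each update touches only the grid, so memory stays constant as the stream grows. A low density at a target value flags it as rare, which is the kind of signal an imbalanced-regression method could use to reweight or smooth predictions.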
Findings
KDE improves early-stream prediction accuracy
HS provides limited performance gains in streaming settings
The methods are validated on standard online regression benchmarks
Abstract
Many real-world applications generate continuous data streams for regression. Hoeffding trees and their variants have a long-standing tradition due to their effectiveness, either alone or as base models in broader ensembles. Recent batch-learning work shows that kernel density estimation (KDE) improves smoothed predictions in imbalanced regression [Yang et al., 2021], while hierarchical shrinkage (HS) provides post-hoc regularization for decision trees without modifying their structure [Agarwal et al., 2022]. We extend KDE to streaming settings via a telescoping formulation and integrate HS into incremental decision trees. Empirical evaluation on standard online regression benchmarks shows that KDE consistently improves early-stream performance, whereas HS provides limited gains. Our implementation is publicly available at: https://github.com/marinaAlchirch/DSFA_2026.
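As a minimal sketch of hierarchical shrinkage, the function below applies the batch formulation of Agarwal et al. [2022] along a single root-to-leaf path: the prediction starts at the root mean, and each parent-to-child change in mean is shrunk by a factor 1 / (1 + λ / N(parent)), where N(parent) is the number of samples that reached the parent. The function name `hs_prediction` and the `(mean, count)` path representation are assumptions made for illustration. How the paper maintains these statistics inside a Hoeffding tree is not described in this summary.

```python
def hs_prediction(path, lam):
    """Hierarchically shrunk prediction along one root-to-leaf path.

    path: list of (mean_target, sample_count) pairs, root first, leaf last.
    lam:  shrinkage strength; lam = 0 recovers the plain leaf mean.
    """
    prediction = path[0][0]  # start from the root mean
    for (parent_mean, parent_n), (child_mean, _) in zip(path, path[1:]):
        # Shrink each step down the tree more when the parent has few samples.
        prediction += (child_mean - parent_mean) / (1.0 + lam / parent_n)
    return prediction


# Hypothetical path: root (mean 10, 100 samples) -> inner node -> leaf.
path = [(10.0, 100), (14.0, 20), (20.0, 4)]
print(hs_prediction(path, lam=0.0))   # plain leaf mean
print(hs_prediction(path, lam=20.0))  # pulled back toward ancestor means
```

Only node means and counts are needed, so the tree structure is left untouched. In an incremental tree those quantities are running statistics, which suggests the shrinkage could be applied at prediction time, though that is an assumption here.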
Taxonomy
Topics: Imbalanced Data Classification Techniques · Data Stream Mining Techniques · Explainable Artificial Intelligence (XAI)
