Nearly Optimal Bounds for Computing Decision Tree Splits in Data Streams
Hoang Ta, Hoa T. Vu

TL;DR
This paper presents nearly optimal one-pass algorithms and matching lower bounds for decision tree split approximation in data streams, improving space complexity for regression and classification tasks.
Contribution
The paper introduces space-efficient algorithms, together with matching lower bounds, for approximating decision tree splits in data streams, leveraging Lipschitz properties and sketching techniques.
Findings
Nearly optimal space bounds for regression split approximation are established.
Improved space complexity for classification Gini split approximation is achieved.
Matching lower bounds confirm the tightness of the proposed algorithms.
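The quantities being approximated can be computed exactly offline, which helps make the objectives concrete. The sketch below is a plain offline baseline, not the paper's method: it sorts the data, so it is neither one-pass nor small-space. It assumes a single numeric feature, splits of the form x <= t versus x > t, squared error (SSE) as the regression cost, size-weighted Gini impurity as the classification cost, and costs normalized by the number of points n. These conventions and the function names (`best_regression_split`, `best_gini_split`) are illustrative assumptions rather than the paper's exact definitions.

```python
from collections import Counter


def sse(count, total, total_sq):
    """Sum of squared deviations from the mean, given count, sum and sum of squares."""
    if count == 0:
        return 0.0
    return total_sq - total * total / count


def gini(counts, size):
    """Gini impurity 1 - sum_c p_c^2 of a label histogram."""
    if size == 0:
        return 0.0
    return 1.0 - sum((c / size) ** 2 for c in counts.values())


def best_regression_split(xs, ys):
    """Exact threshold t minimizing (SSE(x <= t) + SSE(x > t)) / n (offline baseline)."""
    n = len(xs)
    pairs = sorted(zip(xs, ys))
    all_sum = sum(ys)
    all_sq = sum(y * y for y in ys)
    best_t, best_cost = None, float("inf")
    cnt, s, sq = 0, 0, 0
    for i in range(n - 1):  # stop before the last point so the right side is never empty
        x, y = pairs[i]
        cnt, s, sq = cnt + 1, s + y, sq + y * y
        if pairs[i + 1][0] == x:  # a threshold must separate distinct x-values
            continue
        cost = (sse(cnt, s, sq) + sse(n - cnt, all_sum - s, all_sq - sq)) / n
        if cost < best_cost:
            best_t, best_cost = x, cost
    return best_t, best_cost


def best_gini_split(xs, labels):
    """Exact threshold t minimizing the size-weighted Gini impurity of the two sides."""
    n = len(xs)
    pairs = sorted(zip(xs, labels))
    left, right = Counter(), Counter(labels)
    best_t, best_cost = None, float("inf")
    for i in range(n - 1):
        x, c = pairs[i]
        left[c] += 1  # move one point from the right side to the left side
        right[c] -= 1
        if pairs[i + 1][0] == x:
            continue
        n_left = i + 1
        n_right = n - n_left
        cost = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if cost < best_cost:
            best_t, best_cost = x, cost
    return best_t, best_cost


print(best_regression_split([1, 2, 3, 4], [0, 0, 10, 10]))  # (2, 0.0)
print(best_gini_split([1, 2, 3, 4], ["a", "a", "b", "b"]))  # (2, 0.0)
```

Both scans reuse running prefix statistics, so after sorting each candidate threshold is evaluated in constant time (per label class, for Gini).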
Abstract
We establish nearly optimal upper and lower bounds for approximating decision tree splits in data streams. For regression with labels in the range , we give a one-pass algorithm using space that outputs a split within additive error of the optimal split, improving upon the two-pass algorithm of Pham et al. (ISIT 2025). Furthermore, we provide a matching one-pass lower bound showing that space is indeed necessary. For classification, we also obtain a one-pass algorithm using space for approximating the optimal Gini split, improving upon the previous -space algorithm. We complement these results with matching space lower bounds: for Gini impurity and for misclassification (which matches the upper bound obtained by…
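A simple way to see how a regression split can be chosen in a single pass is a fixed-grid histogram. For each bucket of feature values it keeps the count, sum, and sum of squares of the labels, because the squared-error cost of any split at a bucket boundary depends only on these prefix aggregates. This is a generic illustration and not the authors' algorithm; it carries none of the paper's space or error guarantees. The class name `BucketedRegressionSplit`, the integer feature domain `[1, domain_size]`, and equal-width buckets are hypothetical assumptions.

```python
def sse(count, total, total_sq):
    """Sum of squared deviations from the mean, given count, sum and sum of squares."""
    if count == 0:
        return 0.0
    return total_sq - total * total / count


class BucketedRegressionSplit:
    """Hypothetical one-pass summary: (count, sum, sum of squares) per feature bucket.

    Assumes integer feature values in [1, domain_size] grouped into num_buckets
    equal-width buckets; only bucket boundaries are considered as thresholds.
    """

    def __init__(self, domain_size, num_buckets):
        self.width = -(-domain_size // num_buckets)  # ceil(domain_size / num_buckets)
        self.cnt = [0] * num_buckets
        self.sum = [0.0] * num_buckets
        self.sq = [0.0] * num_buckets

    def update(self, x, y):
        """Process one stream element (x, y) in O(1) time."""
        b = (x - 1) // self.width
        self.cnt[b] += 1
        self.sum[b] += y
        self.sq[b] += y * y

    def best_split(self):
        """Best threshold of the form x <= boundary and its SSE cost divided by n."""
        n = sum(self.cnt)
        tot_s, tot_q = sum(self.sum), sum(self.sq)
        best_t, best_cost = None, float("inf")
        c, s, q = 0, 0.0, 0.0
        for b in range(len(self.cnt) - 1):
            c, s, q = c + self.cnt[b], s + self.sum[b], q + self.sq[b]
            if c == 0 or c == n:  # skip splits with an empty side
                continue
            cost = (sse(c, s, q) + sse(n - c, tot_s - s, tot_q - q)) / n
            if cost < best_cost:
                best_t, best_cost = (b + 1) * self.width, cost
        return best_t, best_cost


summary = BucketedRegressionSplit(domain_size=8, num_buckets=4)
for x, y in [(3, 0), (7, 10), (1, 0), (5, 10), (4, 0), (8, 10), (2, 0), (6, 10)]:
    summary.update(x, y)
print(summary.best_split())  # (4, 0.0)
```

The summary stores three numbers per bucket, so a coarser grid uses less memory but can only return thresholds at bucket boundaries; with one bucket per feature value it evaluates every possible threshold.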
