Era Splitting: Invariant Learning for Decision Trees
Timothy DeLise

TL;DR
This paper introduces two era-aware splitting criteria for decision trees that improve out-of-distribution generalization across eras, supported by theoretical analysis and by experiments on synthetic and real-world datasets.
Contribution
The paper develops and analyzes two new era-based splitting criteria for decision trees, extending OOD generalization methods beyond neural networks.
Findings
The new criteria improve OOD performance in decision trees.
The methods outperform state-of-the-art GBDT models on financial data.
The criteria are implemented in Scikit-Learn and publicly available.
Abstract
Real-life machine learning problems exhibit distributional shifts in the data from one time to another or from one place to another. This behavior is beyond the scope of the traditional empirical risk minimization paradigm, which assumes that data are i.i.d. over time and across locations. The emerging field of out-of-distribution (OOD) generalization addresses this reality with new theory and algorithms that incorporate "environmental" or "era-wise" information. So far, most research has focused on linear models and/or neural networks. In this research we develop two new splitting criteria for decision trees, which allow us to apply ideas from OOD generalization research to decision tree models, namely gradient boosting decision trees (GBDTs). The new splitting criteria use era-wise information associated with the data to grow tree-based models that…
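To make the idea of an era-aware splitting criterion concrete, the following is a minimal sketch under stated assumptions, not the paper's exact definition. It assumes a variance-reduction gain and assumes that era information is used by scoring a candidate split separately within each era and averaging the per-era gains. The function names (`variance_gain`, `pooled_gain`, `era_averaged_gain`) and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def variance_gain(y, left_mask):
    """Variance reduction from splitting y into left/right children."""
    n = y.size
    n_left = int(left_mask.sum())
    n_right = n - n_left
    if n_left == 0 or n_right == 0:
        return 0.0  # a split that leaves one child empty separates nothing
    return float(
        np.var(y)
        - (n_left / n) * np.var(y[left_mask])
        - (n_right / n) * np.var(y[~left_mask])
    )


def pooled_gain(x, y, threshold):
    """Conventional criterion: score the split on all rows at once."""
    return variance_gain(y, x <= threshold)


def era_averaged_gain(x, y, eras, threshold):
    """Assumed era-aware criterion: score the split in each era, then average."""
    gains = [
        variance_gain(y[eras == e], x[eras == e] <= threshold)
        for e in np.unique(eras)
    ]
    return float(np.mean(gains))


# Toy data: two eras of four rows each.
eras = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# x_era_proxy only encodes which era a row belongs to.
x_era_proxy = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
# x_stable relates to the target in the same way inside every era.
x_stable = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
# Target = stable signal + an era-level offset of 2 in era 1.
y = np.array([0.0, 1.0, 0.0, 1.0, 2.0, 3.0, 2.0, 3.0])

for name, x in [("era_proxy", x_era_proxy), ("stable", x_stable)]:
    print(
        name,
        "pooled:", round(pooled_gain(x, y, 0.5), 3),
        "era-averaged:", round(era_averaged_gain(x, y, eras, 0.5), 3),
    )
```

In this toy example the pooled criterion prefers the feature that merely identifies the era, because it absorbs the era-level offset, while the era-averaged score gives that feature zero gain and prefers the feature whose relationship to the target holds within every era. Averaging per-era gains alone does not check whether the direction of a split agrees across eras; that would need a separate criterion.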
Taxonomy
Topics: Energy Load and Power Forecasting · Hydrological Forecasting Using AI · Machine Learning and Data Classification
Methods: Balanced Selection
