# ToPs: Ensemble Learning with Trees of Predictors

**Authors:** Jinsung Yoon, William R. Zame, Mihaela van der Schaar

arXiv: 1706.01396 · 2018-04-04

## TL;DR

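ToPs builds a tree of predictors: it recursively splits the feature space and, for each resulting subset, jointly chooses the split, the base learner, and the training set. The prediction for a new instance aggregates the predictors along its path through the tree. The authors report statistically significant improvements over state-of-the-art algorithms, including other ensemble methods.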

## Contribution

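We introduce an ensemble method that builds a tree of predictors, jointly optimizing each split together with the learner and training set assigned to every subset in that split, so that more complex learners are created only where the data call for them.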

## Key findings

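- Experiments on a variety of datasets show statistically significant improvements over state-of-the-art algorithms, including various ensemble learning methods.
- The method endogenously matches both the learner and the training set to the characteristics of the dataset while still avoiding over-fitting.
- Loss bounds for the final predictor are derived in terms of the Rademacher complexity of the base learners.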
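
For readers unfamiliar with the quantity named in the last finding, the standard textbook definition of Rademacher complexity may help. The block below is background only: the symbols $\mathcal{G}$, $S$, $\sigma_i$, $n$, and $\delta$ are notation chosen for this note, and the inequality is the generic uniform bound for a loss class with values in $[0,1]$, not the specific bound derived in the paper.

$$
\hat{\mathfrak{R}}_S(\mathcal{G}) = \mathbb{E}_{\sigma}\left[\sup_{g \in \mathcal{G}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i\, g(z_i)\right],
\qquad
\mathfrak{R}_n(\mathcal{G}) = \mathbb{E}_{S}\left[\hat{\mathfrak{R}}_S(\mathcal{G})\right]
$$

Here $S = (z_1, \dots, z_n)$ is an i.i.d. sample and the $\sigma_i$ are independent and uniform on $\{-1, +1\}$. With probability at least $1 - \delta$ over the draw of $S$, every $g \in \mathcal{G}$ satisfies

$$
\mathbb{E}[g(z)] \le \frac{1}{n}\sum_{i=1}^{n} g(z_i) + 2\,\mathfrak{R}_n(\mathcal{G}) + \sqrt{\frac{\ln(1/\delta)}{2n}}.
$$

Bounds of this shape trade empirical loss against the richness of the function class, which is why the complexity of the base learners matters for the final predictor.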

## Abstract

We present a new approach to ensemble learning. Our approach constructs a tree of subsets of the feature space and associates a predictor (predictive model) - determined by training one of a given family of base learners on an endogenously determined training set - to each node of the tree; we call the resulting object a tree of predictors. The (locally) optimal tree of predictors is derived recursively; each step involves jointly optimizing the split of the terminal nodes of the previous tree and the choice of learner and training set (hence predictor) for each set in the split. The feature vector of a new instance determines a unique path through the optimal tree of predictors; the final prediction aggregates the predictions of the predictors along this path. We derive loss bounds for the final predictor in terms of the Rademacher complexity of the base learners. We report the results of a number of experiments on a variety of datasets, showing that our approach provides statistically significant improvements over state-of-the-art machine learning algorithms, including various ensemble learning methods. Our approach works because it allows us to endogenously create more complex learners - when needed - and endogenously match both the learner and the training set to the characteristics of the dataset while still avoiding over-fitting.
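
The construction described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation and makes several simplifying assumptions that are not stated in this summary: a single numeric feature, two toy base learners (`fit_constant`, `fit_linear`), a fixed list of candidate thresholds, squared-error validation loss, candidate training sets limited to a node's own data or an ancestor's data, and plain averaging along the path in place of whatever aggregation the paper uses. All function and class names are hypothetical.

```python
from statistics import mean

def fit_constant(xs, ys):
    c = mean(ys)
    return lambda x: c

def fit_linear(xs, ys):
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:                      # degenerate case: fall back to the mean
        return lambda x: my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return lambda x: my + slope * (x - mx)

LEARNERS = [fit_constant, fit_linear]  # assumed family of base learners

def mse(pred, data):
    return mean((pred(x) - y) ** 2 for x, y in data)

def best_predictor(train_sets, val):
    """Try every (training set, learner) pair; keep the lowest validation loss."""
    best = None
    for train in train_sets:
        xs, ys = zip(*train)
        for fit in LEARNERS:
            pred = fit(xs, ys)
            loss = mse(pred, val)
            if best is None or loss < best[0]:
                best = (loss, pred)
    return best

class Node:
    def __init__(self, lo, hi, predictor, loss):
        self.lo, self.hi = lo, hi      # node covers lo <= x < hi
        self.predictor, self.loss = predictor, loss
        self.children = []

def subset(data, lo, hi):
    return [(x, y) for x, y in data if lo <= x < hi]

def grow(node, train, val, ancestors, thresholds, min_size=5, min_gain=1e-9):
    """Greedily split a leaf if the children lower total validation loss."""
    pool = ancestors + [train]         # training sets a child may borrow (assumption)
    best = None
    for t in thresholds:
        if not (node.lo < t < node.hi):
            continue
        kids, total, ok = [], 0.0, True
        for lo, hi in [(node.lo, t), (t, node.hi)]:
            tr, va = subset(train, lo, hi), subset(val, lo, hi)
            if len(tr) < min_size or len(va) < min_size:
                ok = False
                break
            loss, pred = best_predictor([tr] + pool, va)
            kids.append(Node(lo, hi, pred, loss))
            total += loss * len(va)    # summed squared error on this child
        if ok and (best is None or total < best[0]):
            best = (total, kids)
    if best is not None and best[0] < node.loss * len(val) - min_gain:
        node.children = best[1]
        for kid in node.children:
            grow(kid, subset(train, kid.lo, kid.hi), subset(val, kid.lo, kid.hi),
                 pool, thresholds, min_size, min_gain)

def build(train, val, thresholds, min_size=5):
    loss, pred = best_predictor([train], val)
    root = Node(float("-inf"), float("inf"), pred, loss)
    grow(root, train, val, [], thresholds, min_size)
    return root

def predict(root, x):
    """Average the predictors on the unique root-to-leaf path for x."""
    preds, node = [], root
    while node is not None:
        preds.append(node.predictor(x))
        node = next((c for c in node.children if c.lo <= x < c.hi), None)
    return mean(preds)
```

A possible usage, on synthetic data with a kink at zero: `root = build(train, val, thresholds=[-5.0, 0.0, 5.0])` followed by `predict(root, 2.0)`, where `train` and `val` are lists of `(x, y)` pairs. Selecting learners and splits on a held-out validation set is one simple way to let the tree grow more complex only where that lowers out-of-sample loss, in the spirit of the over-fitting control the abstract mentions.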

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.01396/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1706.01396/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1706.01396/full.md

---
Source: https://tomesphere.com/paper/1706.01396