Decision trees compensate for model misspecification
Hugh Panton, Gavin Leech, Laurence Aitchison

TL;DR
This paper investigates why decision trees and gradient boosting machines perform well even when the data contain no true interactions. It attributes part of their success to robustness to model misspecification and proposes two methods for robust generalized linear models.
Contribution
It confirms five alternative hypotheses about the role of tree depth in performance when no true interactions are present, and introduces two methods for robust generalized linear models (GLMs) that address the composite and mixed response scenarios.
Findings
Decision trees are robust to model misspecification.
Tree depth affects performance even in the absence of true interactions.
Proposed methods improve robustness of generalized linear models.
Abstract
The best-performing models in ML are not interpretable. If we can explain why they outperform, we may be able to replicate these mechanisms and obtain both interpretability and performance. One example is decision trees and their descendants, gradient boosting machines (GBMs). These perform well in the presence of complex interactions, with tree depth governing the order of interactions. However, interactions cannot fully account for the depth of trees found in practice. We confirm five alternative hypotheses about the role of tree depth in performance in the absence of true interactions, and present results from experiments on a battery of datasets. Part of the success of tree models is due to their robustness to various forms of misspecification. We present two methods for robust generalized linear models (GLMs) addressing the composite and mixed response scenarios.
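The abstract's remark that tree depth governs the order of interactions can be illustrated with a minimal sketch. It is not the authors' code or experimental setup. It is a toy, pure-Python gradient booster with squared-error loss, and every name in it (`fit_tree`, `boost`, `mse`, the XOR-style data) is a hypothetical choice made for illustration. Boosted depth-1 trees (stumps) form an additive model and cannot represent a two-way interaction, whereas depth-2 trees can.

```python
from statistics import fmean


def fit_tree(X, y, depth):
    """Greedy least-squares regression tree; returns a predict function."""
    mean = fmean(y)
    if depth == 0 or len(set(y)) == 1:
        return lambda row: mean
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            sse = sum(
                (y[i] - fmean([y[k] for k in side])) ** 2
                for side in (left, right)
                for i in side
            )
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:  # no feature has two distinct values
        return lambda row: mean
    _, j, t, left, right = best
    lo = fit_tree([X[i] for i in left], [y[i] for i in left], depth - 1)
    hi = fit_tree([X[i] for i in right], [y[i] for i in right], depth - 1)
    return lambda row: lo(row) if row[j] <= t else hi(row)


def boost(X, y, depth, rounds=50, lr=0.5):
    """Toy gradient boosting for squared error: each tree fits the residuals."""
    base = fmean(y)
    trees = []
    pred = [base] * len(y)
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_tree(X, resid, depth)
        trees.append(tree)
        pred = [pi + lr * tree(row) for pi, row in zip(pred, X)]
    return lambda row: base + lr * sum(t(row) for t in trees)


def mse(model, X, y):
    return fmean((model(row) - yi) ** 2 for row, yi in zip(X, y))


# Assumed toy data: on this balanced design the response is a pure
# two-way interaction (XOR), so the best additive fit is a constant.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0.0, 1.0, 1.0, 0.0]

mse_stumps = mse(boost(X, y, depth=1), X, y)  # additive model only
mse_depth2 = mse(boost(X, y, depth=2), X, y)  # can express x1-by-x2 terms

print(f"depth 1: {mse_stumps:.4f}, depth 2: {mse_depth2:.4f}")
```

On this toy data the stumps never improve on the constant prediction, while depth-2 trees fit the response essentially exactly. The paper's question is the converse one: why deeper trees still help on data where no such interaction exists.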
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
