Decision Tree Embedding by Leaf-Means
Cencheng Shen, Yuexiao Dong, Carey E. Priebe

TL;DR
Decision Tree Embedding (DTE) turns the leaf partitions of a trained decision tree into an interpretable feature space. It improves classification accuracy and efficiency while remaining interpretable, and it combines ideas from decision trees and neural networks.
Contribution
The paper introduces DTE, a method that builds an embedding from a tree's leaf partitions, supported by theoretical analysis and an ensemble extension paired with linear discriminant analysis.
Findings
DTE outperforms or matches random forests and shallow neural networks in accuracy.
DTE requires significantly less training time than ensemble methods.
Theoretical properties include preservation of conditional density and classification error bounds.
Abstract
Decision trees and random forests remain highly competitive for classification on medium-sized, standard datasets due to their robustness, minimal preprocessing requirements, and interpretability. However, a single tree suffers from high estimation variance, while large ensembles reduce this variance at the cost of substantial computational overhead and diminished interpretability. In this paper, we propose Decision Tree Embedding (DTE), a fast and effective method that leverages the leaf partitions of a trained classification tree to construct an interpretable feature representation. By using the sample means within each leaf region as anchor points, DTE maps inputs into an embedding space defined by the tree's partition structure, effectively circumventing the high variance inherent in decision-tree splitting rules. We further introduce an ensemble extension based on additional…
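The leaf-mean idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how an input is mapped relative to the anchor points, so the Euclidean distance used in `embed` is an assumption, and the function names `leaf_means` and `embed` are hypothetical. Leaf assignments are supplied by hand here; in practice they would come from a trained tree, for example via scikit-learn's `DecisionTreeClassifier.apply`.

```python
import math
from collections import defaultdict


def leaf_means(X, leaf_ids):
    """Average the training points that fall in each leaf (the anchor points)."""
    groups = defaultdict(list)
    for x, leaf in zip(X, leaf_ids):
        groups[leaf].append(x)
    # One mean vector per leaf, ordered by leaf id.
    return {
        leaf: [sum(col) / len(rows) for col in zip(*rows)]
        for leaf, rows in sorted(groups.items())
    }


def embed(x, anchors):
    """Assumed embedding: Euclidean distance from x to every leaf mean."""
    return [math.dist(x, mean) for mean in anchors.values()]


# Toy data; the leaf ids mimic a one-split tree on the first feature.
X = [[1.0, 0.0], [2.0, 2.0], [4.0, 1.0], [6.0, 3.0]]
leaf_ids = [0, 0, 1, 1]

anchors = leaf_means(X, leaf_ids)
z = embed([1.5, 1.0], anchors)  # one coordinate per leaf
```

The embedding has one coordinate per leaf, so each feature can be read as "how far is this input from the typical point of that leaf region", which is one way the representation could stay interpretable.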
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Imbalanced Data Classification Techniques · Bayesian Modeling and Causal Inference
