Joints in Random Forests
Alvaro H. C. Correia, Robert Peharz, Cassio de Campos

TL;DR
This paper reinterprets decision trees and random forests as generative models, which lets them handle inputs with missing features and detect outliers without being paired with an imputation technique or a separate generative model.
Contribution
It presents a new generative perspective on Decision Trees (DTs) and Random Forests (RFs), leading to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), which can process inputs with missing features and identify outliers.
Findings
GeDTs and GeFs outperform KNN imputation in handling missing data
The models can detect outliers by monitoring the marginal probability of the input features
Consistency is extended theoretically to missing-at-random scenarios
Abstract
Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack principled methods to process inputs with missing features or to detect outliers, which requires pairing them with imputation techniques or a separate generative model. In this paper, we demonstrate that DTs and RFs can naturally be interpreted as generative models, by drawing a connection to Probabilistic Circuits, a prominent class of tractable probabilistic models. This reinterpretation equips them with a full joint distribution over the feature space and leads to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), a family of novel hybrid generative-discriminative models. This family of models retains the overall characteristics of DTs…
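To make the generative reading concrete, here is a minimal Python sketch of a single tree treated as a mixture over its leaves. It is an illustration under stated assumptions, not the authors' implementation: the two binary features, the leaf weights, the factorised distribution inside each leaf, and the names `LEAVES`, `joint`, and `posterior` are all hypothetical and chosen only for this toy example.

```python
# Toy "generative decision tree" over two binary features (x0, x1) and a
# binary label y. The tree has one split on x0, hence two leaves.
# All numbers are made up for illustration (assumption, not from the paper).
LEAVES = [
    {   # leaf reached when x0 == 0
        "weight": 0.6,                                  # share of training data in this leaf
        "p_x": [{0: 1.0, 1: 0.0}, {0: 0.8, 1: 0.2}],    # p(x0 | leaf), p(x1 | leaf)
        "p_y": {0: 0.9, 1: 0.1},                        # class distribution in the leaf
    },
    {   # leaf reached when x0 == 1
        "weight": 0.4,
        "p_x": [{0: 0.0, 1: 1.0}, {0: 0.1, 1: 0.9}],
        "p_y": {0: 0.2, 1: 0.8},
    },
]


def joint(x, y=None):
    """Return p(x_observed, y) as a weighted sum over leaves.

    Entries of x equal to None are treated as missing and marginalised out;
    passing y=None marginalises out the label as well.
    """
    total = 0.0
    for leaf in LEAVES:
        term = leaf["weight"]
        for value, dist in zip(x, leaf["p_x"]):
            if value is not None:
                term *= dist[value]
            # A missing feature sums to 1 inside the leaf, so it adds no factor.
        if y is not None:
            term *= leaf["p_y"][y]
        total += term
    return total


def posterior(x, y=1):
    """Return p(y | x_observed); assumes joint(x) > 0 for the given input."""
    return joint(x, y) / joint(x)


if __name__ == "__main__":
    print(posterior((0, 1)))     # all features observed
    print(posterior((None, 1)))  # x0 missing
    print(joint((1, 0)))         # marginal probability of the features
```

With every feature observed, only the leaf that the input falls into contributes, so `posterior((0, 1))` equals that leaf's class estimate (0.1 here), as in an ordinary decision tree. With `x0` missing, `posterior((None, 1))` averages over both leaves and gives 0.625 for these made-up numbers. A low value of `joint(x)`, such as roughly 0.04 for `(1, 0)`, is the kind of marginal probability one could monitor to flag unusual inputs.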
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis
