Joints in Random Forests

Alvaro H. C. Correia; Robert Peharz; Cassio de Campos

arXiv:2006.14937 · cs.LG · November 20, 2020 · 1 citation

TL;DR

This paper reinterprets decision trees and random forests as generative models, which lets them handle missing features and detect outliers in a principled way, without a separate imputation step or an auxiliary generative model.

Contribution

The paper presents a generative perspective on Decision Trees (DTs) and Random Forests (RFs), leading to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), which can process inputs with missing features and identify outliers.

Findings

01. GeDTs and GeFs outperform KNN imputation in handling missing data.

02. The models can detect outliers by monitoring marginal probabilities.

03. Consistency guarantees are extended theoretically to inputs with features missing at random.

Abstract

Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack principled methods to process inputs with missing features or to detect outliers, which requires pairing them with imputation techniques or a separate generative model. In this paper, we demonstrate that DTs and RFs can naturally be interpreted as generative models, by drawing a connection to Probabilistic Circuits, a prominent class of tractable probabilistic models. This reinterpretation equips them with a full joint distribution over the feature space and leads to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), a family of novel hybrid generative-discriminative models. This family of models retains the overall characteristics of DTs…
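The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository listed under Code & Models for that); it is a toy under stated assumptions: the tree is given as a hand-written list of leaves, each leaf carries a uniform density over its cell of the feature space, and the names `Leaf`, `joint`, `marginal`, and `predict_proba` are hypothetical. Attaching a density to each leaf turns the tree into a joint distribution over features and class, so missing features can be marginalised out and the marginal density of the input can serve as an outlier score.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Leaf:
    box: List[Tuple[float, float]]  # half-open cell [lo, hi) per feature
    weight: float                   # fraction of training data in this leaf
    class_probs: List[float]        # p(y | leaf), as in an ordinary DT

    def density(self, x: Sequence[Optional[float]]) -> float:
        """Density of the observed entries of x (None marks a missing one).

        Assumption: a uniform, fully factorised density on the leaf's cell,
        so a missing feature is marginalised by skipping its factor.
        """
        p = 1.0
        for value, (lo, hi) in zip(x, self.box):
            if value is None:
                continue            # this factor integrates to 1
            if not (lo <= value < hi):
                return 0.0          # x lies outside this leaf's cell
            p /= (hi - lo)
        return p


def joint(leaves: List[Leaf], x: Sequence[Optional[float]]) -> List[float]:
    """p(x_observed, y) for every class y, summed over all leaves."""
    out = [0.0] * len(leaves[0].class_probs)
    for leaf in leaves:
        mass = leaf.weight * leaf.density(x)
        for c, p_c in enumerate(leaf.class_probs):
            out[c] += mass * p_c
    return out


def marginal(leaves: List[Leaf], x: Sequence[Optional[float]]) -> float:
    """p(x_observed); a low value suggests x is an outlier."""
    return sum(joint(leaves, x))


def predict_proba(leaves: List[Leaf], x: Sequence[Optional[float]]) -> List[float]:
    """p(y | x_observed) obtained by normalising the joint."""
    j = joint(leaves, x)
    z = sum(j)
    if z == 0.0:
        raise ValueError("x has zero density under the model")
    return [v / z for v in j]


# Toy tree over [0, 1)^2: split on feature 0 at 0.5, then on feature 1 at 0.5.
LEAVES = [
    Leaf(box=[(0.0, 0.5), (0.0, 1.0)], weight=0.5, class_probs=[0.9, 0.1]),
    Leaf(box=[(0.5, 1.0), (0.0, 0.5)], weight=0.3, class_probs=[0.2, 0.8]),
    Leaf(box=[(0.5, 1.0), (0.5, 1.0)], weight=0.2, class_probs=[0.6, 0.4]),
]

full = predict_proba(LEAVES, (0.7, 0.2))      # complete input: one active leaf
partial = predict_proba(LEAVES, (0.7, None))  # feature 1 missing: two leaves mix
outlier_score = marginal(LEAVES, (1.5, 0.2))  # outside every cell
```

With these toy numbers, the complete input `(0.7, 0.2)` falls in a single leaf and returns that leaf's class distribution, `[0.2, 0.8]`, exactly as the plain decision tree would. With the second feature missing, the two leaves compatible with `x0 = 0.7` are mixed by their weights, giving approximately `[0.36, 0.64]`. The point `(1.5, 0.2)` lies outside every cell, so its marginal density is `0.0`. Uniform leaf densities are a deliberately crude choice here; they keep the marginalisation trivial but give a coarse outlier score.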

Code & Models

Repositories

AlCorreia/GeFs (official)

Videos

Joints in Random Forests · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis