Provable Boolean Interaction Recovery from Tree Ensemble obtained via Random Forests

Merle Behr, Yu Wang, Xiao Li, and Bin Yu

arXiv:2102.11800 · math.ST · July 6, 2022

TL;DR

This paper gives a theoretical account of how Random Forests (RF) discover Boolean feature interactions: it introduces the Locally Spiky Sparse (LSS) model and proves that the LSSFind method recovers interactions consistently under that model.

Contribution

The paper introduces the LSS model, motivated by the thresholding behavior seen in many biological processes, and proves that the LSSFind algorithm consistently recovers Boolean interactions from RF tree ensembles under this model.

Findings

01. DWP(S) bounds characterize Boolean interactions in RF

02. LSSFind recovers interactions consistently as sample size grows

03. Simulation confirms robustness even with assumption violations

Abstract

Random Forests (RF) are at the cutting edge of supervised machine learning in terms of prediction performance, especially in genomics. Iterative Random Forests (iRF) use a tree ensemble from iteratively modified RF to obtain predictive and stable non-linear or Boolean interactions of features. They have shown great promise for Boolean biological interaction discovery that is central to advancing functional genomics and precision medicine. However, theoretical studies into how tree-based methods discover Boolean feature interactions are missing. Inspired by the thresholding behavior in many biological processes, we first introduce a novel discontinuous nonlinear regression model, called the Locally Spiky Sparse (LSS) model. Specifically, the LSS model assumes that the regression function is a linear combination of piecewise constant Boolean interaction terms. Given an RF tree ensemble,…
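To make the phrase "a linear combination of piecewise constant Boolean interaction terms" concrete, here is a minimal simulation sketch in Python. It is not the authors' code. The function name `lss_regression_function`, the encoding of each interaction as (feature index, threshold, sign) triples, the thresholds, coefficients, and noise level are all assumptions chosen for illustration.

```python
import numpy as np

def lss_regression_function(X, interactions, coefs, intercept=0.0):
    """Evaluate a linear combination of Boolean interaction terms.

    Hypothetical encoding: each interaction is a list of
    (feature_index, threshold, sign) triples. sign=+1 requires the
    feature to exceed the threshold, sign=-1 requires it not to.
    An interaction is "on" only if all of its conditions hold.
    """
    y = np.full(X.shape[0], intercept, dtype=float)
    for beta, terms in zip(coefs, interactions):
        active = np.ones(X.shape[0], dtype=bool)
        for j, thr, sign in terms:
            active &= (X[:, j] > thr) if sign > 0 else (X[:, j] <= thr)
        # Piecewise constant contribution: beta where the AND holds, else 0.
        y += beta * active
    return y

# Illustrative setup (all values are assumptions, not from the paper).
rng = np.random.default_rng(0)
n, p = 1000, 10
X = rng.uniform(size=(n, p))
interactions = [
    [(0, 0.5, +1), (1, 0.5, +1)],   # x0 > 0.5 AND x1 > 0.5
    [(2, 0.3, -1), (3, 0.7, +1)],   # x2 <= 0.3 AND x3 > 0.7
]
coefs = [2.0, -1.0]

signal = lss_regression_function(X, interactions, coefs)
y = signal + rng.normal(scale=0.1, size=n)  # noisy responses
```

The noiseless `signal` takes only a few distinct values, one for each combination of active interactions. This gives the discontinuous, thresholded response surface that the abstract describes. Only 4 of the 10 features affect the response.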
