Spectrally Deconfounded Random Forests
Markus Ulmer, Cyrill Scheidegger, and Peter Bühlmann

TL;DR
This paper proposes Spectrally Deconfounded Random Forests (SDForests), a modification of Random Forests for estimating regression functions in high-dimensional settings with unobserved confounding variables. In a simulation study and a semi-synthetic study on single-cell gene expression data, SDForests outperform classical Random Forests when confounding is present.
Contribution
The paper introduces SDForests, which integrate spectral deconfounding into Random Forests by minimizing a deconfounded version of the least squares objective, reducing bias from unobserved confounders in high-dimensional data.
Findings
SDForests outperform classical Random Forests in confounded settings.
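SDForests have comparable performance to classical Random Forests when no confounding is present.
Empirical results suggest better estimation of the direct regression function, even when the theoretical assumptions of linear and dense confounding are not perfectly met.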
Abstract
We introduce a modification of Random Forests to estimate functions when unobserved confounding variables are present. The technique is tailored for high-dimensional settings with many observed covariates. We use spectral deconfounding techniques to minimize a deconfounded version of the least squares objective, resulting in the Spectrally Deconfounded Random Forests (SDForests). We show how the omitted variable bias gets small given some assumptions. We compare the performance of SDForests to classical Random Forests in a simulation study and a semi-synthetic setting using single-cell gene expression data. Empirical results suggest that SDForests outperform classical Random Forests in estimating the direct regression function, even if the theoretical assumptions, requiring linear and dense confounding, are not perfectly met, and that SDForests have comparable performance in the…
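The "deconfounded version of the least squares objective" mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which spectral transformation is used, so the code below assumes the trim transform from the spectral deconfounding literature, which caps the singular values of X at their median. The function names `trim_transform` and `deconfounded_loss`, the simulated data, and all parameter values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def trim_transform(X):
    """Assumed spectral transformation (trim transform).

    With X = U diag(d) V^T and tau = median(d), build
    Q = I - U diag(1 - min(tau, d_i) / d_i) U^T,
    which equals U diag(min(tau, d_i) / d_i) U^T when U is square.
    Assumes all singular values of X are strictly positive.
    """
    n = X.shape[0]
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    tau = np.median(d)
    shrink = 1.0 - np.minimum(tau, d) / d
    Q = np.eye(n) - (U * shrink) @ U.T
    return Q, tau


def deconfounded_loss(Q, y, fitted):
    """Mean squared residual after applying Q: ||Q (y - fitted)||^2 / n."""
    r = Q @ (y - fitted)
    return float(r @ r) / len(y)


# Simulated example (illustrative only): q hidden confounders H affect
# all p covariates (dense, linear confounding) as well as the response.
rng = np.random.default_rng(0)
n, p, q = 100, 200, 2
H = rng.normal(size=(n, q))
Gamma = rng.normal(size=(q, p))
X = H @ Gamma + rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + H @ np.array([1.0, -1.0]) + 0.1 * rng.normal(size=n)

Q, tau = trim_transform(X)

# After the transformation, no singular value of Q X exceeds tau, so the
# large directions typically induced by dense confounding are shrunk.
d_after = np.linalg.svd(Q @ X, compute_uv=False)

# Loss of a constant (mean) predictor under the deconfounded objective.
baseline = deconfounded_loss(Q, y, np.full(n, y.mean()))
```

In a tree-based learner, a loss of this form would replace the ordinary residual sum of squares when candidate splits are compared. How SDForests do this exactly is not described in the text above, so that step is left out of the sketch.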
Taxonomy
Topics: Neural Networks and Applications · Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research
