Testing for Regression Heteroskedasticity with High-Dimensional Random Forests
Chi Chien-Ming

TL;DR
This paper introduces two novel tests for detecting heteroskedasticity in high-dimensional regression models, utilizing random forests and flexible knockoff variables, with rigorous theoretical validation and practical applications.
Contribution
The paper proposes new high-dimensional heteroskedasticity tests based on random forests and flexible knockoff variables, filling a gap in statistical inference for complex models.
Findings
The tests come with rigorously derived P-values and controlled test sizes.
Simulation studies demonstrate strong empirical performance.
Application to HIV data illustrates practical utility.
Abstract
Statistical inference for high-dimensional regression heteroskedasticity is an important but under-explored problem. The current paper aims at filling this gap by proposing two tests, namely the variance difference test and the variance difference Breusch-Pagan test, for assessing high-dimensional regression heteroskedasticity. The former tests whether an explanatory feature of interest is associated with the conditional variance of a response variable, while the latter tests heteroskedasticity in the regression, which is known to be the Breusch-Pagan test problem. To formally establish the tests, we have derived rigorous P-values and test sizes, and analyzed the test power under a nonparametric heteroskedastic data generating model with high-dimensional input features. Such a model setting takes into account high-dimensional applications with flexible structures of heteroskedasticity…
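For readers unfamiliar with the Breusch-Pagan test problem mentioned in the abstract, the sketch below shows the classical low-dimensional version (Koenker's studentized LM form) for a single regressor. It is not the paper's method: it uses ordinary least squares rather than random forests or knockoffs, and the function name `breusch_pagan_lm` and the simulated data are illustrative assumptions only.

```python
import math
import numpy as np


def breusch_pagan_lm(x, y):
    """Classical studentized Breusch-Pagan LM test with one regressor.

    Illustrative helper (not from the paper). Returns (LM statistic, p-value).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    design = np.column_stack([np.ones(n), x])

    # Step 1: fit the mean model by OLS and take squared residuals.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid_sq = (y - design @ beta) ** 2

    # Step 2: auxiliary regression of squared residuals on the regressor.
    gamma, *_ = np.linalg.lstsq(design, resid_sq, rcond=None)
    fitted = design @ gamma
    ss_res = np.sum((resid_sq - fitted) ** 2)
    ss_tot = np.sum((resid_sq - resid_sq.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    # Step 3: LM = n * R^2 is asymptotically chi-square with 1 degree of
    # freedom under homoskedasticity; for 1 df the survival function is
    # erfc(sqrt(x / 2)).
    lm = max(n * r2, 0.0)
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Simulated data (assumed for illustration): noise scale is constant in the
# first response and proportional to x in the second.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1.0, 3.0, size=n)
noise = rng.standard_normal(n)
y_homo = 1.0 + 2.0 * x + noise
y_het = 1.0 + 2.0 * x + x * noise

lm_homo, p_homo = breusch_pagan_lm(x, y_homo)
lm_het, p_het = breusch_pagan_lm(x, y_het)
```

In this toy setting the heteroskedastic response should yield a much larger LM statistic and a very small p-value. The tests summarized above target the high-dimensional, nonparametric case, where this OLS-based procedure is not expected to apply directly.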
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Advanced Statistical Methods and Models
