Statistical significance in high-dimensional linear models
Peter Bühlmann

TL;DR
This paper introduces a method for constructing p-values in high-dimensional linear models and for adjusting them for multiple testing while accounting for dependence among the p-values, with proven error control and demonstrations on simulated and real data.
Contribution
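It presents a p-value construction based on Ridge estimation with an additional correction term for the substantial projection bias that arises in high dimensions. The construction applies both to local hypotheses (a single regression parameter) and to global hypotheses (several up to all parameters).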
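The projection bias can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the paper's code: it uses a fixed Gaussian design with p > n and a very small Ridge penalty `lam`, and the helper `ridge_expectation_map` is a hypothetical name. For fixed X, the Ridge estimator has expectation M β with M = (Σ̂ + λI)^{-1} Σ̂, where Σ̂ = XᵀX / n, and as λ → 0 this M approaches the orthogonal projection P_R onto the row space of X. Because trace(P_R) = rank(X) = n, the diagonal of P_R averages n/p, so the Ridge estimator targets a shrunken version of each coefficient mixed with the others through the off-diagonal entries.

```python
import numpy as np


def ridge_expectation_map(X, lam):
    """Matrix M with E[beta_ridge] = M @ beta for a fixed design X (hypothetical helper)."""
    n, p = X.shape
    sigma_hat = X.T @ X / n
    A = np.linalg.inv(sigma_hat + lam * np.eye(p))
    return A @ sigma_hat


rng = np.random.default_rng(1)
n, p = 50, 200                          # high-dimensional setting: p > n
X = rng.standard_normal((n, p))
P = np.linalg.pinv(X) @ X               # orthogonal projection onto the row space of X
M = ridge_expectation_map(X, lam=1e-4)

print(np.abs(M - P).max())              # small: for tiny lam, E[beta_ridge] is close to P @ beta
print(np.diag(P).mean())                # trace(P) = rank(X) = n, so this is about n / p = 0.25
```

With n = 50 and p = 200, the Ridge estimator targets on average about a quarter of each coefficient plus a mixture of the others, which is why a correction term is needed before the estimate can be used for testing.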
Findings
Strong error control without assumptions on the size of the true regression coefficients
Multiple-testing adjustment that accounts for dependence among the p-values
Demonstrated on simulated examples and a real data application
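A minimal sketch of how single-coefficient p-values of this kind can be assembled, under assumptions that are not taken from the paper's implementation: the noise level σ is known, the Ridge penalty defaults to λ = 1/n, the initial estimator used in the bias correction is a plain coordinate-descent Lasso with α = σ√(2 log p / n), and the bias remaining after the correction is bounded by max_{k≠j} |P_jk / P_jj| · (log p / n)^{1/2 − ξ} with ξ = 0.05. The function names are hypothetical. Subtracting this bound before taking the Gaussian tail probability is the mechanism intended to keep the p-values conservative whatever the size of the true coefficients.

```python
import math
import numpy as np


def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent Lasso for (1/(2n))||y - Xb||^2 + alpha*||b||_1 (illustrative initial estimator)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)          # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]           # take coordinate j out of the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]           # put it back with the updated value
    return b


def ridge_projection_pvalues(X, y, sigma, lam=None, xi=0.05):
    """Two-sided p-values for H_0,j: beta_j = 0 (hypothetical helper, known sigma assumed)."""
    n, p = X.shape
    lam = 1.0 / n if lam is None else lam
    sigma_hat = X.T @ X / n
    A = np.linalg.inv(sigma_hat + lam * np.eye(p))
    beta_ridge = A @ (X.T @ y) / n
    omega = A @ sigma_hat @ A            # Cov(beta_ridge) = sigma^2 / n * omega
    P = np.linalg.pinv(X) @ X            # projection onto the row space of X
    d = np.diag(P)
    off = P - np.diag(d)                 # off-diagonal part of P

    beta_init = lasso_cd(X, y, alpha=sigma * math.sqrt(2 * math.log(p) / n))

    # For small lam, E[beta_ridge_j] ~ sum_k P_jk beta_k: rescale by P_jj and
    # remove the contribution of the other coefficients using beta_init.
    beta_corr = (beta_ridge - off @ beta_init) / d

    # Inverse standard deviation of the noise part of beta_corr_j.
    a = math.sqrt(n) * np.abs(d) / (sigma * np.sqrt(np.diag(omega)))

    # Assumed bound on the bias left over after the correction.
    delta = np.abs(off / d[:, None]).max(axis=1) * (math.log(p) / n) ** (0.5 - xi)

    z = np.maximum(a * (np.abs(beta_corr) - delta), 0.0)
    return np.array([math.erfc(v / math.sqrt(2.0)) for v in z])  # 2 * (1 - Phi(z))


# Usage on simulated data: three active coefficients out of p = 100, with n = 50.
rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 3.0
y = X @ beta + rng.standard_normal(n)
pvals = ridge_projection_pvalues(X, y, sigma=1.0)
```

Dividing by P_jj puts each statistic back on the scale of β_j, and the bound `delta` trades some power for robustness, consistent with the caveat that the procedure might not be optimal in terms of power.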
Abstract
We propose a method for constructing p-values for general hypotheses in a high-dimensional linear model. The hypotheses can be local, for testing a single regression parameter, or more global, involving several up to all parameters. Furthermore, when considering many hypotheses, we show how to adjust for multiple testing taking dependence among the p-values into account. Our technique is based on Ridge estimation with an additional correction term due to a substantial projection bias in high dimensions. We prove strong error control for our p-values and provide sufficient conditions for detection: for the former, we do not make any assumption on the size of the true underlying regression coefficients, while regarding the latter, our procedure might not be optimal in terms of power. We demonstrate the method in simulated examples and a real data application.
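One way to account for dependence among the p-values is to use the fact that, for fixed X and Gaussian errors, the noise part of the Ridge estimator is exactly Gaussian with covariance proportional to Ω = (Σ̂ + λI)^{-1} Σ̂ (Σ̂ + λI)^{-1}. The sketch below uses this to estimate a min-P style adjusted p-value in the spirit of Westfall and Young: for each raw p-value p_j it estimates P(min_k P_k ≤ p_j) by simulating standardized Ridge noise vectors. It assumes Gaussian errors, λ = 1/n by default, and a hypothetical function name; it illustrates the idea rather than reproducing the paper's exact procedure.

```python
import math
import numpy as np


def minp_adjusted_pvalues(X, pvals, lam=None, n_sim=5000, seed=0):
    """Estimate P(min_k P_k <= p_j) under the Gaussian null of the Ridge estimator (hypothetical helper)."""
    n, p = X.shape
    lam = 1.0 / n if lam is None else lam
    sigma_hat = X.T @ X / n
    A = np.linalg.inv(sigma_hat + lam * np.eye(p))
    omega = A @ sigma_hat @ A
    sd = np.sqrt(np.diag(omega) / n)          # s.d. of each Ridge coefficient when sigma = 1

    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_sim, n))     # simulated noise vectors, sigma = 1
    W = (eps @ X @ A / n) / sd                # each row: standardized Ridge noise, correlated as in omega
    max_abs = np.abs(W).max(axis=1)
    # Smallest two-sided raw p-value in each simulated draw.
    min_p = np.sort([math.erfc(m / math.sqrt(2.0)) for m in max_abs])
    # Adjusted p_j = fraction of draws whose smallest p-value is <= p_j.
    return np.searchsorted(min_p, pvals, side="right") / n_sim


rng = np.random.default_rng(2)
X = rng.standard_normal((50, 100))
raw = np.array([1e-12, 1e-3, 0.01, 0.2, 1.0])
adj = minp_adjusted_pvalues(X, raw)
```

By the union bound, this adjustment is never more conservative than Bonferroni (up to simulation error), and it can be considerably less conservative when the coefficient estimates are strongly correlated.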
