Statistical significance in high-dimensional linear models

Peter Bühlmann

arXiv:1202.1377 · stat.ME · October 14, 2013

TL;DR

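This paper introduces a method for constructing p-values for general hypotheses in high-dimensional linear models, including an adjustment for multiple testing that takes dependence among the p-values into account. The p-values come with provable strong error control, although detection power may not be optimal, and the method is demonstrated on simulated and real data.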

Contribution

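It presents a p-value construction based on Ridge estimation plus a correction term for the substantial projection bias that arises in high dimensions. The construction applies both to local hypotheses (a single regression parameter) and to global ones (several up to all parameters).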
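
The projection bias can be made explicit in a few lines. The derivation below is a sketch in our own notation, assuming the linear model Y = Xβ + ε with an n × p design X, p > n, and Σ̂ = X^T X / n; the labels P_R (projection onto the row space of X, with (XX^T)^- a generalized inverse) and β̂_init (an initial estimate such as the Lasso) are chosen here for illustration and may not match the paper's exact notation or estimator.

```latex
% Ridge estimator and its mean under a fixed design (noise has mean zero)
\hat\beta = (\hat\Sigma + \lambda I)^{-1} n^{-1} X^\top Y,
\qquad
\mathbb{E}[\hat\beta] = (\hat\Sigma + \lambda I)^{-1} \hat\Sigma \, \beta
  \;\longrightarrow\; P_R \, \beta \quad (\lambda \to 0),
\qquad
P_R = X^\top (X X^\top)^{-} X .

% Coordinate j is contaminated by the other coefficients, because P_R \neq I when p > n
\frac{\mathbb{E}[\hat\beta_j]}{(P_R)_{jj}}
  \approx \beta_j + \sum_{k \neq j} \frac{(P_R)_{jk}}{(P_R)_{jj}} \, \beta_k .

% Bias-corrected estimator, plugging in an initial estimate for the unknown beta_k
\hat\beta_{\mathrm{corr},j}
  = \frac{\hat\beta_j}{(P_R)_{jj}}
  - \sum_{k \neq j} \frac{(P_R)_{jk}}{(P_R)_{jj}} \, \hat\beta_{\mathrm{init},k} .
```

When p ≤ n and X has full column rank, P_R is the identity and the correction term vanishes, so the bias is specific to the high-dimensional regime.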

Findings

1. Strong error control without assumptions on the size of the true regression coefficients
2. Multiple-testing adjustment that takes dependence among the p-values into account
3. Demonstrated on simulated examples and a real data application
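
One standard way to exploit dependence when adjusting for multiple testing is a single-step max-|Z| adjustment in the style of Westfall-Young, sketched below under the assumption that the test statistics are approximately jointly Gaussian with a known correlation matrix. The adjusted p-value of hypothesis j is the probability, under the global null, that the largest |Z_k| exceeds the observed |z_j|, estimated here by Monte Carlo. The function name `maxz_adjusted_pvalues` is hypothetical, and the paper's own adjustment may differ in detail.

```python
import math
import numpy as np


def maxz_adjusted_pvalues(z, corr, n_sim=20000, seed=0):
    """Single-step max-|Z| adjusted p-values (hypothetical helper).

    Assumes the statistics z are approximately N(0, corr) under the global null.
    """
    z = np.asarray(z, dtype=float)
    rng = np.random.default_rng(seed)
    # Draw null statistics that share the dependence structure of the observed ones.
    W = rng.multivariate_normal(np.zeros(len(z)), corr, size=n_sim)
    max_abs = np.abs(W).max(axis=1)
    # Adjusted p_j = P(max_k |W_k| >= |z_j|), estimated by Monte Carlo.
    return np.array([np.mean(max_abs >= abs(zj)) for zj in z])


if __name__ == "__main__":
    m, rho = 20, 0.9
    corr = rho * np.ones((m, m)) + (1 - rho) * np.eye(m)  # equicorrelated statistics
    z = np.zeros(m)
    z[0] = 3.0
    raw = math.erfc(3.0 / math.sqrt(2))  # two-sided p-value of z = 3
    print("raw:", raw)
    print("Bonferroni:", min(1.0, m * raw))
    print("max-|Z|:", maxz_adjusted_pvalues(z, corr)[0])
```

With strongly correlated statistics the max-|Z| adjustment is noticeably less conservative than Bonferroni, which ignores dependence; under independence the two nearly coincide.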

Abstract

We propose a method for constructing p-values for general hypotheses in a high-dimensional linear model. The hypotheses can be local for testing a single regression parameter or they may be more global involving several up to all parameters. Furthermore, when considering many hypotheses, we show how to adjust for multiple testing taking dependence among the p-values into account. Our technique is based on Ridge estimation with an additional correction term due to a substantial projection bias in high dimensions. We prove strong error control for our p-values and provide sufficient conditions for detection: for the former, we do not make any assumption on the size of the true underlying regression coefficients while regarding the latter, our procedure might not be optimal in terms of power. We demonstrate the method in simulated examples and a real data application.
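
The recipe in the abstract (Ridge estimation, a correction for the projection bias, then one p-value per coefficient) can be illustrated with the NumPy sketch below. It is not the author's implementation, and several parts are assumptions made for the example: the noise level `sigma` is treated as known, the ridge mean map (Σ̂ + λI)^{-1} Σ̂ stands in for the exact row-space projection, a plain coordinate-descent Lasso serves as the initial estimator, and the conservative allowance `delta` uses a bound of the form max_{k≠j} |P_jk / P_jj| (log p / n)^{1/2 - ξ}, scaled to the test statistic, whose exact form in the paper should be checked. The names `lasso_cd` and `ridge_projection_pvalues` are hypothetical.

```python
import math
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso for (1/(2n))||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]  # partial residual without coordinate j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b


def ridge_projection_pvalues(X, y, lam_ridge, lam_lasso, sigma, xi=0.05):
    """Two-sided p-values for H0: beta_j = 0 via bias-corrected ridge (hypothetical helper)."""
    n, p = X.shape
    Sigma_hat = X.T @ X / n
    A = np.linalg.inv(Sigma_hat + lam_ridge * np.eye(p))  # (Sigma_hat + lambda I)^{-1}
    beta_ridge = A @ (X.T @ y) / n
    P = A @ Sigma_hat          # E[beta_ridge] = P beta; close to the row-space projection
    Omega = A @ Sigma_hat @ A  # n * Cov(beta_ridge) / sigma^2
    d = np.diag(P)
    off = P - np.diag(d)       # off-diagonal part, the source of the projection bias
    beta_init = lasso_cd(X, y, lam_lasso)
    # Remove the estimated bias sum_{k != j} P_jk * beta_init_k from beta_ridge_j / P_jj.
    beta_corr = (beta_ridge - off @ beta_init) / d
    # Scale so that a_j * beta_corr_j is roughly N(0, 1) under H0, up to leftover bias.
    a = math.sqrt(n) * np.abs(d) / (sigma * np.sqrt(np.diag(Omega)))
    # Conservative allowance for leftover bias (assumed form of the bound).
    delta = a * np.max(np.abs(off / d[:, None]), axis=1) * (math.log(p) / n) ** (0.5 - xi)
    stat = np.maximum(a * np.abs(beta_corr) - delta, 0.0)
    # Two-sided normal p-value 2 * (1 - Phi(stat)) = erfc(stat / sqrt(2)).
    return np.array([math.erfc(s / math.sqrt(2)) for s in stat])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 50, 100  # more variables than observations
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:3] = 3.0  # three active variables
    y = X @ beta + rng.standard_normal(n)
    pvals = ridge_projection_pvalues(X, y, lam_ridge=1.0 / n, lam_lasso=0.4, sigma=1.0)
    print("active:", pvals[:3])
    print("smallest null p-value:", pvals[3:].min())
```

Subtracting `delta` before computing the p-value makes the test conservative: it guards against the part of the projection bias that the initial estimator fails to remove, which matches the abstract's remark that error control is guaranteed while power may not be optimal.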
