PC Adjusted Testing for Low Dimensional Parameters
Sohom Bhattacharya, Rounak Dey, and Rajarshi Mukherjee

TL;DR
This paper analyzes the effectiveness of principal component adjustments in high-dimensional linear regression, revealing conditions under which they fail to control Type I error in genetic studies.
Contribution
It provides a theoretical framework for understanding when PC adjustments are valid or lead to error inflation in high-dimensional settings.
Findings
PC regression often fails to control Type I error
Necessary and sufficient conditions for error inflation are identified
Numerical experiments support theoretical results
Abstract
In this paper, we investigate the impact of high-dimensional Principal Component (PC) adjustments on inferring the effects of variables on outcomes, with a focus on applications in genetic association studies where PC adjustment is commonly used to account for population stratification. We consider high-dimensional linear regression in the regime where the number of covariates grows proportionally to the number of samples. In this setting, we provide an asymptotically precise understanding of when PC adjustments yield valid tests with controlled Type I error rates. Our results demonstrate that, under both fixed and diverging signal strengths, PC regression often fails to control the Type I error at the desired nominal level. Furthermore, we establish necessary and sufficient conditions for Type I error inflation based on covariate distributions. These theoretical findings are further…
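To make the setup in the abstract concrete, here is a minimal simulation sketch of a PC-adjusted test under the null. It is not the authors' procedure or experimental design. The data-generating model (Gaussian covariates, a variable of interest `a` correlated with them, dense covariate effects), the function names `pc_adjusted_pvalue` and `null_rejection_rate`, the defaults `n=200`, `p=100`, `k=5`, and the normal approximation to the test statistic are all assumptions made for illustration.

```python
import math
import numpy as np


def pc_adjusted_pvalue(y, a, X, k):
    """Two-sided p-value for the coefficient of `a` in a regression of `y`
    on an intercept, `a`, and the top-k principal component scores of X.
    Uses a normal approximation to the t-statistic (an assumption)."""
    n = len(y)
    Xc = X - X.mean(axis=0)                      # center covariates
    # Left singular vectors of centered X are the PC scores up to scaling.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    D = np.column_stack([np.ones(n), a, U[:, :k]])
    coef, _, _, _ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    df = n - D.shape[1]
    sigma2 = resid @ resid / df                  # residual variance estimate
    cov = sigma2 * np.linalg.inv(D.T @ D)        # OLS coefficient covariance
    z = coef[1] / math.sqrt(cov[1, 1])
    return math.erfc(abs(z) / math.sqrt(2))      # two-sided normal p-value


def null_rejection_rate(n=200, p=100, k=5, reps=200, alpha=0.05, seed=0):
    """Monte Carlo rejection frequency when `a` has no effect on `y`.
    Hypothetical design: p/n is held fixed and X affects both a and y."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        w = rng.standard_normal(p) / math.sqrt(p)
        a = X @ w + rng.standard_normal(n)       # a depends on X
        gamma = rng.standard_normal(p) / math.sqrt(p)
        y = X @ gamma + rng.standard_normal(n)   # null: a is absent from y
        rejections += pc_adjusted_pvalue(y, a, X, k) < alpha
    return rejections / reps


if __name__ == "__main__":
    print(null_rejection_rate())
```

Comparing the returned frequency with `alpha` for different choices of `k`, `p / n`, and covariate distributions gives an informal empirical view of the Type I error question the paper studies. The paper's own conditions and numerical experiments should be taken from the paper itself.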
Taxonomy
Topics: Genetic Associations and Epidemiology · Genetic and phenotypic traits in livestock · Statistical Methods and Inference
