A Tracy-Widom Empirical Estimator For Valid P-values With High-Dimensional Datasets
Maxime Turgeon, Celia MT Greenwood, Aurelie Labbe

TL;DR
This paper introduces a Tracy-Widom-based empirical estimator for computing valid p-values in high-dimensional multivariate analysis, enabling reliable inference on complex datasets such as those arising in genomics and brain imaging.
Contribution
It proposes a novel empirical estimator of the largest-root distribution in high-dimensional multivariate tests, obtained by fitting a Tracy-Widom family approximation to the statistics from a small number of permutations.
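The estimator described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the location and scale of a shifted and scaled Tracy-Widom (β = 1) law are estimated by matching the mean and standard deviation of the permutation statistics; the paper's own fitting procedure may differ. The Tracy-Widom CDF is evaluated through the Fredholm determinant F1(s) = det(I - K_s), with K_s(u, v) = Ai(s + u + v) on L2(0, ∞), discretized by Gauss-Legendre quadrature. The function names (`tw1_cdf`, `fit_tw1_location_scale`, `tw1_pvalue`) are hypothetical. Whether the statistic should first be transformed (for example, to the logit scale) is left to the caller.

```python
import numpy as np
from scipy.special import airy

# Mean and variance of the Tracy-Widom (beta = 1, GOE) distribution.
TW1_MEAN = -1.2065335745820
TW1_SD = np.sqrt(1.607781034581)


def tw1_cdf(s, n_nodes=80):
    """Tracy-Widom (beta = 1) CDF via F1(s) = det(I - K_s), where
    K_s(u, v) = Ai(s + u + v) on L2(0, inf), discretized with
    Gauss-Legendre quadrature (Nystrom method)."""
    # Truncate (0, inf) to (0, L); Ai decays super-exponentially on the right.
    L = 12.0 + max(0.0, -s)
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    u = 0.5 * L * (x + 1.0)  # nodes mapped from [-1, 1] to [0, L]
    w = 0.5 * L * w          # matching quadrature weights
    kernel = airy(s + u[:, None] + u[None, :])[0]  # Ai(s + u_i + u_j)
    sqrt_w = np.sqrt(w)
    matrix = np.eye(n_nodes) - sqrt_w[:, None] * kernel * sqrt_w[None, :]
    return float(np.linalg.det(matrix))


def fit_tw1_location_scale(perm_stats):
    """Return (mu, sigma) such that (T - mu) / sigma is approximately TW1.

    Assumption (hypothetical choice): parameters are estimated by matching
    the sample mean and standard deviation of the permutation statistics
    to the known mean and standard deviation of TW1.
    """
    t = np.asarray(perm_stats, dtype=float)
    sigma = t.std(ddof=1) / TW1_SD
    mu = t.mean() - sigma * TW1_MEAN
    return mu, sigma


def tw1_pvalue(observed, perm_stats):
    """Upper-tail p-value of the observed statistic under the fitted TW1 law."""
    mu, sigma = fit_tw1_location_scale(perm_stats)
    p = 1.0 - tw1_cdf((observed - mu) / sigma)
    return float(np.clip(p, 0.0, 1.0))  # guard against tiny negative round-off
```

For instance, given a few dozen permutation statistics `perm_stats` and an observed largest root `obs`, `tw1_pvalue(obs, perm_stats)` returns an approximate p-value. The appeal of a parametric fit is that the permutations only need to pin down two parameters. Resolving a small p-value directly from the empirical permutation distribution would instead require a very large number of permutations.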
Findings
The estimator provides valid p-values in high-dimensional settings.
Simulation results confirm the accuracy of the Tracy-Widom approximation.
The method was applied successfully to DNA methylation and disease association data.
Abstract
Recent technological advances in many domains, including genomics and brain imaging, have led to an abundance of high-dimensional and correlated data being routinely collected. Classical multivariate approaches like Multivariate Analysis of Variance (MANOVA) and Canonical Correlation Analysis (CCA) can be used to study relationships between such multivariate datasets. Yet, special care is required with high-dimensional data, as the test statistics may be ill-defined and classical inference procedures break down. In this work, we explain how valid p-values can be derived for these multivariate methods even in high-dimensional datasets. Our main contribution is an empirical estimator for the largest root distribution of a singular double Wishart problem; this general framework underlies many common multivariate analysis approaches. From a small number of permutations of the data, we…
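To make the largest-root statistic and its permutation distribution concrete, here is a minimal Python sketch for CCA, one of the methods named in the abstract. It computes the largest squared sample canonical correlation. In the classical (non-singular) setting this equals the largest root θ of the double Wishart problem det(A - θ(A + B)) = 0, with A = S_yx S_xx^{-1} S_xy and B = S_yy - A. The sketch then recomputes the statistic after permuting the rows of Y. The function names are hypothetical. The sketch does not implement the paper's treatment of the singular case; it only shows the classical statistic and how it degenerates.

```python
import numpy as np


def largest_root_cca(X, Y):
    """Largest squared sample canonical correlation between X (n x p) and Y (n x q).

    In the classical setting this equals the largest root theta of
    det(A - theta * (A + B)) = 0 with A = Syx Sxx^{-1} Sxy and B = Syy - A.
    Assumes the centered blocks have full column rank when p < n and q < n.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the centered column spaces. The singular values of
    # Qx^T Qy are the cosines of the principal angles between the two spaces,
    # i.e. the sample canonical correlations.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]
    return float(min(rho, 1.0) ** 2)  # clip round-off above 1


def permutation_largest_roots(X, Y, n_perm=50, seed=0):
    """Largest roots after randomly permuting the rows of Y, which breaks any
    X-Y association while keeping the correlation within each block."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    return np.array(
        [largest_root_cca(X, Y[rng.permutation(n)]) for _ in range(n_perm)]
    )
```

When X has at least n - 1 columns, its centered columns span every centered direction in R^n. In that case `largest_root_cca` returns 1 both for the observed data and for every permutation, so the permutation distribution collapses to a point mass and carries no information. This is one concrete sense in which the classical statistic breaks down in high dimensions.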
Taxonomy
Topics: Bayesian Methods and Mixture Models · Genetic Associations and Epidemiology · Statistical Methods and Inference
