A simple and flexible test of sample exchangeability with applications to statistical genomics
Alan J. Aw, Jeffrey P. Spence, Yun S. Song

TL;DR
The paper introduces the V test, a simple, fast, and flexible non-parametric method for testing sample exchangeability and feature independence in multivariate data, with applications in statistical genomics.
Contribution
The paper proposes the V test, a novel non-parametric test of sample exchangeability and feature independence that controls the Type I error and handles high-dimensional data.
Findings
The V test performs well across various simulation scenarios.
It compares favorably against existing stratification tests based on random matrix theory.
Applications to genomic data demonstrate practical utility in assessing sample exchangeability and linkage disequilibrium (LD) splits.
Abstract
In scientific studies involving analyses of multivariate data, basic but important questions often arise for the researcher: Is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Are the features independent of one another, or perhaps the features can be grouped so that the groups are mutually independent? In statistical genomics, these considerations are fundamental to downstream tasks such as demographic inference and the construction of polygenic risk scores. We propose a non-parametric approach, which we call the V test, to address these two questions, namely, a test of sample exchangeability given dependency structure of features, and a test of feature independence given sample exchangeability. Our test is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios, and…
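The summary above does not give the V test's exact statistic, so the following is only a minimal sketch of the general resampling idea behind testing exchangeability against a known feature-block structure. If the samples are exchangeable and the feature blocks are mutually independent, then shuffling the sample order separately within each block leaves the joint distribution of the data unchanged, so the shuffled copies can serve as a null reference. The statistic used here (the variance across samples of each sample's mean Euclidean distance to the others) is an assumed stand-in, and the helper names `mean_distance_variance`, `permute_blocks`, and `block_permutation_pvalue` are hypothetical; none of this is the authors' implementation.

```python
import random
import statistics


def mean_distance_variance(X):
    """Assumed stand-in statistic (not necessarily the paper's): variance
    across samples of each sample's mean Euclidean distance to all others.
    X is a list of rows (samples), each a list of equal length."""
    n = len(X)
    means = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i != j:
                total += sum((a - b) ** 2 for a, b in zip(X[i], X[j])) ** 0.5
        means.append(total / (n - 1))
    return statistics.pvariance(means)


def permute_blocks(X, blocks, rng):
    """Shuffle the sample order independently within each block of columns.
    `blocks` is a list of column-index lists assumed mutually independent;
    columns not listed in any block are left in their original order."""
    n = len(X)
    Y = [list(row) for row in X]
    for cols in blocks:
        order = list(range(n))
        rng.shuffle(order)
        for i, src in enumerate(order):
            for c in cols:
                Y[i][c] = X[src][c]
    return Y


def block_permutation_pvalue(X, blocks, n_resamples=200, seed=0):
    """Monte Carlo p-value: fraction of block-permuted datasets whose
    statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean_distance_variance(X)
    exceed = 0
    for _ in range(n_resamples):
        if mean_distance_variance(permute_blocks(X, blocks, rng)) >= observed:
            exceed += 1
    # The +1 correction keeps the p-value strictly positive and valid.
    return (exceed + 1) / (n_resamples + 1)
```

Passing each feature as its own block (for example `[[0], [1], ..., [p - 1]]`) encodes the assumption that all features are independent; in a genomic setting the blocks might instead be groups of SNPs assumed to lie in separate LD blocks. A small p-value means the data are inconsistent with exchangeable samples and independent blocks holding jointly, which is why each question is tested while assuming the answer to the other.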
Taxonomy
Topics: Bioinformatics and Genomic Networks · Gene expression and cancer classification · Genetic Associations and Epidemiology
