SEAGLE: A Scalable Exact Algorithm for Large-Scale Set-Based GxE Tests in Biobank Data
Jocelyn T. Chi, Ilse C. F. Ipsen, Tzu-Hung Hsiao, Ching-Heng Lin, Li-San Wang, Wan-Ping Lee, Tzu-Pin Lu, Jung-Ying Tzeng

TL;DR
SEAGLE is a scalable, exact algorithm enabling large-scale set-based gene-environment interaction tests in biobank data, overcoming computational challenges while maintaining accuracy for continuous traits.
Contribution
The paper introduces SEAGLE, a matrix-computation-based method that performs exact GxE variance component tests efficiently on biobank-scale datasets without approximations.
Findings
SEAGLE accurately replicates traditional GxE test results in simulations.
It handles sample sizes of around 100,000 on standard laptops.
Applied to Taiwan Biobank data, it identified gene-environment interactions affecting BMI.
Abstract
The explosion of biobank data offers immediate opportunities for gene-environment (GxE) interaction studies of complex diseases because of the large sample sizes and the rich collection in genetic and non-genetic information. However, the extremely large sample size also introduces new computational challenges in GxE assessment, especially for set-based GxE variance component (VC) tests, which are a widely used strategy to boost overall GxE signals and to evaluate the joint GxE effect of multiple variants from a biologically meaningful unit (e.g., gene). In this work, we focus on continuous traits and present SEAGLE, a Scalable Exact AlGorithm for Large-scale set-based GxE tests, to permit GxE VC tests for biobank-scale data. SEAGLE employs modern matrix computations to achieve the same "exact" results as the original GxE VC tests without imposing additional assumptions or relying on…
Taxonomy
Topics: Bioinformatics and Genomic Networks · Genetic Associations and Epidemiology · Gene expression and cancer classification
