Validating GWAS Findings through Reverse Engineering of Contingency Tables
Yuzhou Jiang, Erman Ayday

TL;DR
This paper introduces a novel method for validating GWAS results by estimating contingency tables from p-values and comparing minor allele frequencies, enabling reproducibility checks without sharing sensitive data.
Contribution
The paper presents a new approach that detects unintentional errors in GWAS findings using p-values and MAF comparisons, without requiring access to original datasets.
Findings
Effectively detects unintentional errors in GWAS data
Identifies small errors, such as 1% SNP misreporting
Validates results using real SNP datasets from OpenSNP
Abstract
Reproducibility in genome-wide association studies (GWAS) is crucial for ensuring reliable genomic research outcomes. However, limited access to original genomic datasets (mainly due to privacy concerns) prevents researchers from reproducing experiments to validate results. In this paper, we propose a novel method for GWAS reproducibility validation that detects unintentional errors without the need for dataset sharing. Our approach leverages p-values from GWAS outcome reports to estimate contingency tables for each single nucleotide polymorphism (SNP) and calculates the Hamming distance between the minor allele frequencies (MAFs) derived from these contingency tables and publicly available phenotype-specific MAF data. By comparing the average Hamming distance, we validate results that fall within a trusted threshold as reliable, while flagging those that exceed the threshold for…
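The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: the reported p-value comes from a 1-degree-of-freedom Pearson chi-square test on a 2x2 allelic table; the allele totals and the control minor-allele count are known; and the minor allele is enriched in cases, which removes the two-sided ambiguity. The abstract also does not say how a Hamming distance is computed over real-valued MAFs, so `mismatch_rate` uses a hypothetical stand-in: the fraction of SNPs whose estimated MAF differs from the reference by more than a tolerance. All function and parameter names are illustrative.

```python
import math


def chi2_p_value(a, b, c, d):
    """P-value of Pearson's chi-square test (1 dof) on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 dof, the chi-square survival function equals erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


def estimate_case_minor_count(p_reported, n_case, n_control, control_minor):
    """Find the case minor-allele count whose table best reproduces p_reported.

    Assumption: the minor allele is at least as frequent in cases as in controls,
    so the search starts at the count expected under no association.
    """
    start = math.ceil(control_minor * n_case / n_control)
    best, best_gap = start, float("inf")
    for case_minor in range(start, n_case + 1):
        p = chi2_p_value(case_minor, n_case - case_minor,
                         control_minor, n_control - control_minor)
        # Compare on a log scale because GWAS p-values span many orders of magnitude.
        gap = abs(math.log(p + 1e-300) - math.log(p_reported + 1e-300))
        if gap < best_gap:
            best, best_gap = case_minor, gap
    return best


def mismatch_rate(estimated_mafs, reference_mafs, tolerance):
    """Fraction of SNPs whose estimated MAF is farther than `tolerance` from the reference."""
    mismatches = sum(abs(e - r) > tolerance
                     for e, r in zip(estimated_mafs, reference_mafs))
    return mismatches / len(reference_mafs)
```

In use, each reported p-value would be turned into an estimated case MAF (`estimate_case_minor_count(...) / n_case`), and the resulting mismatch rate across SNPs would be compared against a chosen threshold, with higher values flagged for further inspection.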
Taxonomy
Topics: Time Series Analysis and Forecasting · Geochemistry and Geologic Mapping · Data Management and Algorithms
