GWAS Summary Statistic Tool: A Meta-Analysis and Parsing Tool for Polygenic Risk Score Calculation
Muhammad Muneeb, David B. Ascher

TL;DR
GWASPoker is a Python tool that identifies GWAS summary statistic files suitable for polygenic risk score calculation through partial downloads and header analysis, saving download time and storage.
Contribution
GWASPoker introduces a phenotype-driven, GWAS-Catalog-specific pre-download triage method that automates header parsing of GWAS files without full downloads, improving workflow efficiency.
Findings
Partially downloaded and parsed 89.6% of files with accessible download links (54,026 of 60,281), spanning 20 file formats
Achieved 82.1% header validation accuracy against full downloads
Automatically retrieved 98.8% (84 of 85) of manually curated GWAS files across 13 phenotypes
Abstract
Motivation: GWAS (genome-wide association study) summary statistic files are essential inputs for polygenic risk score (PRS) calculation. However, identifying suitable files across thousands of catalog entries typically requires downloading large datasets and manually inspecting their column structures, a process that is both time-consuming and storage-intensive.
Results: We present GWASPoker, a phenotype-driven, GWAS-Catalog-specific pre-download triage tool that scans candidate GWAS files for PRS column availability through partial downloads and header detection, without requiring full-file transfer. Of the 60,499 GWAS Catalog records analysed, 60,281 (99.6%) contained accessible download links; of these, 54,026 (89.6%) were successfully partially downloaded and parsed across 20 file formats, yielding 724 unique header signatures. Across 13 phenotypes, 84 of 85 manually curated…
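The partial-download and header-detection idea described in the abstract can be illustrated with a minimal Python sketch. This is not GWASPoker's actual code: the function names, the 64 KiB prefix size, and the `PRS_ALIASES` table are all assumptions made for illustration, and real summary statistic files use many more column spellings than shown here.

```python
import urllib.request
import zlib

# Hypothetical alias table: these sets are illustrative assumptions,
# not the column mapping used by GWASPoker.
PRS_ALIASES = {
    "snp": {"snp", "rsid", "variant_id", "markername"},
    "effect_allele": {"a1", "effect_allele", "ea"},
    "effect": {"beta", "or", "odds_ratio"},
    "p_value": {"p", "pval", "p_value"},
}


def fetch_prefix(url, n_bytes=65536):
    """Request only the first n_bytes of a remote file via an HTTP Range header."""
    req = urllib.request.Request(url, headers={"Range": f"bytes=0-{n_bytes - 1}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        # Servers that ignore Range send the whole body, so cap the read.
        return resp.read(n_bytes)


def header_from_prefix(chunk):
    """Return the column names from the first line of a (possibly gzipped) prefix."""
    if chunk[:2] == b"\x1f\x8b":  # gzip magic bytes
        # A streaming decompressor tolerates a truncated gzip stream.
        chunk = zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(chunk)
    first_line = chunk.split(b"\n", 1)[0].decode("utf-8", errors="replace").strip()
    for delim in ("\t", ",", ";"):
        if delim in first_line:
            return first_line.split(delim)
    return first_line.split()  # fall back to whitespace-delimited headers


def missing_prs_fields(columns):
    """List the PRS fields for which no known alias appears in the header."""
    names = {c.strip().lower() for c in columns}
    return sorted(f for f, aliases in PRS_ALIASES.items() if not names & aliases)
```

A candidate file would then be kept when `missing_prs_fields(header_from_prefix(fetch_prefix(url)))` returns an empty list. Reading only a prefix means the decision is based on the header alone, which is presumably why the paper separately validates header detection against full downloads.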
Taxonomy
Topics: Genetic Associations and Epidemiology · Bioinformatics and Genomic Networks · Genomics and Rare Diseases
