Second-generation PLINK: rising to the challenge of larger and richer datasets

Christopher C. Chang; Carson C. Chow; Laurent C.A.M. Tellier; Shashaank Vattikuti; Shaun M. Purcell; James J. Lee

arXiv:1410.4803·q-bio.GN·March 3, 2015

TL;DR

The paper introduces PLINK 1.9, a significantly faster and more scalable version of the GWAS toolset, capable of handling larger, richer datasets with new data formats and algorithmic improvements.

Contribution

Development of PLINK 1.9 with extensive algorithmic enhancements and a new data format, enabling faster analysis of large, complex genetic datasets.

Findings

01

Operations accelerated by 1-4 orders of magnitude

02

Able to handle datasets exceeding RAM capacity

03

Enhanced data format supports probabilistic and multiallelic data

Abstract

PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for even faster and more scalable implementations of key functions. In addition, GWAS and population-genetic data now frequently contain probabilistic calls, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by…
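
To make the phrase "bit-level parallelism" concrete, here is a minimal sketch of the general idea, not PLINK's actual code. It assumes a simplified, hypothetical encoding in which each genotype is stored as a 2-bit allele dosage (0, 1, or 2) with no missing-data code, so 32 genotypes fit in one 64-bit word. The names `pack_genotypes`, `allele_count`, and `MASK_LOW` are illustrative only.

```python
# Sketch only: hypothetical 2-bit dosage encoding, not PLINK's real file format.
MASK_LOW = 0x5555555555555555  # selects the low bit of every 2-bit field in a 64-bit word


def pack_genotypes(dosages):
    """Pack up to 32 dosages (each 0, 1, or 2) into one integer, 2 bits per genotype."""
    if len(dosages) > 32:
        raise ValueError("at most 32 genotypes fit in a 64-bit word")
    word = 0
    for i, dosage in enumerate(dosages):
        if dosage not in (0, 1, 2):
            raise ValueError("dosage must be 0, 1, or 2")
        word |= dosage << (2 * i)
    return word


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def allele_count(word):
    """Sum the dosages of all packed genotypes using two masks and two popcounts.

    Each field holds 2*high + low, so the total is
    popcount(low bits) + 2 * popcount(high bits), with no per-genotype loop.
    """
    low = word & MASK_LOW
    high = (word >> 1) & MASK_LOW
    return popcount(low) + 2 * popcount(high)


if __name__ == "__main__":
    genotypes = [0, 1, 2, 2, 1, 0, 0, 2]
    packed = pack_genotypes(genotypes)
    print(allele_count(packed), sum(genotypes))
```

The point of the sketch is that the count is computed on a whole word at once rather than one genotype at a time. In a compiled language the popcount would typically map to a single hardware instruction, which is where this style of code gains its speed.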
