Algorithms for Large-scale Whole Genome Association Analysis

Elmar Peise (1), Diego Fabregat (1), Yurii Aulchenko (2), Paolo Bientinesi (1) ((1) AICES, RWTH Aachen; (2) Institute of Cytology and Genetics, Novosibirsk)

arXiv:1304.2272·cs.CE·May 2, 2013

TL;DR

This paper introduces scalable algorithms for large-scale genome-wide association studies. The algorithms stream genotype data from secondary storage and keep the population covariance matrix distributed across the main memory of a distributed-memory system.

Contribution

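It combines double-buffered streaming of genotype data from secondary storage with a distributed-memory layout of the covariance matrix, so that genetic datasets exceeding the main memory of a single computer can still be processed.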
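
To make the scale concrete, here is a back-of-envelope calculation of the two memory footprints the paper deals with. It is a sketch under stated assumptions, not figures from the paper: every entry is assumed to be an 8-byte double, and the population and polymorphism (SNP) counts are illustrative values chosen within the ranges the abstract mentions.

```python
# Back-of-envelope memory footprint of a GWAS dataset.
# ASSUMPTION: genotype and covariance entries are stored as 8-byte doubles;
# the sizes used below are illustrative, not taken from the paper.

BYTES_PER_DOUBLE = 8
GB = 10**9


def genotype_bytes(n_individuals: int, n_snps: int) -> int:
    """Size of the n x m genotype matrix (one column per polymorphism)."""
    return n_individuals * n_snps * BYTES_PER_DOUBLE


def covariance_bytes(n_individuals: int) -> int:
    """Size of the dense n x n relatedness (covariance) matrix."""
    return n_individuals * n_individuals * BYTES_PER_DOUBLE


if __name__ == "__main__":
    n, m = 30_000, 3_000_000  # tens of thousands of individuals, millions of SNPs
    print(f"genotypes:  {genotype_bytes(n, m) / GB:.0f} GB")
    print(f"covariance: {covariance_bytes(n) / GB:.1f} GB")
    # Hypothetical larger population, where the n x n matrix alone is large.
    print(f"covariance (n=100,000): {covariance_bytes(100_000) / GB:.0f} GB")
```

With these assumed sizes the script prints 720 GB, 7.2 GB and 80 GB. The genotype matrix grows with both the number of individuals and the number of polymorphisms, which is why it is streamed rather than held in memory. The covariance matrix grows only with the square of the population size, so it can be held in the combined main memory of a cluster.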

Findings

01. Enables analysis of datasets with millions of polymorphisms.
02. Maintains high performance with distributed memory and streaming.
03. Supports genome-wide association studies on large populations.

Abstract

To associate complex traits with genetic polymorphisms, genome-wide association studies process huge datasets involving tens of thousands of individuals genotyped for millions of polymorphisms. These datasets exceed the main memory of contemporary computers, which raises two distinct challenges: (1) millions of polymorphisms come at the cost of hundreds of gigabytes of genotype data, which can only be kept in secondary storage; (2) the relatedness of the test population is represented by a covariance matrix which, for large populations, fits only in the combined main memory of a distributed architecture. In this paper, we present solutions for both challenges: the genotype data is streamed from and to secondary storage using a double-buffering technique, while the covariance matrix is distributed across the main memory of a distributed-memory system. We show that…
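
The double-buffering idea can be sketched as follows: while block b of the genotype data is being processed, block b+1 is already being read from disk, so computation and I/O overlap. This is a minimal Python sketch under assumptions, not the authors' implementation. It assumes the genotypes are stored as a flat binary file of doubles with each polymorphism's column of n values contiguous, uses one background thread for I/O, and treats `process` as a hypothetical stand-in for the per-block computation. Only the input side is shown; the paper also streams results back to storage.

```python
import array
from concurrent.futures import ThreadPoolExecutor


def read_block(path, n, block_cols, b, total_cols):
    """Read SNP columns [b*block_cols, ...) as a flat array of doubles.

    ASSUMPTION: column-major file layout, n doubles per SNP column.
    """
    first = b * block_cols
    cols = min(block_cols, total_cols - first)
    data = array.array("d")
    with open(path, "rb") as f:
        f.seek(first * n * data.itemsize)
        data.fromfile(f, cols * n)
    return data


def stream_blocks(path, n, block_cols, total_cols, process):
    """Apply `process` (hypothetical) to every block, prefetching the next one."""
    n_blocks = (total_cols + block_cols - 1) // block_cols
    results = []
    # A single I/O thread: one buffer is being filled while the other is used.
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_block, path, n, block_cols, 0, total_cols)
        for b in range(n_blocks):
            current = pending.result()  # wait until block b is in memory
            if b + 1 < n_blocks:        # start loading block b+1 in the background
                pending = io.submit(read_block, path, n, block_cols, b + 1, total_cols)
            results.append(process(current))  # compute on block b meanwhile
    return results
```

The block size trades memory for efficiency: larger blocks amortize I/O latency and give the computational kernel more work per call, but two blocks must fit in memory at once.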
