Streaming Data from HDD to GPUs for Sustained Peak Performance
Lucas Beyer (1), Paolo Bientinesi (1), ((1) AICES, RWTH Aachen)

TL;DR
This paper introduces a streaming algorithm that efficiently transfers data from HDD to GPUs, significantly reducing execution time and enabling scalable, high-performance GWAS computations with multiple GPUs.
Contribution
The paper presents a novel streaming and pipelined algorithm that overcomes data management bottlenecks and achieves near-perfect scalability across multiple GPUs for GWAS tasks.
Findings
2.6x speedup over a highly optimized CPU implementation
Almost perfect scalability across multiple GPUs (9x over the same CPU implementation when using 4 GPUs)
488x speedup over a widespread biology library when using 4 GPUs
Abstract
In the context of genome-wide association studies (GWAS), one has to solve long sequences of generalized least-squares problems. Such a task has two limiting factors: execution time (often in the range of days or weeks) and data management (data sets on the order of terabytes). We present an algorithm that addresses both issues. By pipelining the computation, and thanks to a sophisticated transfer strategy, we stream data from hard disk to main memory to GPUs and achieve sustained peak performance; with respect to a highly optimized CPU implementation, our algorithm shows a speedup of 2.6x. Moreover, the approach lends itself to multiple GPUs and attains almost perfect scalability. When using 4 GPUs, we observe speedups of 9x over the aforementioned implementation, and 488x over a widespread biology library.
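The pipelining idea in the abstract (read the next block from disk while the current one is being processed) can be illustrated with a minimal producer/consumer sketch. This is not the authors' implementation: the names `BLOCK_SIZE`, `NUM_BUFFERS`, `read_blocks`, `process_block`, and `stream_and_process` are hypothetical, a Python thread stands in for asynchronous disk and GPU transfers, and a byte checksum stands in for the per-block least-squares computation.

```python
import os
import queue
import tempfile
import threading

BLOCK_SIZE = 1024   # bytes per block (assumption; real blocks would be far larger)
NUM_BUFFERS = 2     # double buffering: one block is read while another is processed
_DONE = object()    # sentinel that marks the end of the stream


def read_blocks(path, out_queue):
    """Producer stage: read fixed-size blocks from disk into a bounded queue."""
    try:
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                out_queue.put(block)  # waits here while all buffers are in use
    finally:
        out_queue.put(_DONE)


def process_block(block):
    """Stand-in for the per-block computation (here: a simple byte checksum)."""
    return sum(block) % 65521


def stream_and_process(path):
    """Overlap disk reads with computation through a bounded queue."""
    buffers = queue.Queue(maxsize=NUM_BUFFERS)
    reader = threading.Thread(target=read_blocks, args=(path, buffers))
    reader.start()
    results = []
    while True:
        block = buffers.get()
        if block is _DONE:
            break
        results.append(process_block(block))
    reader.join()
    return results


def process_sequentially(path):
    """Reference version without overlap: read a block, process it, repeat."""
    results = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            results.append(process_block(block))
    return results


def make_test_file(num_blocks=8):
    """Write a small temporary file whose i-th block is filled with byte value i."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_blocks):
            f.write(bytes([i % 256]) * BLOCK_SIZE)
    return path


if __name__ == "__main__":
    demo_path = make_test_file()
    try:
        print(stream_and_process(demo_path) == process_sequentially(demo_path))
    finally:
        os.remove(demo_path)
```

The bounded queue is the key design choice in this sketch: it caps memory use at `NUM_BUFFERS` blocks regardless of file size, which is what allows a data set much larger than main memory to be streamed. A real HDD-to-GPU pipeline would add a further stage for host-to-device transfers.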
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems
