Accelerating key bioinformatics tasks 100-fold by improving memory access

Igor Sfiligoi; Daniel McDonald; Rob Knight

arXiv:2104.09565 · cs.DC · July 21, 2021

TL;DR

This paper demonstrates how optimizing memory access patterns can accelerate key bioinformatics computations, achieving over 100-fold speedups in common analysis functions.

Contribution

The authors improve memory access in two scikit-bio functions, principal coordinates analysis (PCoA) and the Mantel test, making both substantially faster than the previous implementations.

Findings

1. Over 100x speedup in principal coordinates analysis

2. Significant performance gains in Mantel test

3. Memory optimization enables efficient large-scale bioinformatics analysis

Abstract

Most experimental sciences now rely on computing, and biological sciences are no exception. As datasets get bigger, so do the computing costs, making proper optimization of the codes used by scientists increasingly important. Many of the codes developed in recent years are based on the Python-based NumPy, due to its ease of use and good performance characteristics. The composable nature of NumPy, however, does not generally play well with the multi-tier nature of modern CPUs, making any non-trivial multi-step algorithm limited by the external memory access speeds, which are hundreds of times slower than the CPU's compute capabilities. In order to fully utilize the CPU compute capabilities, one must keep the working memory footprint small enough to fit in the CPU caches, which requires splitting the problem into smaller portions and fusing together as many steps as possible. In this…
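
The idea of splitting the problem into cache-sized portions and fusing steps can be sketched on the distance-matrix centering step used in PCoA. This is a minimal illustration in NumPy, not the authors' implementation: the function names `center_composed` and `center_blocked` and the `block_rows` parameter are hypothetical, and a pure-Python loop will not by itself reproduce the reported speedups. The composed version builds several full-size temporaries, each streamed through main memory; the blocked version squares, scales, and accumulates the means for one row block while that block is still in cache.

```python
import numpy as np


def center_composed(d):
    """Step-by-step NumPy centering: each expression allocates a full-size temporary."""
    e = -0.5 * (d * d)
    row_means = e.mean(axis=1, keepdims=True)
    col_means = e.mean(axis=0, keepdims=True)
    global_mean = e.mean()
    return e - row_means - col_means + global_mean


def center_blocked(d, block_rows=64):
    """Same result, computed one row block at a time (hypothetical sketch).

    block_rows is an assumed tuning knob: choose it so that one block of
    rows fits comfortably in the CPU cache.
    """
    n_rows, n_cols = d.shape
    out = np.empty_like(d, dtype=np.float64)
    row_means = np.empty(n_rows)
    col_sums = np.zeros(n_cols)

    # Pass 1: fuse "square", "scale", and "accumulate means" per block,
    # so each block is read once and reused while it is hot in cache.
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        blk = out[start:stop]  # view into the output, no extra allocation
        np.multiply(d[start:stop], d[start:stop], out=blk)
        blk *= -0.5
        row_means[start:stop] = blk.mean(axis=1)
        col_sums += blk.sum(axis=0)

    col_means = col_sums / n_rows
    # All rows have equal length, so the mean of row means is the global mean.
    global_mean = row_means.mean()

    # Pass 2: apply the centering in place, again block by block.
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        blk = out[start:stop]
        blk -= row_means[start:stop, None]
        blk -= col_means
        blk += global_mean
    return out
```

Both functions should agree to floating-point tolerance on any distance matrix, so the blocked form can be checked against the composed one before tuning `block_rows` for a given cache size.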
