High-Performance Statistical Computing in the Computing Environments of the 2020s
Seyoon Ko, Hua Zhou, Jin J. Zhou, Joong-Ho Won

TL;DR
This paper reviews recent advances in high-performance statistical computing enabled by hardware and software developments, demonstrating scalable applications like large-scale genetic analysis using HPC resources.
Contribution
It provides an easy-to-use distributed matrix data structure suitable for HPC and reviews recent optimization algorithms for high-dimensional statistical models, with practical demonstrations on large datasets.
Findings
Efficient analysis of 200,000 subjects with 500,000 genetic variants in under 45 minutes.
First demonstration of penalized regression of survival outcomes at this scale.
Scalable implementations on multi-GPU and cloud computing environments.
Abstract
Technological advances in the past decade, hardware and software alike, have made access to high-performance computing (HPC) easier than ever. We review these advances from a statistical computing perspective. Cloud computing makes access to supercomputers affordable. Deep learning software libraries make programming statistical algorithms easy and enable users to write code once and run it anywhere -- from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in a cloud. Highlighting how these developments benefit statisticians, we review recent optimization algorithms that are useful for high-dimensional models and can harness the power of HPC. Code snippets are provided to demonstrate the ease of programming. We also provide an easy-to-use distributed matrix data structure suitable for HPC. Employing this data structure, we illustrate various…
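To make the combination of a partitioned matrix and a penalized-regression solver concrete, here is a minimal sketch in plain NumPy. It is not the authors' code or data structure. It swaps in a simpler problem (ℓ1-penalized least squares rather than penalized survival regression) and a standard proximal gradient iteration. A Python list of row blocks stands in for a matrix distributed across devices. The names `soft_threshold`, `lasso_proximal_gradient`, and `lasso_objective` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: elementwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_objective(row_blocks, y_blocks, beta, lam):
    """0.5 * ||y - X beta||^2 + lam * ||beta||_1, summed over row blocks."""
    loss = sum(0.5 * np.sum((yb - Xb @ beta) ** 2)
               for Xb, yb in zip(row_blocks, y_blocks))
    return loss + lam * np.sum(np.abs(beta))


def lasso_proximal_gradient(row_blocks, y_blocks, lam, n_iter=500):
    """Proximal gradient descent for the lasso objective above.

    Assumption for illustration: X is stored as a list of row blocks,
    standing in for a matrix partitioned across devices. Each block
    contributes a partial gradient, and the partial gradients are summed.
    """
    p = row_blocks[0].shape[1]
    # Step size 1/L with L the largest eigenvalue of X^T X. Forming the
    # Gram matrix is only sensible for a small sketch like this one.
    gram = sum(Xb.T @ Xb for Xb in row_blocks)
    step = 1.0 / np.linalg.eigvalsh(gram)[-1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Block-wise gradient of the smooth part: sum_b Xb^T (Xb beta - yb)
        grad = sum(Xb.T @ (Xb @ beta - yb)
                   for Xb, yb in zip(row_blocks, y_blocks))
        # Gradient step followed by the l1 proximal map
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 10))
    truth = np.zeros(10)
    truth[:2] = [2.0, -3.0]
    y = X @ truth + 0.01 * rng.standard_normal(60)
    # Split rows into three blocks, as if held by three workers
    blocks, y_blocks = np.array_split(X, 3), np.array_split(y, 3)
    print(lasso_proximal_gradient(blocks, y_blocks, lam=1.0))
```

Because the gradient of the smooth term is a sum over row blocks, each block's contribution can be computed where that block lives and only the length-p partial gradients need to be combined. That is the property a distributed matrix structure exploits.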
Taxonomy
Topics: Gene expression and cancer classification · Statistical Methods and Inference · Health, Environment, Cognitive Aging
