Parallel Optimisation of Bootstrapping in R
T. M. Sloan, M. Piotrowski, T. Forster, P. Ghazal

TL;DR
This paper presents a parallel implementation of bootstrapping for the SPRINT R package, which speeds up this resampling method on multi-core desktops and supercomputers without requiring expert knowledge of such systems.
Contribution
A parallelization of bootstrapping, built for inclusion in the SPRINT R package, that achieves close to optimal speedup on up to 16 nodes of a supercomputer.
Findings
Achieves close to optimal speedup on up to 16 nodes of a supercomputer, depending on the complexity of the bootstrap statistic and the number of resamples.
Reaches near 100-fold speedup on 512 nodes.
Outperforms existing native R parallelization options.
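To put the two speedup figures on a common scale, they can be converted to parallel efficiency. This is a rough worked calculation rather than a result reported in the paper: the symbols S, E, T_1 and T_p are introduced here for illustration, and "close to optimal" and "near 100-fold" are treated as S(16) ≈ 16 and S(512) ≈ 100.

```latex
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
\\[4pt]
E(16) \approx \frac{16}{16} = 1, \qquad
E(512) \approx \frac{100}{512} \approx 0.195
```

Under these assumptions, each of the 512 nodes contributes roughly a fifth of the work it would contribute under ideal scaling, whereas at 16 nodes the scaling is close to ideal.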
Abstract
Bootstrapping is a popular and computationally demanding resampling method used for measuring the accuracy of sample estimates and assisting with statistical inference. R is a freely available language and environment for statistical computing popular with biostatisticians for genomic data analyses. A survey of such R users highlighted its implementation of bootstrapping as a prime candidate for parallelization to overcome computational bottlenecks. The Simple Parallel R Interface (SPRINT) is a package that allows R users to exploit high performance computing in multi-core desktops and supercomputers without expert knowledge of such systems. This paper describes the parallelization of bootstrapping for inclusion in the SPRINT R package. Depending on the complexity of the bootstrap statistic and the number of resamples, this implementation has close to optimal speed up on up to 16 nodes…
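The reason bootstrapping parallelizes well is that each resample is independent of the others, so the resamples can be divided among workers and the replicates gathered at the end. The following is a minimal sketch of that idea in Python, not the SPRINT implementation or its API. The names `bootstrap_chunk` and `parallel_bootstrap`, the sample data, and the per-worker seeding scheme are all assumptions made for illustration. Threads are used only to keep the sketch portable; they show the decomposition but will not speed up a CPU-bound statistic, for which a process pool or MPI ranks would be needed.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def bootstrap_chunk(data, statistic, n_resamples, seed):
    """Compute `statistic` on `n_resamples` resamples drawn with replacement."""
    rng = random.Random(seed)  # separate, reproducible stream per worker
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


def parallel_bootstrap(data, statistic, n_resamples, n_workers=4, seed=0):
    """Split the resamples across workers and gather all replicates."""
    # Spread the resamples as evenly as possible over the workers.
    base, extra = divmod(n_resamples, n_workers)
    counts = [base + (1 if i < extra else 0) for i in range(n_workers)]
    # Hypothetical seeding scheme: worker i uses seed + i.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(bootstrap_chunk, data, statistic, count, seed + i)
            for i, count in enumerate(counts)
        ]
        # Collect in submission order so the result is deterministic.
        replicates = [r for f in futures for r in f.result()]
    return replicates


# Illustrative data only; not taken from the paper.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]
replicates = parallel_bootstrap(sample, statistics.mean, n_resamples=1000)
standard_error = statistics.stdev(replicates)  # bootstrap estimate of the SE
```

The cost per resample depends on the statistic, which is consistent with the abstract's remark that the achievable speedup depends on the complexity of the bootstrap statistic and the number of resamples: cheap statistics or few resamples leave less work to amortize the cost of distributing and gathering.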
Taxonomy
Topics: Gene expression and cancer classification · Data Mining Algorithms and Applications · Bayesian Modeling and Causal Inference
