The benefits of prefetching for large-scale cloud-based neuroimaging analysis workflows
Valerie Hayot-Sasson, Tristan Glatard, Ariel Rokem

TL;DR
Prefetching data in cloud-based neuroimaging workflows significantly reduces data transfer overheads, yielding speed-ups of up to 1.86x, as demonstrated by a new Python library for AWS S3.
Contribution
We developed 'Rolling Prefetch', a Python library extending S3Fs to enable prefetching from AWS S3, improving data loading efficiency in large-scale neuroimaging workflows.
Findings
Speed-up of up to 1.86x in data processing tasks
Prefetching reduces data transfer overheads in cloud workflows
Implementation is applicable to various scientific data domains
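To put the reported speed-up in context, here is a back-of-the-envelope model. It is a simplifying assumption for illustration, not the paper's performance analysis. If data transfer and computation overlap perfectly, run time drops from their sum to the larger of the two, so the speed-up can never exceed 2x and approaches it only when the two costs are balanced. The function name and the inputs below are hypothetical.

```python
def idealized_speedup(transfer_s: float, compute_s: float) -> float:
    """Speed-up of perfectly overlapped execution over sequential execution.

    Assumes transfer and compute can be fully overlapped and ignores
    start-up latency, caching limits, and bandwidth contention.
    """
    if transfer_s < 0 or compute_s < 0:
        raise ValueError("durations must be non-negative")
    sequential = transfer_s + compute_s      # read everything, then process
    overlapped = max(transfer_s, compute_s)  # the slower stage dominates
    if overlapped == 0:
        return 1.0                           # nothing to do, nothing to gain
    return sequential / overlapped


# Balanced costs give the upper bound of this model.
print(idealized_speedup(10.0, 10.0))  # 2.0
# A transfer-dominated workload gains less.
print(idealized_speedup(30.0, 10.0))
```

Under this model, a measured 1.86x would correspond to a workload whose transfer and compute costs are close to balanced.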
Abstract
To support the growing demands of neuroscience applications, researchers are transitioning to cloud computing for its scalable, robust and elastic infrastructure. Nevertheless, large datasets residing in object stores may result in significant data transfer overheads during workflow execution. Prefetching, a method to mitigate the cost of reading in mixed workloads, masks data transfer costs within the processing time of prior tasks. We present an implementation of "Rolling Prefetch", a Python library that implements a particular form of prefetching from the AWS S3 object store, and we quantify its benefits. Rolling Prefetch extends S3Fs, a Python library exposing AWS S3 functionality via a file object, to add prefetch capabilities. In measurements of analysis performance on a 500 GB brain connectivity dataset stored on S3, we found that prefetching provides significant speed-ups of up to 1.86x,…
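The idea of masking transfer costs within the processing time of prior tasks can be sketched with a background thread and a bounded in-memory queue. This is a minimal illustration of prefetching in general, not the Rolling Prefetch implementation and not the S3Fs API. The names `prefetch_blocks`, `fake_fetch`, and `max_cached` are hypothetical, and `fake_fetch` stands in for a real remote read.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator

_DONE = object()  # sentinel marking the end of the block stream


def prefetch_blocks(
    fetch: Callable[[int], bytes],
    block_ids: Iterable[int],
    max_cached: int = 2,
) -> Iterator[bytes]:
    """Yield blocks in order while a background thread fetches ahead.

    At most `max_cached` fetched blocks wait in the queue, so memory use
    stays bounded even when fetching outpaces processing.
    """
    cache: queue.Queue = queue.Queue(maxsize=max_cached)

    def producer() -> None:
        try:
            for block_id in block_ids:
                cache.put(fetch(block_id))  # blocks while the cache is full
        except BaseException as exc:        # hand errors to the consumer
            cache.put(exc)
        else:
            cache.put(_DONE)

    # Daemon thread: it will not keep the process alive if the consumer
    # stops iterating early.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = cache.get()
        if item is _DONE:
            return
        if isinstance(item, BaseException):
            raise item
        yield item


def fake_fetch(block_id: int) -> bytes:
    """Hypothetical stand-in for reading one block from an object store."""
    return bytes([block_id]) * 4


# While the loop body processes one block, the producer fetches the next.
for block in prefetch_blocks(fake_fetch, range(3)):
    print(len(block), block[0])
```

The bounded queue is the design choice that matters here: it lets reads run ahead of processing without pulling the whole dataset into memory at once.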
Taxonomy
Topics: Functional Brain Connectivity Studies · Scientific Computing and Data Management · Cell Image Analysis Techniques
