Work-stealing prefix scan: Addressing load imbalance in large-scale image registration
Marcin Copik, Tobias Grosser, Torsten Hoefler, Paolo Bientinesi, Benjamin Berkels

TL;DR
This paper introduces a novel parallel prefix scan algorithm with work-stealing to efficiently address load imbalance in large-scale image registration, significantly reducing processing time from hours to minutes.
Contribution
It presents a new work-stealing hierarchical prefix scan algorithm tailored for imbalanced image registration workloads, enabling scalable parallel processing.
Findings
Achieved over 200x speedup in image registration time.
Successfully scaled to 1024 cores for large image series.
Enabled nanoscale material property analysis in minutes.
Abstract
Parallelism patterns (e.g., map or reduce) have proven to be effective tools for parallelizing high-performance applications. In this paper, we study the recursive registration of a series of electron microscopy images, a time-consuming and imbalanced computation necessary for nano-scale microscopy analysis. We show that by translating the image registration into a specific instance of the prefix scan, we can convert this seemingly sequential problem into a parallel computation that scales to over a thousand cores. We analyze a variety of scan algorithms that behave similarly for common low-compute operators and propose a novel work-stealing procedure for a hierarchical prefix scan. Our evaluation shows that by identifying a suitable and well-optimized prefix scan algorithm, we reduce time-to-solution on a series of 4,096 images spanning ten seconds of microscopy acquisition from over…
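To make the prefix-scan idea in the abstract concrete, here is a minimal Python sketch of a blocked (two-level) inclusive scan over an associative but non-commutative operator. It is an illustration under stated assumptions, not the authors' implementation: the 1-D affine maps `(a, b)` are a hypothetical stand-in for the deformations produced by registering neighboring images, `compose`, `sequential_scan`, and `blocked_scan` are names invented here, and the work-stealing step is not modeled.

```python
def compose(f, g):
    """Apply f first, then g. Each map (a, b) means x -> a*x + b.

    Hypothetical stand-in for chaining two deformations; it is
    associative but not commutative, like function composition.
    """
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)


def sequential_scan(items, op):
    """Inclusive prefix scan: out[i] = items[0] op ... op items[i]."""
    out = []
    for x in items:
        out.append(x if not out else op(out[-1], x))
    return out


def blocked_scan(items, op, num_blocks):
    """Two-level scan: local scans per block, then carry block totals."""
    if not items:
        return []
    size = -(-len(items) // num_blocks)  # ceiling division
    blocks = [items[i:i + size] for i in range(0, len(items), size)]

    # Phase 1: each block is scanned independently of the others.
    # In a parallel setting every block would go to a different worker.
    local = [sequential_scan(block, op) for block in blocks]

    # Phase 2: prepend the running total of all earlier blocks.
    # Operand order matters because op is not commutative.
    out = []
    carry = None
    for scanned in local:
        if carry is None:
            out.extend(scanned)
        else:
            out.extend(op(carry, value) for value in scanned)
        carry = out[-1]
    return out


# Integer coefficients keep the comparison exact.
maps = [(2, 1), (1, 3), (3, 0), (1, -2), (2, 5)]
reference = sequential_scan(maps, compose)
for k in (1, 2, 3, 5):
    assert blocked_scan(maps, compose, k) == reference
```

Associativity of the operator is the only property the blocked version relies on, which is why a chain of dependent compositions can be split across workers. With an expensive operator whose cost varies per element, equal-sized blocks take unequal time in Phase 1, which is the load imbalance the abstract refers to.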
