Two Efficient Message-passing Exclusive Scan Algorithms
Jesper Larsson Träff

TL;DR
This paper introduces two efficient message-passing algorithms for exclusive scan operations in parallel systems, optimizing communication rounds and operator applications, especially for small input vectors.
Contribution
The paper presents two novel algorithms for exclusive prefix sums that improve efficiency by balancing communication rounds and operator applications in message-passing systems.
Findings
The first algorithm trades communication rounds for fewer operator applications.
The second algorithm modifies an all-reduce approach; its complexity depends on the bit pattern of p-1.
Both algorithms target small input vectors, where the number of communication rounds and operator applications dominates the cost.
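To make the two cost measures in these findings concrete, here is a minimal sketch of a textbook baseline for the exclusive scan. It is not either of the paper's algorithms. It simulates p processors in one Python process: each rank first shifts its input to the next rank, then ranks 1 to p-1 run a distance-doubling inclusive scan on the shifted values. The function name `shift_exclusive_scan` and the convention that rank 0 gets `None` are assumptions made for this illustration.

```python
def shift_exclusive_scan(xs, op):
    """Simulate a shift-then-scan exclusive scan over p = len(xs) ranks.

    Illustrative baseline only (not an algorithm from the paper).
    Returns (results, rounds, ops), where ops counts the operator
    applications performed by the last rank, which is the busiest one.
    """
    p = len(xs)
    # Shift round: rank r sends its input to rank r + 1.
    # Rank 0 receives nothing; its exclusive prefix is undefined (None).
    vals = [None] + list(xs[:-1])
    rounds = 1 if p > 1 else 0
    ops = 0
    d = 1
    while d < p - 1:
        prev = list(vals)  # snapshot: all sends in a round are simultaneous
        for r in range(1 + d, p):
            # The value from the lower rank goes on the left, so the
            # result is also correct for non-commutative operators.
            vals[r] = op(prev[r - d], vals[r])
        ops += 1
        rounds += 1
        d *= 2
    return vals, rounds, ops


if __name__ == "__main__":
    concat = lambda a, b: a + b  # associative, non-commutative
    result, rounds, ops = shift_exclusive_scan(list("abcdefghi"), concat)
    print(result, rounds, ops)
```

In this baseline the shift round applies no operator at all, so the round count exceeds the operator count by one. That gap is the kind of trade-off between rounds and operator applications that the findings refer to.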
Abstract
Parallel scan primitives compute element-wise inclusive or exclusive prefix sums of input vectors contributed by $p$ consecutively ranked processors under an associative, possibly expensive, binary operator $\oplus$. In message-passing systems with bounded, one-ported communication capabilities, at least $\lceil \log_2 p \rceil$ or $\lceil \log_2 (p-1) \rceil$ send-receive communication rounds are required to perform the scans. While there are well-known, simple algorithms for the inclusive scan that solve the problem in $\lceil \log_2 p \rceil$ send-receive communication rounds with $\lceil \log_2 p \rceil$ applications of the operator, the exclusive scan is different and has been much less addressed. By considering natural invariants for the exclusive prefix sums problem, we present two different algorithms that are efficient in the number of communication rounds and in the number…
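As a hedged illustration of the terms in the abstract, the sketch below defines sequential reference versions of the inclusive and exclusive scan and simulates the well-known distance-doubling inclusive scan, counting rounds and operator applications. Processors are simulated as list positions in a single process. All function names are hypothetical, and none of this code is taken from the paper.

```python
from math import ceil, log2


def inclusive_scan_ref(xs, op):
    """Sequential reference: out[r] = xs[0] op ... op xs[r]."""
    out, acc = [], None
    for x in xs:
        acc = x if acc is None else op(acc, x)
        out.append(acc)
    return out


def exclusive_scan_ref(xs, op):
    """Sequential reference: out[r] = xs[0] op ... op xs[r-1].

    Rank 0 has no defined result; None is used here by assumption.
    """
    return [None] + inclusive_scan_ref(xs[:-1], op)


def doubling_inclusive_scan(xs, op):
    """Simulated distance-doubling inclusive scan over p = len(xs) ranks.

    In the round with distance d, rank r sends to rank r + d and receives
    from rank r - d: at most one send and one receive per rank per round.
    Returns (results, rounds, max operator applications on any rank).
    """
    p = len(xs)
    vals = list(xs)
    ops = [0] * p
    rounds = 0
    d = 1
    while d < p:
        sent = list(vals)  # snapshot: all sends in a round are simultaneous
        for r in range(d, p):
            vals[r] = op(sent[r - d], vals[r])  # lower ranks on the left
            ops[r] += 1
        rounds += 1
        d *= 2
    return vals, rounds, max(ops)


if __name__ == "__main__":
    concat = lambda a, b: a + b  # associative, non-commutative
    xs = list("abcdefg")
    vals, rounds, max_ops = doubling_inclusive_scan(xs, concat)
    print(vals == inclusive_scan_ref(xs, concat), rounds, max_ops)
    print(ceil(log2(len(xs))))
    print(exclusive_scan_ref(xs, concat))
```

String concatenation is used as the operator because it is associative but not commutative, so an implementation that combines values in the wrong order produces a visibly wrong result.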
