Reverse-Mode AD of Reduce-by-Index and Scan in Futhark
Lotte Maria Bruun, Ulrik Stuhr Larsen, Nikolaj Hinnerskov, Cosmin Oancea

TL;DR
This paper introduces reverse-mode automatic differentiation for core parallel programming constructs in Futhark, optimizing performance through specialized algorithms and analyzing the effects of differentiating at different abstraction levels on GPU execution.
Contribution
The paper provides new algorithms for reverse-mode AD of reduce, scan, and reduce-by-index in Futhark, along with specializations that make differentiation efficient for most cases of practical interest on GPUs.
Findings
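Specialized algorithms improve the performance of the differentiated code on GPUs.
Differentiating at a high level and differentiating at a low level ("differentiating the memory") have distinct performance impacts.
Experimental results highlight the impact of the specializations and the strengths and weaknesses of each level of differentiation.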
Abstract
We present and evaluate the Futhark implementation of reverse-mode automatic differentiation (AD) for the basic blocks of parallel programming: reduce, prefix sum (scan), and reduce by index. We first present derivations of general-case algorithms and then discuss several specializations that result in efficient differentiation of most cases of practical interest. We report an experiment that evaluates the performance of the differentiated code in the context of GPU execution and highlights the impact of the proposed specializations as well as the strengths and weaknesses of differentiating at high level vs. low level (i.e., "differentiating the memory").
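To make the constructs named in the abstract concrete, the following is a minimal sequential sketch in plain Python of what reverse-mode differentiation means for two simple cases: reduce-by-index with addition and reduce with multiplication. It is not Futhark code and is not taken from the paper. The function names are hypothetical, the convention that out-of-range indices are ignored is an assumption of this sketch, and the prefix/suffix-product rule is a commonly used textbook approach rather than the authors' algorithm.

```python
# Illustrative sketch only: sequential Python, not the paper's parallel algorithms.

def reduce_by_index_add(num_bins, indices, values):
    """Forward pass: hist[k] is the sum of values[i] for which indices[i] == k."""
    hist = [0.0] * num_bins
    for k, v in zip(indices, values):
        if 0 <= k < num_bins:  # assumption: out-of-range indices are ignored
            hist[k] += v
    return hist


def reduce_by_index_add_vjp(num_bins, indices, hist_adj):
    """Reverse pass for addition: each value inherits the adjoint of its bin."""
    return [hist_adj[k] if 0 <= k < num_bins else 0.0 for k in indices]


def product_reduce_vjp(values, out_adj):
    """Reverse pass for reduce with (*).

    The adjoint of values[i] is out_adj times the product of all other
    elements, computed from exclusive prefix and suffix products so that
    no division is needed (and zeros in the input are handled).
    """
    n = len(values)
    left = [1.0] * n   # left[i]  = product of values[0..i-1]
    for i in range(1, n):
        left[i] = left[i - 1] * values[i - 1]
    right = [1.0] * n  # right[i] = product of values[i+1..n-1]
    for i in range(n - 2, -1, -1):
        right[i] = right[i + 1] * values[i + 1]
    return [out_adj * left[i] * right[i] for i in range(n)]
```

In the addition case the reverse pass is a plain gather from the bin adjoints, which hints at why operator-specific specializations can be much cheaper than a general-case rule.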
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Interconnection Networks and Systems
