AD for an Array Language with Nested Parallelism
Robert Schenck, Ola Rønning, Troels Henriksen, Cosmin E. Oancea

TL;DR
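This paper introduces an automatic differentiation (AD) technique for a non-recursive array language with nested parallelism, aimed at efficient GPU execution. Instead of recording a tape, it redundantly re-executes code so that every variable the differentiated code may need is available in each new scope.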
Contribution
It presents a new AD method that leverages redundant execution and compiler transformations to efficiently differentiate nested parallel array programs on GPUs.
Findings
Competitive performance on nine benchmarks
Effective differentiation of loops and parallel operators
Elimination of tape in reverse-mode AD
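To make the tape-free idea concrete, here is a minimal sketch in plain Python of reverse-mode differentiation of a sequential loop. It is an illustration only, not the authors' implementation: the function names (`body`, `body_vjp`, `loop`, `loop_vjp`) and the loop body itself are hypothetical, and keeping the value that enters each iteration is an assumption of this sketch. The point it shows is that the intermediate `t` is never recorded; the reverse sweep recomputes it inside its own scope.

```python
import math

def body(x, a):
    # One loop iteration (hypothetical example body).
    t = math.sin(x)              # intermediate needed later by the adjoint code
    return a * t + x

def body_vjp(x, a, y_bar):
    # Adjoint of one iteration. The primal statement `t = sin(x)` is
    # re-executed here instead of being read back from a tape.
    t = math.sin(x)
    x_bar = y_bar * (a * math.cos(x) + 1.0)
    a_bar = y_bar * t
    return x_bar, a_bar

def loop(x0, a, n):
    x = x0
    for _ in range(n):
        x = body(x, a)
    return x

def loop_vjp(x0, a, n, y_bar=1.0):
    # Forward sweep: keep only the loop-variant value entering each
    # iteration (an assumption of this sketch), not the body's intermediates.
    entries = []
    x = x0
    for _ in range(n):
        entries.append(x)
        x = body(x, a)
    # Return sweep: visit iterations in reverse, recomputing what is needed.
    x_bar, a_bar = y_bar, 0.0
    for i in reversed(range(n)):
        x_bar, da = body_vjp(entries[i], a, x_bar)
        a_bar += da
    return x_bar, a_bar
```

Calling `loop_vjp(0.3, 0.7, 5)` returns the derivatives of `loop(0.3, 0.7, 5)` with respect to `x0` and `a`; comparing them against central finite differences is a quick sanity check.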
Abstract
We present a technique for applying (forward and) reverse-mode automatic differentiation (AD) on a non-recursive second-order functional array language that supports nested parallelism and is primarily aimed at efficient GPU execution. The key idea is to eliminate the need for a "tape" by relying on redundant execution to bring into each new scope all program variables that may be needed by the differentiated code. Efficient execution is enabled by the observation that perfectly-nested scopes do not introduce re-execution, and such perfect nests are produced by known compiler transformations, e.g., flattening. Our technique differentiates loops and bulk-parallel operators, such as map, reduce, histogram, scan, scatter, by specific rewrite rules, and aggressively optimizes the resulting nested-parallel code. We report an experimental evaluation that compares with established AD solutions…
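As an illustration of what a rewrite rule for a bulk-parallel operator can look like, the sketch below differentiates a product reduction in plain Python. It is a hedged example, not necessarily the rule the paper uses: the names `product` and `product_vjp` are hypothetical. The adjoint of element i is the incoming adjoint times the product of all other elements, which is obtained from an exclusive prefix scan and an exclusive suffix scan, so the derivative is itself expressed with scans and a map.

```python
from itertools import accumulate
import operator

def product(xs):
    # Primal: y = reduce (*) 1 xs
    y = 1.0
    for x in xs:
        y *= x
    return y

def product_vjp(xs, y_bar=1.0):
    # Reverse-mode rule for the reduction above:
    #   x_bar[i] = y_bar * (x_0 * ... * x_{i-1}) * (x_{i+1} * ... * x_{n-1})
    if not xs:
        return []
    # Exclusive prefix products: left[i] = product of xs[:i].
    left = [1.0] + list(accumulate(xs[:-1], operator.mul))
    # Exclusive suffix products: right[i] = product of xs[i+1:].
    right_rev = [1.0] + list(accumulate(reversed(xs[1:]), operator.mul))
    right = right_rev[::-1]
    # Final step is a map over the three arrays.
    return [y_bar * l * r for l, r in zip(left, right)]
```

Using scans rather than dividing the result by each element keeps the rule valid when some element is zero, for example `product_vjp([0.0, 3.0])`.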
Taxonomy
Topics: Embedded Systems Design Techniques · Network Packet Processing and Optimization · Parallel Computing and Optimization Techniques
