AD for an Array Language with Nested Parallelism

Robert Schenck; Ola Rønning; Troels Henriksen; Cosmin E. Oancea

arXiv:2202.10297 · cs.PL · February 22, 2022 · 1 citation

TL;DR

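This paper introduces a novel AD technique for a non-recursive, second-order functional array language with nested parallelism, aimed at efficient GPU execution. It eliminates the need for a tape by redundantly re-executing code, so that each new scope has access to the program variables the differentiated code needs.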
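
A minimal sketch of the tape-free idea for a single map, assuming a toy primal `ys[i] = sin(xs[i])**2` and hypothetical names `f` and `f_vjp`; plain Python lists stand in for parallel arrays, and this is not the paper's implementation. The adjoint of a map is again a map, and its body recomputes the primal intermediate `t = sin(x)` in place instead of reading it from a tape:

```python
import math

def f(xs):
    """Primal (hypothetical example): ys[i] = sin(xs[i])**2, written as a map."""
    return [math.sin(x) ** 2 for x in xs]

def f_vjp(xs, ys_bar):
    """Reverse pass for the map above, without a tape.

    The adjoint of a map is itself a map. Inside its body we re-execute
    the primal statement t = sin(x) to bring t back into scope, rather
    than loading a value saved during the forward pass.
    """
    xs_bar = []
    for x, y_bar in zip(xs, ys_bar):
        t = math.sin(x)               # redundant re-execution of the primal body
        t_bar = 2.0 * t * y_bar       # adjoint of y = t * t
        x_bar = math.cos(x) * t_bar   # adjoint of t = sin(x)
        xs_bar.append(x_bar)
    return xs_bar
```

The price is one recomputed `sin` per element; in exchange, no intermediate array has to be kept alive between the forward and backward passes.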

Contribution

It presents a new AD method that combines redundant execution with known compiler transformations (e.g., flattening, which produces perfectly nested scopes that introduce no re-execution) to differentiate nested-parallel array programs efficiently on GPUs.

Findings

01. Competitive performance against established AD solutions on nine benchmarks
02. Effective differentiation of loops and parallel operators
03. Elimination of the tape in reverse-mode AD
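
For sequential loops, the source does not spell out the differentiation rule, so the following is a hedged sketch under an assumption: the forward sweep checkpoints only the loop-carried value entering each iteration, and the backward sweep visits the iterations in reverse, re-executing from each checkpoint whatever the body's adjoint needs. The loop body `x = a * sin(x)` and the names `body`, `loop_primal`, and `loop_vjp` are hypothetical.

```python
import math

def body(x, a):
    """One iteration of the hypothetical loop: x_{k+1} = a * sin(x_k)."""
    return a * math.sin(x)

def loop_primal(x0, a, n):
    """Run the loop n times starting from x0."""
    x = x0
    for _ in range(n):
        x = body(x, a)
    return x

def loop_vjp(x0, a, n, y_bar):
    """Reverse-mode gradients (d/dx0, d/da) of loop_primal, scaled by y_bar."""
    # Forward sweep: keep only the loop-carried value entering each iteration.
    checkpoints = []
    x = x0
    for _ in range(n):
        checkpoints.append(x)
        x = body(x, a)
    # Backward sweep: iterate in reverse; x_bar holds the adjoint of the
    # value leaving the current iteration.
    x_bar, a_bar = y_bar, 0.0
    for x_k in reversed(checkpoints):
        s = math.sin(x_k)           # re-executed from the checkpoint
        a_bar += s * x_bar          # d x_{k+1} / d a   = sin(x_k)
        x_bar *= a * math.cos(x_k)  # d x_{k+1} / d x_k = a * cos(x_k)
    return x_bar, a_bar
```

Under this assumption the memory cost is one scalar per iteration rather than every intermediate of the body; a finite-difference comparison against `loop_primal` is a quick sanity check.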

Abstract

We present a technique for applying (forward and) reverse-mode automatic differentiation (AD) on a non-recursive second-order functional array language that supports nested parallelism and is primarily aimed at efficient GPU execution. The key idea is to eliminate the need for a "tape" by relying on redundant execution to bring into each new scope all program variables that may be needed by the differentiated code. Efficient execution is enabled by the observation that perfectly-nested scopes do not introduce re-execution, and such perfect nests are produced by known compiler transformations, e.g., flattening. Our technique differentiates loops and bulk-parallel operators, such as map, reduce, histogram, scan, scatter, by specific rewrite rules, and aggressively optimizes the resulting nested-parallel code. We report an experimental evaluation that compares with established AD solutions…
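
The rewrite rules themselves are not reproduced on this page. As a hedged illustration, the sketch below gives standard reverse-mode adjoints for three of the listed operators, each specialized to one operator: reduce with multiplication, scan with addition, and histogram with addition. All names (`exclusive_scan`, `prod_vjp`, `scan_add_vjp`, `hist_add`, `hist_add_vjp`) are hypothetical, sequential Python loops stand in for the parallel operators, and skipping out-of-range histogram indices is our own choice. The paper's rules cover general associative operators and may be organized differently.

```python
from itertools import accumulate

def exclusive_scan(op, ne, xs):
    """Exclusive scan: [ne, xs[0], op(xs[0], xs[1]), ...], dropping the total."""
    out, acc = [], ne
    for x in xs:
        out.append(acc)
        acc = op(acc, x)
    return out

def prod_vjp(xs, y_bar):
    """Adjoint of y = reduce (*) 1.0 xs.

    xs_bar[i] = y_bar * (product of every element except xs[i]), formed as
    the left exclusive scan times the right exclusive scan. Unlike y / xs[i],
    this stays correct when xs contains zeros.
    """
    mul = lambda p, q: p * q
    lefts = exclusive_scan(mul, 1.0, xs)
    rights = exclusive_scan(mul, 1.0, xs[::-1])[::-1]
    return [y_bar * l * r for l, r in zip(lefts, rights)]

def scan_add_vjp(ys_bar):
    """Adjoint of the inclusive (+)-scan ys = list(accumulate(xs)).

    xs[i] feeds every ys[j] with j >= i, so xs_bar[i] is the suffix sum
    ys_bar[i] + ... + ys_bar[n-1]: a (+)-scan over the reversed adjoint.
    """
    return list(accumulate(ys_bar[::-1]))[::-1]

def hist_add(num_bins, inds, vals):
    """(+)-histogram: bins[inds[k]] += vals[k]; out-of-range indices are skipped."""
    bins = [0.0] * num_bins
    for i, v in zip(inds, vals):
        if 0 <= i < num_bins:
            bins[i] += v
    return bins

def hist_add_vjp(inds, bins_bar):
    """Adjoint of hist_add w.r.t. vals: a gather, vals_bar[k] = bins_bar[inds[k]]."""
    n = len(bins_bar)
    return [bins_bar[i] if 0 <= i < n else 0.0 for i in inds]
```

Each adjoint is again built from bulk operations (scans, elementwise maps, gathers), so in a data-parallel setting the differentiated code can remain parallel.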

Taxonomy

Topics: Embedded Systems Design Techniques · Network Packet Processing and Optimization · Parallel Computing and Optimization Techniques