You Only Linearize Once: Tangents Transpose to Gradients
Alexey Radul, Adam Paszke, Roy Frostig, Matthew Johnson, Dougal, Maclaurin

TL;DR
This paper presents a unified view of automatic differentiation by decomposing reverse-mode AD into forward-mode, unzipping, and transposing linear functions, simplifying implementation and understanding.
Contribution
It formalizes a linear type system to prove functions are linear, showing forward-mode produces such functions and enabling reverse-mode through unzipping and transposing.
Findings
Forward-mode AD produces linear functions.
Unzipping and transposing linear functions is cost-effective.
Checkpointing naturally arises from unzipping in the formalization.
Abstract
Automatic differentiation (AD) is conventionally understood as a family of distinct algorithms, rooted in two "modes" -- forward and reverse -- which are typically presented (and implemented) separately. Can there be only one? Following up on the AD systems developed in the JAX and Dex projects, we formalize a decomposition of reverse-mode AD into (i) forward-mode AD followed by (ii) unzipping the linear and non-linear parts and then (iii) transposition of the linear part. To that end, we define a (substructurally) linear type system that can prove a class of functions are (algebraically) linear. Our main results are that forward-mode AD produces such linear functions, and that we can unzip and transpose any such linear function, conserving cost, size, and linearity. Composing these three transformations recovers reverse-mode AD. This decomposition also sheds light on checkpointing,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFormal Methods in Verification · Logic, programming, and type systems · Advanced Control Systems Optimization
