Confusion of Tagged Perturbations in Forward Automatic Differentiation of Higher-Order Functions
Oleksandr Manzyuk, Barak A. Pearlmutter, Alexey Andreyevich Radul, David R. Rush, Jeffrey Mark Siskind

TL;DR
This paper identifies a subtle bug in forward automatic differentiation involving confusion of perturbation tags in higher-order functions, and discusses potential solutions with their limitations.
Contribution
It uncovers a bug in forward AD's handling of perturbation tags in higher-order functions and evaluates two solutions with their respective drawbacks.
Findings
The bug causes a single perturbation tag to be shared by distinct derivative calculations when the function being differentiated is higher-order, so their perturbations are confused.
Two solutions are proposed: eta expansion and tag substitution, each with significant drawbacks.
The bug affects the correctness of derivatives in practical AD implementations.
Abstract
Forward Automatic Differentiation (AD) is a technique for augmenting programs to compute derivatives. The essence of Forward AD is to attach perturbations to each number, and propagate these through the computation. When derivatives are nested, the distinct derivative calculations, and their associated perturbations, must be distinguished. This is typically accomplished by creating a unique tag for each derivative calculation, tagging the perturbations, and overloading the arithmetic operators. We exhibit a subtle bug, present in fielded implementations, in which perturbations are confused despite the tagging machinery. The essence of the bug is this: each invocation of a derivative creates a unique tag but a unique tag is needed for each derivative calculation. When taking derivatives of higher-order functions, these need not correspond! The derivative of a higher-order function …
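The mechanism described in the abstract can be illustrated with a minimal sketch in Python. Everything here is an assumption made for illustration and is not taken from the paper or from any fielded AD system: the `Dual` class, the helpers `prim` and `tang`, the example function `shift`, and both derivative operators. `derivative_naive` creates one fresh tag per invocation and, when the result is a function, extracts the tangent after that function is eventually applied. `derivative_eta` is a rough rendering of the eta-expansion idea: it delays tag creation until a number is produced. It decides whether to delay by calling `f(x)` an extra time, which is a shortcut chosen for this sketch and costs additional evaluations.

```python
import itertools

_fresh = itertools.count()  # hypothetical source of unique, increasing tags


class Dual:
    """primal + tangent * eps_tag; either component may itself be a Dual."""
    def __init__(self, tag, primal, tangent):
        self.tag, self.primal, self.tangent = tag, primal, tangent

    def __add__(self, other): return add(self, other)
    def __radd__(self, other): return add(other, self)
    def __mul__(self, other): return mul(self, other)
    def __rmul__(self, other): return mul(other, self)


def prim(tag, v):
    """Part of v that does not involve the perturbation `tag`."""
    if isinstance(v, Dual):
        if v.tag == tag:
            return v.primal
        return Dual(v.tag, prim(tag, v.primal), prim(tag, v.tangent))
    return v


def tang(tag, v):
    """Coefficient of the perturbation `tag` in v."""
    if isinstance(v, Dual):
        if v.tag == tag:
            return v.tangent
        return Dual(v.tag, tang(tag, v.primal), tang(tag, v.tangent))
    if callable(v):  # function result: extract after it is applied
        return lambda *args: tang(tag, v(*args))
    return 0.0


def _top_tag(a, b):
    tags = [v.tag for v in (a, b) if isinstance(v, Dual)]
    return max(tags) if tags else None


def add(a, b):
    t = _top_tag(a, b)
    if t is None:
        return a + b
    return Dual(t, prim(t, a) + prim(t, b), tang(t, a) + tang(t, b))


def mul(a, b):
    t = _top_tag(a, b)
    if t is None:
        return a * b
    pa, pb, ta, tb = prim(t, a), prim(t, b), tang(t, a), tang(t, b)
    return Dual(t, pa * pb, pa * tb + ta * pb)  # product rule


def derivative_naive(f):
    def df(x):
        tag = next(_fresh)  # one tag per invocation of the derivative
        return tang(tag, f(x + Dual(tag, 0.0, 1.0)))
    return df


def derivative_eta(f):
    def df(x):
        if callable(f(x)):  # probe call: an assumption of this sketch
            # Delay tag creation until the returned function is applied.
            return lambda y: derivative_eta(lambda x1: f(x1)(y))(x)
        tag = next(_fresh)
        return tang(tag, f(x + Dual(tag, 0.0, 1.0)))
    return df


# Illustrative higher-order function: shift(u)(f)(x) = f(x + u).
shift = lambda u: lambda f: lambda x: f(x + u)
cube = lambda x: x * x * x  # cube''(2) = 12

d_naive = derivative_naive(shift)(0.0)  # should act like f -> f'
d_eta = derivative_eta(shift)(0.0)

wrong = d_naive(d_naive(cube))(2.0)  # both applications share one tag
right = d_eta(d_eta(cube))(2.0)      # each calculation gets its own tag
print(wrong, right)
```

In this sketch `d_naive` closes over the single tag created when `derivative_naive(shift)(0.0)` was invoked, so applying it twice runs two derivative calculations under the same tag and the inner extraction consumes the perturbation the outer one needs. The eta-expanded variant allocates a tag per calculation and recovers the second derivative, at the price of the repeated probing calls.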
