
TL;DR
This paper introduces a novel proof for adjoint linear systems based on Algorithmic Differentiation, extending to higher-order systems and providing a new perspective on adjoint operations in linear algebra.
Contribution
It presents a new proof method for adjoint linear systems using Algorithmic Differentiation principles, applicable to various matrix operations and higher-order systems.
Findings
New proof for adjoint systems based on Algorithmic Differentiation
Extension to higher-order adjoint linear systems
Alternative proof for matrix-matrix and vector products
A Note on Adjoint Linear Algebra
Uwe Naumann
Department of Computer Science, RWTH Aachen University, 52056 Aachen, Germany,
[email protected]
Abstract
A new proof for adjoint systems of linear equations is presented. The argument is built on the principles of Algorithmic Differentiation. Application to scalar multiplication sets the base line. Generalization yields adjoint inner vector, matrix-vector, and matrix-matrix products leading to an alternative proof for first- as well as higher-order adjoint linear systems.
Keywords: algorithmic differentiation, adjoint, linear algebra
AMS subject classifications: 15A06, 15A29, 26B05
1 Motivation
Algorithmic Differentiation [3, 5] of numerical programs builds on a set of elemental functions with known partial derivatives with respect to their arguments at the given point of evaluation. The propagation of adjoint derivatives relies on the associativity of the chain rule of differential calculus. Differentiable combinations of elemental functions yield higher-level elementals. Efficient implementation of AD requires the highest possible level of elemental functions.
Basic AD assumes the set of elemental functions to be formed by the arithmetic operators and intrinsic functions built into the given programming language. While its application to linear algebra methods turns out to be straightforward, basic AD is certainly not the method of choice from the point of view of computational efficiency. Elementals of the highest possible level should be used. Their derivatives should be formulated as functions of high-level elementals in order to exploit the benefits of corresponding optimized implementations.
Following this rationale, this note presents a new way to derive adjoint systems of linear equations based on adjoint Basic Linear Algebra Subprograms (BLAS) [4]. It is well known (see [2] and references therein) that for systems $A \cdot x = b$ of $n$ linear equations with invertible $A$ and primal solution $x = A^{-1} \cdot b$, first-order adjoints $A_{(1)}$ of $A$ (both in $\mathbb{R}^{n \times n}$, with $\mathbb{R}$ denoting the real numbers) and $b_{(1)}$ of $b$ (both in $\mathbb{R}^n$) can be evaluated at the primal solution as
$$A^T \cdot b_{(1)} = x_{(1)}, \qquad A_{(1)} = -b_{(1)} \cdot x^T. \tag{1}$$
The main contribution of this note is an alternative proof for Eqn. (1) that builds naturally on the adjoint BLAS used in the context of state-of-the-art AD. For consistency with related work we follow the notation in [5], that is, $v^{(1)}$ denotes the value of the first-order directional derivative (or tangent) associated with a variable $v$ and $v_{(1)}$ denotes the value of its adjoint.
2 Prerequisites
The Jacobian $\nabla F = \nabla F(x) \in \mathbb{R}^{m \times n}$ of a differentiable implementation of $y = F(x) : \mathbb{R}^n \to \mathbb{R}^m$ as a computer program induces a linear mapping $y^{(1)} = \nabla F \cdot x^{(1)} : \mathbb{R}^n \to \mathbb{R}^m$ implementing the tangent of $F$. The corresponding adjoint operator $\nabla F^* = \nabla F^T$ is formally defined via the inner vector product identity
$$\langle y_{(1)}, \nabla F \cdot x^{(1)} \rangle = \langle \nabla F^T \cdot y_{(1)}, x^{(1)} \rangle \tag{2}$$
yielding $x_{(1)} = \nabla F^T \cdot y_{(1)}$ [1]. In the following, all (program) variables are assumed to be alias- and context-free, that is, distinct variables do not overlap in memory and $F$ is assumed not to be embedded in an enclosing computation. We distinguish between active and passive variables. Derivatives of all active outputs of the given program are computed with respect to all active inputs. We are not interested in derivatives of passive outputs, nor are we computing derivatives with respect to passive inputs.
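The inner product identity that defines the adjoint can be checked numerically for any pair of tangent and adjoint implementations. The following is a minimal sketch in plain Python; the map `F`, its hand-coded Jacobian, and all numerical values are hypothetical choices made for illustration and do not come from the note.

```python
import math

def F(x):
    """Hypothetical example map F: R^2 -> R^3 (not from the note)."""
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def jacobian(x):
    """Hand-coded 3x2 Jacobian of F at x."""
    return [[x[1], x[0]],
            [math.cos(x[0]), 0.0],
            [0.0, 2.0 * x[1]]]

def tangent(x, x_t):
    """Tangent y^(1) = J(x) * x^(1)."""
    J = jacobian(x)
    return [sum(J[i][j] * x_t[j] for j in range(2)) for i in range(3)]

def adjoint(x, y_a):
    """Adjoint x_(1) = J(x)^T * y_(1)."""
    J = jacobian(x)
    return [sum(J[i][j] * y_a[i] for i in range(3)) for j in range(2)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x = [0.7, -1.3]            # point of evaluation
x_t = [0.5, 2.0]           # arbitrary tangent direction x^(1)
y_a = [1.0, -2.0, 0.25]    # arbitrary adjoint seed y_(1)

lhs = dot(y_a, tangent(x, x_t))    # <y_(1), y^(1)>
rhs = dot(adjoint(x, y_a), x_t)    # <x_(1), x^(1)>
print(abs(lhs - rhs))
```

Both inner products agree up to rounding. This kind of check is commonly used to validate a hand-written adjoint against the corresponding tangent.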
3 BLAS Revisited
In its basic form, AD builds on known tangents and adjoints of the arithmetic functions and operators built into programming languages. Tangents and adjoints are propagated along the flow of data according to the chain rule of differential calculus. We enumerate entries of vectors $v \in \mathbb{R}^n$ starting from zero as $v = (v_i)_{i=0,\ldots,n-1}$.
From the perspective of AD adjoint versions of higher-level BLAS are derived as adjoints of lower-level BLAS. Optimization of the result aims for implementation using the highest possible level of BLAS. For example, adjoint matrix-matrix multiplication (level-3 BLAS) is derived from adjoint matrix-vector multiplication (level-2 BLAS) yielding efficient evaluation as two matrix-matrix products (level-3 BLAS) as shown in Lemma 3.7. Rigorous derivation of this result requires bottom-up investigation of the BLAS hierarchy. We start with basic scalar multiplication (Lemma 3.1) followed by the inner vector (Lemma 3.3) and matrix-vector (Lemma 3.5) products as prerequisites for the matrix-matrix product.
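Lemma 3.1.
The adjoint of scalar multiplication $y = a \cdot x$ with active $a, x, y \in \mathbb{R}$ is computed as
$$a_{(1)} = x \cdot y_{(1)}, \qquad x_{(1)} = a \cdot y_{(1)} \tag{3}$$
for $y_{(1)} \in \mathbb{R}$ yielding $a_{(1)}, x_{(1)} \in \mathbb{R}$.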
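Proof 3.2.
Differentiation of $y = a \cdot x$ with respect to $a$ and $x$ yields the tangent
$$y^{(1)} = x \cdot a^{(1)} + a \cdot x^{(1)}$$
for $a^{(1)}, x^{(1)}, y^{(1)} \in \mathbb{R}$. Eqn. (2) implies
$$\left\langle \begin{pmatrix} a_{(1)} \\ x_{(1)} \end{pmatrix}, \begin{pmatrix} a^{(1)} \\ x^{(1)} \end{pmatrix} \right\rangle = y_{(1)} \cdot y^{(1)} = y_{(1)} \cdot x \cdot a^{(1)} + y_{(1)} \cdot a \cdot x^{(1)}$$
yielding
$$\begin{pmatrix} a_{(1)} \\ x_{(1)} \end{pmatrix} = \begin{pmatrix} x \cdot y_{(1)} \\ a \cdot y_{(1)} \end{pmatrix}$$
and hence Eqn. (3).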
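Lemma 3.3.
The adjoint of an inner vector product
$$y = \langle a, x \rangle \equiv a^T \cdot x = \sum_{i=0}^{n-1} a_i \cdot x_i$$
with active inputs $a \in \mathbb{R}^n$ and $x \in \mathbb{R}^n$ yielding the active output $y \in \mathbb{R}$ is computed as
$$a_{(1)} = x \cdot y_{(1)}, \qquad x_{(1)} = a \cdot y_{(1)} \tag{4}$$
for $y_{(1)} \in \mathbb{R}$ yielding $a_{(1)} \in \mathbb{R}^n$ and $x_{(1)} \in \mathbb{R}^n$.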
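Proof 3.4.
Differentiation of $y = a^T \cdot x$ for $a = (a_i)_{i=0,\ldots,n-1}$ and $x = (x_i)_{i=0,\ldots,n-1}$, with respect to $a$ and $x$ yields the tangent
$$y^{(1)} = \sum_{i=0}^{n-1} \left( x_i \cdot a_i^{(1)} + a_i \cdot x_i^{(1)} \right) = x^T \cdot a^{(1)} + a^T \cdot x^{(1)}.$$
Eqn. (2) implies
$$\left\langle \begin{pmatrix} a_{(1)} \\ x_{(1)} \end{pmatrix}, \begin{pmatrix} a^{(1)} \\ x^{(1)} \end{pmatrix} \right\rangle = y_{(1)} \cdot y^{(1)} = \langle x \cdot y_{(1)}, a^{(1)} \rangle + \langle a \cdot y_{(1)}, x^{(1)} \rangle$$
yielding $a_{(1)} = x \cdot y_{(1)}$ and $x_{(1)} = a \cdot y_{(1)}$, and hence Eqn. (4).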
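The adjoint inner product rule, $a_{(1)} = x \cdot y_{(1)}$ and $x_{(1)} = a \cdot y_{(1)}$, can be validated against its tangent with the adjoint identity. The sketch below uses plain Python lists; the function names and test values are illustrative assumptions.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def inner_tangent(a, x, a_t, x_t):
    """Tangent of y = <a, x>: y^(1) = <a^(1), x> + <a, x^(1)>."""
    return dot(a_t, x) + dot(a, x_t)

def inner_adjoint(a, x, y_a):
    """Adjoint of y = <a, x>: a_(1) = x * y_(1), x_(1) = a * y_(1)."""
    a_a = [xi * y_a for xi in x]
    x_a = [ai * y_a for ai in a]
    return a_a, x_a

a = [1.0, -2.0, 0.5]
x = [3.0, 0.25, -4.0]
a_t = [0.5, 1.0, -1.0]     # arbitrary tangent directions
x_t = [2.0, 0.0, 0.75]
y_a = 1.5                  # arbitrary adjoint seed y_(1)

y_t = inner_tangent(a, x, a_t, x_t)
a_a, x_a = inner_adjoint(a, x, y_a)

lhs = y_a * y_t                          # <y_(1), y^(1)>
rhs = dot(a_a, a_t) + dot(x_a, x_t)      # <(a_(1), x_(1)), (a^(1), x^(1))>
print(lhs, rhs)
```

With $n = 1$ the same code reduces to the scalar multiplication case of Lemma 3.1.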
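The following derivation of adjoint matrix-vector and matrix-matrix products relies on serialization of matrices. Individual rows of a matrix $A \in \mathbb{R}^{m \times n}$ are denoted as $a_i \in \mathbb{R}^{1 \times n}$ for $i = 0, \ldots, m-1$; columns are denoted as $a^j \in \mathbb{R}^m$ for $j = 0, \ldots, n-1$. (Row) vectors in $\mathbb{R}^{1 \times n}$ are denoted as $(v^j)^{j=0,\ldots,n-1}$; (column) vectors in $\mathbb{R}^m$ are denoted as $(v_i)_{i=0,\ldots,m-1}$. Consequently, a row-major serialization of $A$ is given by $(a_i^T)_{i=0,\ldots,m-1}$. A column-major serialization of $A$ is given by $(a^j)_{j=0,\ldots,n-1}$. Tangents and adjoints of the individual entries $a_{i,j}$ of $A$ define
$$A^{(1)} = \left( a^{(1)}_{i,j} \right)_{i=0,\ldots,m-1}^{j=0,\ldots,n-1}$$
and
$$A_{(1)} = \left( a_{(1),i,j} \right)_{i=0,\ldots,m-1}^{j=0,\ldots,n-1},$$
respectively.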
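Lemma 3.5.
The adjoint of a matrix-vector product
$$y = A \cdot x$$
with active inputs $A \in \mathbb{R}^{m \times n}$ and $x \in \mathbb{R}^n$ yielding the active output $y \in \mathbb{R}^m$ is computed as
$$x_{(1)} = A^T \cdot y_{(1)}, \qquad A_{(1)} = y_{(1)} \cdot x^T \tag{5}$$
for $y_{(1)} \in \mathbb{R}^m$ yielding $x_{(1)} \in \mathbb{R}^n$ and $A_{(1)} \in \mathbb{R}^{m \times n}$.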
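Proof 3.6.
Differentiation of $y = A \cdot x$, where $A = (a_i)_{i=0,\ldots,m-1}$ and $y = (y_i)_{i=0,\ldots,m-1}$, with respect to $A$ and $x$ yields the tangent
$$y^{(1)} = \left( a_i^{(1)} \cdot x + a_i \cdot x^{(1)} \right)_{i=0,\ldots,m-1} = A^{(1)} \cdot x + A \cdot x^{(1)}.$$
Eqn. (2) implies, with $a_{(1),i}$ denoting the $i$-th row of $A_{(1)}$ and $y_{(1),i}$ the $i$-th entry of $y_{(1)}$,
$$\langle x_{(1)}, x^{(1)} \rangle + \left\langle \left( a_{(1),i}^T \right)_{i}, \left( (a_i^{(1)})^T \right)_{i} \right\rangle = \langle y_{(1)}, y^{(1)} \rangle = \langle A^T \cdot y_{(1)}, x^{(1)} \rangle + \left\langle \left( y_{(1),i} \cdot x \right)_{i}, \left( (a_i^{(1)})^T \right)_{i} \right\rangle,$$
where $(y_{(1),i} \cdot x)_{i=0,\ldots,m-1}$ denotes a concatenation of $m$ copies of $x$, the $i$-th copy scaled by $y_{(1),i}$, as a column vector. Eqn. (5) follows immediately.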
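The adjoint matrix-vector product, $x_{(1)} = A^T \cdot y_{(1)}$ together with the outer product $A_{(1)} = y_{(1)} \cdot x^T$, can be checked in the same way. The following is a sketch on nested Python lists with hypothetical helper names and arbitrary test data; an efficient implementation would call the corresponding level-2 BLAS routines instead.

```python
def matvec(A, x):
    """y = A * x for A given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec_adjoint(A, x, y_a):
    """Adjoint of y = A * x: x_(1) = A^T y_(1), A_(1) = y_(1) x^T."""
    x_a = matvec(transpose(A), y_a)
    A_a = [[yi * xj for xj in x] for yi in y_a]   # outer product, m x n
    return x_a, A_a

A = [[1.0, 2.0, 0.0], [-1.0, 0.5, 3.0]]       # m = 2, n = 3
x = [2.0, -1.0, 0.5]
A_t = [[0.5, -1.0, 2.0], [1.0, 0.0, -0.5]]    # arbitrary tangent of A
x_t = [1.0, 1.0, -2.0]                        # arbitrary tangent of x
y_a = [0.5, -3.0]                             # arbitrary adjoint seed

# Tangent y^(1) = A^(1) x + A x^(1)
y_t = [p + q for p, q in zip(matvec(A_t, x), matvec(A, x_t))]
x_a, A_a = matvec_adjoint(A, x, y_a)

lhs = dot(y_a, y_t)
rhs = dot(x_a, x_t) + sum(dot(ra, rt) for ra, rt in zip(A_a, A_t))
print(lhs, rhs)
```

The second term of `rhs` is the inner product of the row-major serializations of $A_{(1)}$ and $A^{(1)}$.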
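Lemma 3.7.
The adjoint of a matrix-matrix product $Y = A \cdot X$ with active inputs $A \in \mathbb{R}^{m \times n}$, $X \in \mathbb{R}^{n \times p}$ yielding the active output $Y \in \mathbb{R}^{m \times p}$ is computed as
$$A_{(1)} = Y_{(1)} \cdot X^T, \qquad X_{(1)} = A^T \cdot Y_{(1)} \tag{6}$$
for $Y_{(1)} \in \mathbb{R}^{m \times p}$ yielding $A_{(1)} \in \mathbb{R}^{m \times n}$ and $X_{(1)} \in \mathbb{R}^{n \times p}$.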
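Proof 3.8.
Differentiation of $Y = A \cdot X$, where $A \in \mathbb{R}^{m \times n}$, $X = (x^k)^{k=0,\ldots,p-1}$ and $Y = (y^k)^{k=0,\ldots,p-1}$, with respect to $A$ and $X$ yields tangents
$$(y^k)^{(1)} = A^{(1)} \cdot x^k + A \cdot (x^k)^{(1)}$$
for $k = 0, \ldots, p-1$ and hence
$$Y^{(1)} = A^{(1)} \cdot X + A \cdot X^{(1)}.$$
Eqn. (2), applied column by column as in Lemma 3.5, implies
$$(x^k)_{(1)} = A^T \cdot (y^k)_{(1)}, \qquad A_{(1)} = \sum_{l=0}^{p-1} (y^l)_{(1)} \cdot (x^l)^T$$
for $k = 0, \ldots, p-1$ and hence Eqn. (6).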
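The adjoint matrix-matrix product amounts to two further matrix-matrix products, $A_{(1)} = Y_{(1)} \cdot X^T$ and $X_{(1)} = A^T \cdot Y_{(1)}$. The sketch below verifies this against the tangent $Y^{(1)} = A^{(1)} \cdot X + A \cdot X^{(1)}$ using the Frobenius inner product. Helper names and data are assumptions for illustration; a production code would map both products onto level-3 BLAS.

```python
def matmul(A, B):
    """Plain matrix-matrix product on lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(U, V):
    return [[u + v for u, v in zip(ru, rv)] for ru, rv in zip(U, V)]

def frob(U, V):
    """Frobenius inner product: sum over all entries of U_ij * V_ij."""
    return sum(u * v for ru, rv in zip(U, V) for u, v in zip(ru, rv))

A = [[1.0, 2.0, 0.0], [-1.0, 0.5, 3.0]]          # 2 x 3
X = [[2.0, 1.0], [-1.0, 0.0], [0.5, 4.0]]        # 3 x 2
A_t = [[0.5, -1.0, 2.0], [1.0, 0.0, -0.5]]       # arbitrary tangent of A
X_t = [[1.0, 0.0], [1.0, -1.0], [-2.0, 0.5]]     # arbitrary tangent of X
Y_a = [[0.5, -3.0], [2.0, 1.0]]                  # arbitrary adjoint seed, 2 x 2

Y_t = add(matmul(A_t, X), matmul(A, X_t))        # tangent of Y = A X
A_a = matmul(Y_a, transpose(X))                  # A_(1) = Y_(1) X^T
X_a = matmul(transpose(A), Y_a)                  # X_(1) = A^T Y_(1)

lhs = frob(Y_a, Y_t)
rhs = frob(A_a, A_t) + frob(X_a, X_t)
print(lhs, rhs)
```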
4 Systems of Linear Equations Revisited
Lemmas 4.1 and 4.3 form the basis for the new proof of Eqn. (1).
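Lemma 4.1.
The tangent
$$Y^{(1)} = A \cdot X^{(1)} \cdot B$$
of $Y = A \cdot X \cdot B$ for active $X \in \mathbb{R}^{n \times q}$ and passive $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{q \times p}$ implies the adjoint
$$X_{(1)} = A^T \cdot Y_{(1)} \cdot B^T.$$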
Proof 4.2.
$$Z_{(1)} = A^T \cdot Y_{(1)}$$
follows from application of Lemma 3.7 to $Y = A \cdot Z$ with passive $A$.
$$X_{(1)} = Z_{(1)} \cdot B^T$$
follows from application of Lemma 3.7 to $Z = X \cdot B$ with passive $B$. Substitution of $Z$ and $Z_{(1)}$ yields Lemma 4.1.
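Lemma 4.3.
The tangent
$$Y^{(1)} = \sum_{i=0}^{k-1} A_i \cdot X_i^{(1)} \cdot B_i$$
of $Y = \sum_{i=0}^{k-1} A_i \cdot X_i \cdot B_i$ with active $X_i$ and with passive $A_i$, $B_i$ implies the adjoint
$$(X_i)_{(1)} = A_i^T \cdot Y_{(1)} \cdot B_i^T$$
for $i = 0, \ldots, k-1$.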
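Proof 4.4.
From
$$Y_i^{(1)} = A_i \cdot X_i^{(1)} \cdot B_i$$
follows with Lemma 4.1
$$(X_i)_{(1)} = A_i^T \cdot (Y_i)_{(1)} \cdot B_i^T$$
for $i = 0, \ldots, k-1$. Moreover, $Y^{(1)} = \sum_{i=0}^{k-1} Y_i^{(1)}$ implies $(Y_i)_{(1)} = Y_{(1)}$ due to identity Jacobians of $Y$ with respect to $Y_i$ for $i = 0, \ldots, k-1$, and hence Lemma 4.3.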
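The adjoint of a sum of products $A_i \cdot X_i \cdot B_i$ with passive factors, namely $A_i^T \cdot Y_{(1)} \cdot B_i^T$ for each active $X_i$, can again be checked with the adjoint identity. The sketch below uses two terms whose shapes are deliberately different; all matrices are hypothetical test data and the helpers are illustrative.

```python
def matmul(A, B):
    """Plain matrix-matrix product on lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(U, V):
    return [[u + v for u, v in zip(ru, rv)] for ru, rv in zip(U, V)]

def frob(U, V):
    """Frobenius inner product: sum over all entries of U_ij * V_ij."""
    return sum(u * v for ru, rv in zip(U, V) for u, v in zip(ru, rv))

# Passive factors; shapes are chosen such that Y is 2 x 2.
A0 = [[1.0, 2.0], [0.0, -1.0]]                  # 2 x 2
B0 = [[0.5, 1.0], [2.0, 0.0], [-1.0, 3.0]]      # 3 x 2, hence X0 is 2 x 3
A1 = [[2.0, 0.0], [1.0, 1.0]]                   # 2 x 2
B1 = [[1.0, -2.0]]                              # 1 x 2, hence X1 is 2 x 1

X0_t = [[1.0, 0.0, 2.0], [-1.0, 0.5, 0.0]]      # arbitrary tangent of X0
X1_t = [[3.0], [-0.5]]                          # arbitrary tangent of X1
Y_a = [[1.0, -1.0], [0.25, 2.0]]                # arbitrary adjoint seed

# Tangent: Y^(1) = A0 X0^(1) B0 + A1 X1^(1) B1
Y_t = add(matmul(matmul(A0, X0_t), B0), matmul(matmul(A1, X1_t), B1))

# Adjoints: each active X_i receives A_i^T Y_(1) B_i^T
X0_a = matmul(matmul(transpose(A0), Y_a), transpose(B0))
X1_a = matmul(matmul(transpose(A1), Y_a), transpose(B1))

lhs = frob(Y_a, Y_t)
rhs = frob(X0_a, X0_t) + frob(X1_a, X1_t)
print(lhs, rhs)
```

Dropping the second term gives the single-product case $Y = A \cdot X \cdot B$.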
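Theorem 4.5.
Adjoints of systems $A \cdot x = b$ of $n$ linear equations with invertible $A \in \mathbb{R}^{n \times n}$ and right-hand side $b \in \mathbb{R}^n$ are evaluated at the primal solution $x = A^{-1} \cdot b \in \mathbb{R}^n$ by Eqn. (1).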
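Proof 4.6.
Differentiation of $A \cdot x = b$ with respect to $A$ and $b$ yields the tangent system
$$A^{(1)} \cdot x + A \cdot x^{(1)} = b^{(1)},$$
which implies
$$x^{(1)} = A^{-1} \cdot b^{(1)} \cdot I - A^{-1} \cdot A^{(1)} \cdot x$$
with identity $I \in \mathbb{R}^{1 \times 1}$, so that both terms have the form required by Lemma 4.3. Lemma 4.3 yields
$$b_{(1)} = A^{-T} \cdot x_{(1)} \cdot I^T = A^{-T} \cdot x_{(1)}, \qquad A_{(1)} = -A^{-T} \cdot x_{(1)} \cdot x^T = -b_{(1)} \cdot x^T$$
and hence Eqn. (1).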
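The adjoint linear system can be checked numerically as follows: solve $A^T \cdot b_{(1)} = x_{(1)}$, form $A_{(1)} = -b_{(1)} \cdot x^T$, and compare with the tangent obtained from $A \cdot x^{(1)} = b^{(1)} - A^{(1)} \cdot x$. This is a minimal sketch; the dense Gaussian elimination routine `solve` and all data are assumptions chosen for illustration only.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, -1.0], [0.0, -1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = solve(A, b)                                   # primal solution

x_a = [1.0, -0.5, 2.0]                            # arbitrary adjoint seed x_(1)
b_a = solve(transpose(A), x_a)                    # A^T b_(1) = x_(1)
A_a = [[-bi * xj for xj in x] for bi in b_a]      # A_(1) = -b_(1) x^T

A_t = [[0.1, 0.0, -0.2], [0.3, 0.5, 0.0], [0.0, -0.4, 0.6]]
b_t = [1.0, 0.0, -1.0]
# Tangent system: A x^(1) = b^(1) - A^(1) x
x_t = solve(A, [bt - s for bt, s in zip(b_t, matvec(A_t, x))])

lhs = dot(x_a, x_t)
rhs = dot(b_a, b_t) + sum(dot(ra, rt) for ra, rt in zip(A_a, A_t))
print(lhs, rhs)
```

Note that neither $A^{-1}$ nor $A^{-T}$ is formed explicitly; both the tangent and the adjoint require one additional linear solve.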
5 Conclusion
As observed previously by various authors, a possibly available factorization of $A$ can be reused both for the tangent ($A \cdot x^{(1)} = b^{(1)} - A^{(1)} \cdot x$) and the adjoint ($A^T \cdot b_{(1)} = x_{(1)}$) systems. The additional worst-case computational cost of $O(n^3)$ can thus be reduced to $O(n^2)$. Higher-order tangents [adjoints] of linear systems amount to repeated solutions of linear systems with the same [transposed] system matrix combined with tangent [adjoint] BLAS.
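Reuse of a factorization can be sketched as follows: an LU factorization of $A$ is computed once at $O(n^3)$ cost and then serves the primal, tangent, and adjoint (transposed) solves at $O(n^2)$ cost each. The sketch uses a Doolittle factorization without pivoting, which is an assumption that holds for the diagonally dominant example matrix but not in general; function names are hypothetical. Library routines with pivoting, for example LAPACK's `getrf` with the `trans` option of `getrs`, would be used in practice.

```python
def lu_factor(A):
    """Doolittle LU without pivoting: A = L U with unit lower triangular L.
    Returns one matrix holding U on and above the diagonal and L below it."""
    n = len(A)
    LU = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU

def lu_solve(LU, b):
    """Solve A x = b: forward substitution with L, backward with U."""
    n = len(b)
    x = b[:]
    for i in range(n):
        x[i] -= sum(LU[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x

def lu_solve_transposed(LU, w):
    """Solve A^T z = w: forward substitution with U^T, backward with L^T."""
    n = len(w)
    z = w[:]
    for i in range(n):
        z[i] = (z[i] - sum(LU[j][i] * z[j] for j in range(i))) / LU[i][i]
    for i in reversed(range(n)):
        z[i] -= sum(LU[j][i] * z[j] for j in range(i + 1, n))
    return z

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[4.0, 1.0, 0.0], [1.0, 3.0, -1.0], [0.0, -1.0, 2.0]]
b = [1.0, 2.0, 3.0]
LU = lu_factor(A)                      # factor once

x = lu_solve(LU, b)                    # primal solve
x_a = [1.0, -0.5, 2.0]                 # arbitrary adjoint seed x_(1)
b_a = lu_solve_transposed(LU, x_a)     # adjoint solve reuses the same factors
print(x, b_a)
```

The tangent solve would call `lu_solve(LU, ...)` with right-hand side $b^{(1)} - A^{(1)} \cdot x$, again without refactoring $A$.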
References
- [1] N. Dunford and J. Schwartz, Linear Operators. I. General Theory, with the assistance of W. G. Bade and R. G. Bartle, Pure and Applied Mathematics, Vol. 7, Interscience Publishers, Inc., New York, 1958.
- [2] M. B. Giles, Collected matrix derivative results for forward and reverse mode algorithmic differentiation, in Advances in Automatic Differentiation, C. Bischof, M. Bücker, P. Hovland, U. Naumann, and J. Utke, eds., Springer, 2008, pp. 35–44.
- [3] A. Griewank and A. Walther, Evaluating Derivatives. Principles and Techniques of Algorithmic Differentiation, Second Edition, no. OT 105 in Other Titles in Applied Mathematics, SIAM, 2008.
- [4] C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, Basic linear algebra subprograms for Fortran usage, ACM Trans. Math. Softw., 5 (1979), pp. 308–323.
- [5] U. Naumann, The Art of Differentiating Computer Programs. An Introduction to Algorithmic Differentiation, no. SE 24 in Software, Environments, and Tools, SIAM, 2012.
