Gradients of Functions of Large Matrices

Nicholas Krämer, Pablo Moreno-Muñoz, Hrittik Roy, Søren Hauberg

arXiv:2405.17277 · cs.LG · October 28, 2024

TL;DR

The paper shows how to differentiate functions of large matrices efficiently by deriving adjoint systems for the Lanczos and Arnoldi iterations, which the authors implement in JAX and apply to PDEs, Gaussian processes, and Bayesian neural networks.

Contribution

The paper derives the first adjoint systems for the Lanczos and Arnoldi iterations and implements them in JAX, so that functions of large matrices can be differentiated efficiently without problem-specific optimizations.

Findings

01 Code competes with Diffrax for PDE differentiation

02 Outperforms standard factorization in Bayesian neural network calibration

03 Enables efficient gradient computation for large matrix functions

Abstract

Tuning scientific and probabilistic machine learning models (for example, partial differential equations, Gaussian processes, or Bayesian neural networks) often relies on evaluating functions of matrices whose size grows with the data set or the number of parameters. While the state of the art for evaluating these quantities is almost always based on Lanczos and Arnoldi iterations, the present work is the first to explain how to differentiate these workhorses of numerical linear algebra efficiently. To get there, we derive previously unknown adjoint systems for Lanczos and Arnoldi iterations, implement them in JAX, and show that the resulting code can compete with Diffrax when it comes to differentiating PDEs and with GPyTorch for selecting Gaussian process models, and that it beats standard factorisation methods for calibrating Bayesian neural networks. All this is achieved without any…
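To make the setting in the abstract concrete, here is a minimal JAX sketch of a Lanczos-based approximation of f(A)v for a symmetric matrix that is only available through matrix-vector products, together with its gradient with respect to the matrix parameters. It is not the adjoint system derived in the paper: the gradient below is obtained by plain reverse-mode differentiation through the unrolled Lanczos loop, which is the baseline such adjoints are meant to improve on. The test problem (a scaled and shifted 1D Laplacian), the choice f(x) = exp(-x), and all function names (`matvec`, `lanczos`, `funm_vec_lanczos`, `funm_vec_dense`) are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # double precision for the comparison


def matvec(theta, x):
    """Hypothetical matrix-free operator: theta[0] * Laplacian + theta[1] * I."""
    lap = 2.0 * x - jnp.pad(x[1:], (0, 1)) - jnp.pad(x[:-1], (1, 0))
    return theta[0] * lap + theta[1] * x


def lanczos(mv, v, num_steps):
    """Lanczos with full reorthogonalisation (symmetric mv, num_steps >= 2).

    Returns Q with orthonormal columns and the tridiagonal T = Q^T A Q.
    """
    Q = jnp.zeros((v.shape[0], num_steps), dtype=v.dtype)
    q = v / jnp.linalg.norm(v)
    alphas, betas = [], []
    for i in range(num_steps):
        Q = Q.at[:, i].set(q)
        w = mv(q)
        alphas.append(q @ w)
        if i == num_steps - 1:
            break
        w = w - Q @ (Q.T @ w)  # project out all basis vectors built so far
        beta = jnp.linalg.norm(w)
        betas.append(beta)
        q = w / beta
    alphas, betas = jnp.stack(alphas), jnp.stack(betas)
    T = jnp.diag(alphas) + jnp.diag(betas, 1) + jnp.diag(betas, -1)
    return Q, T


def funm_vec_lanczos(f, theta, v, num_steps):
    """Approximate f(A) v by ||v|| * Q f(T) e_1."""
    Q, T = lanczos(lambda x: matvec(theta, x), v, num_steps)
    lam, U = jnp.linalg.eigh(T)
    return jnp.linalg.norm(v) * (Q @ (U @ (f(lam) * U[0, :])))


def funm_vec_dense(f, theta, v):
    """Dense reference: f(A) v via an eigendecomposition of the full matrix."""
    A = jax.vmap(lambda e: matvec(theta, e))(jnp.eye(v.shape[0]))
    lam, U = jnp.linalg.eigh(A)
    return U @ (f(lam) * (U.T @ v))


def f(lam):
    return jnp.exp(-lam)


def loss_lanczos(theta, v):
    return jnp.sum(funm_vec_lanczos(f, theta, v, num_steps=20) ** 2)


def loss_dense(theta, v):
    return jnp.sum(funm_vec_dense(f, theta, v) ** 2)


v = jax.random.normal(jax.random.PRNGKey(0), (30,))
theta = jnp.array([1.0, 0.5])

grad_lanczos = jax.grad(loss_lanczos)(theta, v)  # backprop through the loop
grad_dense = jax.grad(loss_dense)(theta, v)      # reference gradient
print(grad_lanczos, grad_dense)
```

On this small problem the two gradients should agree closely, since 20 Lanczos steps resolve exp(-A)v well for a matrix with such a narrow spectrum. Backpropagating through the loop stores every intermediate iterate, so its cost grows with the number of steps; the sketch is meant only to show which quantity is being differentiated.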

Code & Models

Repositories

pnkraemer/experiments-lanczos-adjoints (JAX, official)

Videos

Gradients of Functions of Large Matrices · SlidesLive

Taxonomy

Topics: Matrix Theory and Algorithms

Methods: Sparse Evolutionary Training · Lib · Gaussian Process