Matrix Calculus (for Machine Learning and Beyond)
Paige Bright, Alan Edelman, Steven G. Johnson

TL;DR
These course notes introduce matrix calculus for machine learning and related applications, covering derivatives of matrix-valued functions, efficient reverse-mode differentiation (backpropagation), and automatic differentiation for large-scale optimization.
Contribution
It provides an accessible overview of how differential calculus extends to functions on general vector spaces, such as matrix-valued functions, with an emphasis on practical computation and modern automatic differentiation for machine learning applications.
Findings
Explains derivatives of matrix functions like inverses and factorizations
Highlights efficiency techniques such as reverse-mode differentiation
Introduces automatic differentiation methods for complex calculations
Abstract
This course, intended for undergraduates familiar with elementary calculus and linear algebra, introduces the extension of differential calculus to functions on more general vector spaces, such as functions that take as input a matrix and return a matrix inverse or factorization, derivatives of ODE solutions, and even stochastic derivatives of random functions. It emphasizes practical computational applications, such as large-scale optimization and machine learning, where derivatives must be re-imagined in order to be propagated through complicated calculations. The class also discusses efficiency concerns leading to "adjoint" or "reverse-mode" differentiation (a.k.a. "backpropagation"), and gives a gentle introduction to modern automatic differentiation (AD) techniques.
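As a small illustration of the kind of result the abstract refers to, the sketch below numerically checks the standard first-order rule for the matrix inverse, d(A^{-1}) = -A^{-1} dA A^{-1}, on a 2x2 example. It is not taken from the course materials: the helper names `matmul` and `inv2`, the matrix `A`, and the perturbation `dA` are all chosen here for illustration only.

```python
# Sketch: check d(A^{-1}) = -A^{-1} dA A^{-1} on a 2x2 example (pure Python).
# All names and numeric values below are illustrative assumptions.

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4.0, 1.0], [2.0, 3.0]]        # arbitrary invertible matrix
dA = [[1e-6, -2e-6], [3e-6, 5e-7]]  # small arbitrary perturbation

A_inv = inv2(A)
A_pert = [[A[i][j] + dA[i][j] for j in range(2)] for i in range(2)]
A_pert_inv = inv2(A_pert)

# Actual change in the inverse caused by the perturbation.
exact = [[A_pert_inv[i][j] - A_inv[i][j] for j in range(2)] for i in range(2)]

# First-order prediction from the differential rule: -A^{-1} dA A^{-1}.
P = matmul(matmul(A_inv, dA), A_inv)
predicted = [[-P[i][j] for j in range(2)] for i in range(2)]

# The two should agree up to terms of second order in dA.
max_err = max(abs(exact[i][j] - predicted[i][j])
              for i in range(2) for j in range(2))
print("largest discrepancy:", max_err)
```

The discrepancy should shrink roughly quadratically as `dA` is scaled down, which is what one expects when the prediction is the correct linear term.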
Taxonomy
Topics: Neural Networks and Applications
