The Backpropagation algorithm for a math student

Saeed Damadi; Golnaz Moharrer; Mostafa Cham

arXiv:2301.09977·cs.LG·June 2, 2023

Open Access

TL;DR

This paper explains how the backpropagation algorithm efficiently computes gradients in deep neural networks by expressing derivatives as matrix products of Jacobians, making the process accessible to a broad audience.

Contribution

It presents a mathematical formulation of backpropagation in terms of Jacobian matrices, clarifying how gradients are computed in DNNs for readers from diverse disciplines.

Findings

01 Gradient expressed as a product of Jacobian matrices

02 Number of layers does not significantly affect the complexity of backpropagation

03 Mathematical justification for applying the chain rule

Abstract

A Deep Neural Network (DNN) is a composite function of vector-valued functions, and in order to train a DNN, it is necessary to calculate the gradient of the loss function with respect to all parameters. This calculation can be a non-trivial task because the loss function of a DNN is a composition of several nonlinear functions, each with numerous parameters. The Backpropagation (BP) algorithm leverages the composite structure of the DNN to efficiently compute the gradient. As a result, the number of layers in the network does not significantly impact the complexity of the calculation. The objective of this paper is to express the gradient of the loss function in terms of a matrix multiplication using the Jacobian operator. This can be achieved by considering the total derivative of each layer with respect to its parameters and expressing it as a Jacobian matrix. The gradient can then…
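
As a rough illustration of the idea in the abstract, the sketch below writes the gradient of a loss with respect to one weight matrix as a product of Jacobians and checks it against finite differences. Everything specific here is an assumption made for the example and is not taken from the paper: the two-layer architecture, the tanh activation, the squared-error loss, and all names (`W1`, `b1`, `W2`, `b2`, `grad_W1_via_jacobians`, and so on).

```python
import numpy as np

def forward(params, x):
    # Assumed toy network: z1 = W1 x + b1, a1 = tanh(z1), z2 = W2 a1 + b2.
    W1, b1, W2, b2 = params
    z1 = W1 @ x + b1
    a1 = np.tanh(z1)
    z2 = W2 @ a1 + b2
    return z1, a1, z2

def loss(params, x, y):
    # Assumed loss: half the squared Euclidean error.
    _, _, z2 = forward(params, x)
    return 0.5 * np.sum((z2 - y) ** 2)

def grad_W1_via_jacobians(params, x, y):
    W1, b1, W2, b2 = params
    _, a1, z2 = forward(params, x)
    n1, n0 = W1.shape
    dL_dz2 = (z2 - y)[None, :]                   # 1 x n2, derivative of the loss
    dz2_da1 = W2                                 # n2 x n1, Jacobian of the affine map
    da1_dz1 = np.diag(1.0 - a1 ** 2)             # n1 x n1, Jacobian of elementwise tanh
    # Jacobian of z1 with respect to W1 flattened row by row: n1 x (n1 * n0).
    dz1_dW1 = np.kron(np.eye(n1), x[None, :])
    # `@` associates left to right, so every step is a row vector times a matrix.
    row = dL_dz2 @ dz2_da1 @ da1_dz1 @ dz1_dW1   # 1 x (n1 * n0)
    return row.reshape(n1, n0)

def grad_W1_finite_diff(params, x, y, eps=1e-6):
    # Central differences, used only as an independent check.
    W1, b1, W2, b2 = params
    g = np.zeros_like(W1)
    for i in range(W1.shape[0]):
        for j in range(W1.shape[1]):
            Wp = W1.copy()
            Wp[i, j] += eps
            Wm = W1.copy()
            Wm[i, j] -= eps
            g[i, j] = (loss((Wp, b1, W2, b2), x, y)
                       - loss((Wm, b1, W2, b2), x, y)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
n0, n1, n2 = 3, 4, 2
params = (rng.normal(size=(n1, n0)), rng.normal(size=n1),
          rng.normal(size=(n2, n1)), rng.normal(size=n2))
x = rng.normal(size=n0)
y = rng.normal(size=n2)

g_jac = grad_W1_via_jacobians(params, x, y)
g_fd = grad_W1_finite_diff(params, x, y)
print("max abs difference:", np.max(np.abs(g_jac - g_fd)))
```

Multiplying from the left keeps every intermediate result a row vector, so each additional layer contributes only one more vector-matrix product. The explicit Kronecker-product Jacobian is built here for clarity; in practice the same result is usually obtained as an outer product without forming that matrix.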

Taxonomy

Topics: Neural Networks and Applications