Towards Scalable Backpropagation-Free Gradient Estimation

Daniel Wang; Evan Markou; Dylan Campbell

arXiv:2511.03110·cs.LG·November 6, 2025

TL;DR

This paper introduces a backpropagation-free gradient estimation method that reduces both the bias and the variance of its estimates. It shows potential to scale to larger neural networks, performing better as network width increases.

Contribution

The paper presents a gradient estimation approach that manipulates upstream Jacobian matrices when computing guess directions, which reduces both the bias and the variance of the resulting estimates.

Findings

01. Reduces both bias and variance in gradient estimates.

02. Performs better as network width increases.

03. Has the potential to scale to larger neural networks.

Abstract

While backpropagation (reverse-mode automatic differentiation) has been extraordinarily successful in deep learning, it requires two passes (forward and backward) through the neural network and the storage of intermediate activations. Existing gradient estimation methods that instead use forward-mode automatic differentiation struggle to scale beyond small networks due to the high variance of the estimates. Efforts to mitigate this have so far introduced significant bias to the estimates, reducing their utility. We introduce a gradient estimation approach that reduces both bias and variance by manipulating upstream Jacobian matrices when computing guess directions. It shows promising results and has the potential to scale to larger networks, indeed performing better as the network width is increased. Our understanding of this method is facilitated by analyses of bias and variance, and…
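For readers unfamiliar with the forward-mode estimators the abstract refers to, the following is a minimal sketch of the generic baseline, not the method proposed in the paper. It samples a random guess direction v, obtains the directional derivative of the loss along v in a single forward pass, and uses (∇f · v) v as the gradient estimate. With v drawn from a standard normal distribution this estimate is unbiased, but its variance grows with the number of parameters, which is the scaling problem the abstract describes. The `Dual` class, the toy `loss`, and the helper names are all hypothetical and chosen only for illustration; the paper's manipulation of upstream Jacobians is not reproduced here.

```python
import random


class Dual:
    """A value paired with its directional derivative (tangent).

    Hypothetical helper: a tiny stand-in for forward-mode automatic
    differentiation, supporting only addition and multiplication.
    """

    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule applied to the tangents.
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def loss(w):
    # Toy objective (an assumption for this sketch):
    # f(w) = sum_i (i + 1) * w_i^2, so the true gradient is 2 * (i + 1) * w_i.
    return sum((i + 1) * wi * wi for i, wi in enumerate(w))


def forward_gradient(f, w, rng):
    """One forward-gradient sample: (grad f . v) * v for a random direction v."""
    v = [rng.gauss(0.0, 1.0) for _ in w]
    # A single forward pass yields the directional derivative in out.tan.
    out = f([Dual(wi, vi) for wi, vi in zip(w, v)])
    return [out.tan * vi for vi in v]


def average_estimate(f, w, n_samples, seed=0):
    """Average many single-sample estimates to expose their mean."""
    rng = random.Random(seed)
    total = [0.0] * len(w)
    for _ in range(n_samples):
        g = forward_gradient(f, w, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]


if __name__ == "__main__":
    w = [1.0, 2.0, 3.0]
    print("single sample:", forward_gradient(loss, w, random.Random(0)))
    print("mean of 20000:", average_estimate(loss, w, 20000))
    print("true gradient:", [2.0 * (i + 1) * wi for i, wi in enumerate(w)])
```

A single sample is typically far from the true gradient, while the average over many samples approaches it. Averaging is used here only to make the unbiasedness visible; in practice each extra sample costs another forward pass, so the variance of a single sample is what matters.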

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Sparse and Compressive Sensing Techniques