Towards Scalable Backpropagation-Free Gradient Estimation
Daniel Wang, Evan Markou, Dylan Campbell

TL;DR
This paper introduces a backpropagation-free gradient estimation method that reduces both bias and variance in its estimates. It shows potential to scale to larger networks and performs better as network width increases.
Contribution
The paper presents a gradient estimation approach that manipulates upstream Jacobian matrices when computing guess directions, reducing both the bias and the variance that limit existing forward-mode methods.
Findings
Reduces bias and variance in gradient estimates.
Performs better as network width increases.
Potential to scale to larger neural networks.
Abstract
While backpropagation (reverse-mode automatic differentiation) has been extraordinarily successful in deep learning, it requires two passes (forward and backward) through the neural network and the storage of intermediate activations. Existing gradient estimation methods that instead use forward-mode automatic differentiation struggle to scale beyond small networks due to the high variance of the estimates. Efforts to mitigate this have so far introduced significant bias to the estimates, reducing their utility. We introduce a gradient estimation approach that reduces both bias and variance by manipulating upstream Jacobian matrices when computing guess directions. It shows promising results and has the potential to scale to larger networks, indeed performing better as the network width is increased. Our understanding of this method is facilitated by analyses of bias and variance, and…
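For readers unfamiliar with the forward-mode baseline the abstract refers to, the following is a minimal sketch of a standard forward gradient estimator with random Gaussian guess directions. It is not the paper's method and does not involve any Jacobian manipulation. The `Dual` class, the toy quadratic `loss`, and all function names are hypothetical choices made for illustration; only the Python standard library is assumed.

```python
import random


class Dual:
    """Minimal dual number (value + tangent) for forward-mode differentiation."""

    def __init__(self, val, tan=0.0):
        self.val = val
        self.tan = tan

    @staticmethod
    def _lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._lift(other)
        return Dual(self.val - other.val, self.tan - other.tan)

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (a*b)' = a*b' + a'*b
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def loss(w, target):
    # Toy quadratic 0.5 * ||w - target||^2, a stand-in for a network loss (assumption).
    total = 0.0
    for wi, ti in zip(w, target):
        d = wi - ti
        total = total + 0.5 * d * d
    return total


def jvp(w, target, v):
    # Single forward pass: seed each parameter's tangent with the direction v.
    # The tangent of the output is the directional derivative grad(loss) . v.
    out = loss([Dual(wi, vi) for wi, vi in zip(w, v)], target)
    return out.tan


def forward_gradient(w, target, rng):
    # Sample a guess direction v ~ N(0, I) and return the estimate (grad . v) * v.
    v = [rng.gauss(0.0, 1.0) for _ in w]
    d = jvp(w, target, v)
    return [d * vi for vi in v], v


def mean_estimate(w, target, n_samples, seed=0):
    # Average many single-sample estimates to show they centre on the true gradient.
    rng = random.Random(seed)
    acc = [0.0] * len(w)
    for _ in range(n_samples):
        g, _v = forward_gradient(w, target, rng)
        acc = [a + gi / n_samples for a, gi in zip(acc, g)]
    return acc
```

For this toy loss the true gradient is `w - target`, so the average returned by `mean_estimate` can be compared against it directly. Each single estimate is unbiased but noisy, and with Gaussian directions its total variance grows with the number of parameters, which is the scaling difficulty the abstract describes for existing forward-mode methods.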
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Sparse and Compressive Sensing Techniques
