Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias

Axel Laborieux; Maxence Ernoult; Benjamin Scellier; Yoshua Bengio; Julie Grollier; Damien Querlioz

arXiv:2101.05536·cs.LG·January 15, 2021

TL;DR

This paper traces Equilibrium Propagation's failure to scale beyond MNIST to a bias in its gradient estimator caused by finite nudging, and shows that cancelling this bias lets EP train deep convolutional networks, which strengthens the case for energy-efficient EP hardware.

Contribution

The authors demonstrate that reducing the bias in EP's gradient estimate allows training deep ConvNets, extending EP's applicability beyond simple datasets like MNIST.

Findings

01

A bias in EP's gradient estimate, inherent in the use of finite nudging, hinders deep network training.

02

Cancelling the bias enables training of deep ConvNets with EP.

03

EP becomes a scalable method for error gradient computation in deep neural networks.

Abstract

Equilibrium Propagation (EP) is a biologically-inspired counterpart of Backpropagation Through Time (BPTT) which, owing to its strong theoretical guarantees and the locality in space of its learning rule, fosters the design of energy-efficient hardware dedicated to learning. In practice, however, EP does not scale to visual tasks harder than MNIST. In this work, we show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon and that cancelling it allows training deep ConvNets by EP, including architectures with distinct forward and backward connections. These results highlight EP as a scalable approach to compute error gradients in deep neural networks, thereby motivating its hardware implementation.
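
The abstract does not spell out how the finite-nudging bias is cancelled, so the following is only a minimal sketch under stated assumptions. It uses a hypothetical one-parameter toy model (a quadratic energy and squared-error loss, chosen so the equilibrium has a closed form) and compares a one-sided estimate, which nudges with strength `beta` only, against a symmetric estimate that nudges with `+beta` and `-beta`. The symmetric difference is assumed here as one plausible way to cancel the leading-order bias; all function names and the model itself are illustrative, not taken from the paper.

```python
def equilibrium(theta, x, y, beta):
    """State s minimizing F = E + beta * loss for the toy model (assumed).

    E(theta, s) = 0.5 * s**2 - theta * x * s
    loss(s)     = 0.5 * (s - y)**2
    Setting dF/ds = 0 gives s = (theta * x + beta * y) / (1 + beta).
    """
    return (theta * x + beta * y) / (1.0 + beta)


def dE_dtheta(x, s):
    # Partial derivative of the toy energy with respect to theta at state s.
    return -x * s


def true_gradient(theta, x, y):
    # Exact derivative of the loss at the free equilibrium s0 = theta * x.
    return (theta * x - y) * x


def one_sided_estimate(theta, x, y, beta):
    # Contrast the nudged equilibrium (strength beta) with the free one (beta = 0).
    s_free = equilibrium(theta, x, y, 0.0)
    s_nudged = equilibrium(theta, x, y, beta)
    return (dE_dtheta(x, s_nudged) - dE_dtheta(x, s_free)) / beta


def symmetric_estimate(theta, x, y, beta):
    # Contrast two nudged equilibria with opposite nudging strengths.
    s_pos = equilibrium(theta, x, y, beta)
    s_neg = equilibrium(theta, x, y, -beta)
    return (dE_dtheta(x, s_pos) - dE_dtheta(x, s_neg)) / (2.0 * beta)


if __name__ == "__main__":
    theta, x, y, beta = 0.5, 2.0, 3.0, 0.1
    exact = true_gradient(theta, x, y)
    for name, fn in [("one-sided", one_sided_estimate),
                     ("symmetric", symmetric_estimate)]:
        est = fn(theta, x, y, beta)
        print(f"{name}: estimate={est:.4f}, abs error={abs(est - exact):.4f}")
```

In this toy model the one-sided estimate equals the true gradient scaled by 1/(1 + beta), an error that is first order in beta, while the symmetric estimate is scaled by 1/(1 - beta**2), an error that is second order in beta. Real EP networks settle to equilibrium through iterative dynamics rather than a closed form, so this only illustrates the order of the bias, not the paper's training procedure.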
