Decoupled Neural Interfaces using Synthetic Gradients
Max Jaderberg, Wojciech Marian Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, Koray Kavukcuoglu

TL;DR
This paper introduces decoupled neural interfaces that use synthetic gradients to enable asynchronous and independent training of network modules, improving flexibility and scalability in neural network training.
Contribution
The work presents a novel framework for decoupling neural network modules using synthetic gradients, allowing asynchronous updates and extending applicability to recurrent and hierarchical models.
Findings
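Feed-forward networks train effectively when their layers are updated asynchronously using synthetic gradients.
In RNNs, predicting future gradients extends the time span over which the network can model dependencies.
Modular networks can be trained with both the forward and backward passes decoupled.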
Abstract
Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules by introducing a model of the future computation of the network graph. These models predict what the result of the modelled subgraph will produce using only local information. In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously i.e. we realise decoupled neural interfaces. We…
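The core idea in the abstract, updating a module from a predicted gradient instead of waiting for the true one, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the network is a two-layer scalar linear model, the synthetic gradient module is a linear function of the local activation, and every name here (Params, synthetic_grad, dni_step) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Params:
    w1: float  # layer 1 weight: h = w1 * x
    w2: float  # layer 2 weight: y = w2 * h
    a: float   # synthetic-gradient module slope (assumed linear model)
    b: float   # synthetic-gradient module bias


def synthetic_grad(p, h):
    """Predict dL/dh from the local activation h alone."""
    return p.a * h + p.b


def dni_step(p, x, target, lr=0.1):
    """One training step; returns (new params, loss, true dL/dh)."""
    # Layer 1 forward pass, then an immediate update from the synthetic
    # gradient: layer 1 does not wait for layer 2 or for the loss.
    h = p.w1 * x
    g_hat = synthetic_grad(p, h)
    new_w1 = p.w1 - lr * g_hat * x          # dL/dw1 is approximated by g_hat * dh/dw1

    # Layer 2 runs later (possibly asynchronously) and yields the true gradient.
    y = p.w2 * h
    loss = 0.5 * (y - target) ** 2
    g_true = (y - target) * p.w2            # true dL/dh
    new_w2 = p.w2 - lr * (y - target) * h   # ordinary gradient step for layer 2

    # Train the module by regressing its prediction onto the true gradient.
    err = g_hat - g_true
    new_a = p.a - lr * err * h
    new_b = p.b - lr * err
    return Params(new_w1, new_w2, new_a, new_b), loss, g_true
```

Calling dni_step repeatedly on (x, target) pairs trains all three components. Written sequentially here for clarity, the layer 1 update depends only on h and the module's parameters, which is what would allow it to run before, or in parallel with, the rest of the graph.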
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Stochastic Gradient Optimization Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
