Decoupled Neural Interfaces using Synthetic Gradients

Max Jaderberg; Wojciech Marian Czarnecki; Simon Osindero; Oriol Vinyals; Alex Graves; David Silver; Koray Kavukcuoglu

arXiv:1608.05343 · cs.LG · July 4, 2017 · 76 cites

TL;DR

This paper introduces decoupled neural interfaces that use synthetic gradients to enable asynchronous and independent training of network modules, improving flexibility and scalability in neural network training.

Contribution

The work presents a novel framework for decoupling neural network modules using synthetic gradients, allowing asynchronous updates and extending applicability to recurrent and hierarchical models.

Findings

01. Effective training of feed-forward networks with asynchronous layers.

02. Extended RNN modeling through future gradient prediction.

03. Enabled decoupled forward and backward passes for modular networks.

Abstract

Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating an error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules by introducing a model of the future computation of the network graph. These models predict what the result of the modelled subgraph will produce using only local information. In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously, i.e. we realise decoupled neural interfaces. We…
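The decoupling idea in the abstract can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: the scalar two-module network, the linear gradient model `g_hat = a * h`, the target `y = 2 * x`, and all names and learning rates below are assumptions chosen only to keep the example short. Module A updates its weight from a predicted gradient alone, while the gradient model is regressed toward the true gradient once module B has computed it.

```python
import random


def train_with_synthetic_gradients(steps=3000, lr=0.05, lr_sg=0.5, seed=0):
    """Toy network y_hat = w2 * (w1 * x), fitted to the target y = 2 * x.

    Module A owns w1, module B owns w2. Module A never sees the true
    backpropagated gradient; it uses the synthetic gradient g_hat instead.
    All names and hyperparameters here are hypothetical.
    """
    rng = random.Random(seed)
    w1, w2 = 0.5, 0.5  # weights of module A and module B
    a = 0.0            # synthetic gradient model: g_hat = a * h

    def probe_loss():
        # Squared error at the fixed probe input x = 1 (target 2).
        return 0.5 * (w1 * w2 - 2.0) ** 2

    initial_loss = probe_loss()
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x

        # Module A: forward pass, then an update from local information only.
        h = w1 * x
        g_hat = a * h                 # predicted dL/dh
        new_w1 = w1 - lr * g_hat * x  # dL/dw1 is approximated by g_hat * dh/dw1

        # Module B: forward pass, loss, and true gradients.
        err = w2 * h - y              # derivative of 0.5 * err**2 w.r.t. y_hat
        g_true = err * w2             # true dL/dh, available only after B runs
        new_w2 = w2 - lr * err * h    # B is trained with its true gradient

        # Gradient model: regress g_hat toward g_true (squared-error loss).
        a -= lr_sg * (g_hat - g_true) * h

        w1, w2 = new_w1, new_w2

    return initial_loss, probe_loss()


if __name__ == "__main__":
    before, after = train_with_synthetic_gradients()
    print(f"probe loss before: {before:.5f}, after: {after:.5f}")
```

In this sketch the two modules still run in one loop for simplicity; the point is only that the update to `w1` depends on nothing computed after `h`, so in principle it need not wait for module B.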

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Stochastic Gradient Optimization Techniques

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory