Learning by Turning: Neural Architecture Aware Optimisation
Yang Liu, Jeremy Bernstein, Markus Meister, Yisong Yue

TL;DR
This paper introduces Nero, an optimiser designed around neural architecture. Nero needs little to no learning rate tuning, trains reliably in settings where Adam and SGD fail, and builds on a geometric connection between architecture and optimisation.
Contribution
The paper proposes Nero, an optimiser that takes neural architecture into account. It trains reliably without momentum or weight decay, needs little to no learning rate tuning, and uses less memory than Adam or LAMB.
Findings
Nero trains reliably without momentum or weight decay.
Nero works in situations where Adam and SGD fail.
Nero's memory footprint is roughly the square root of that of Adam or LAMB.
Abstract
Descent methods for deep networks are notoriously capricious: they require careful tuning of step size, momentum and weight decay, and which method will work best on a new benchmark is a priori unclear. To address this problem, this paper conducts a combined study of neural architecture and optimisation, leading to a new optimiser called Nero: the neuronal rotator. Nero trains reliably without momentum or weight decay, works in situations where Adam and SGD fail, and requires little to no learning rate tuning. Also, Nero's memory footprint is ~ square root that of Adam or LAMB. Nero combines two ideas: (1) projected gradient descent over the space of balanced networks; (2) neuron-specific updates, where the step size sets the angle through which each neuron's hyperplane turns. The paper concludes by discussing how this geometric connection between architecture and optimisation may…
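The two ideas named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that a "balanced" neuron is one whose incoming weights have zero mean and unit norm, it treats each row of a weight matrix as one neuron, and it omits any running statistics a full optimiser might keep. The function names `project_balanced` and `nero_like_step` are hypothetical.

```python
import numpy as np

def project_balanced(W, eps=1e-12):
    # Assumed constraint set: each row (neuron) has zero mean and unit norm.
    W = W - W.mean(axis=1, keepdims=True)
    return W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)

def nero_like_step(W, G, lr=0.01, eps=1e-12):
    # Neuron-specific update: each row's gradient is rescaled by that row's
    # own gradient norm, so lr controls the size of the step relative to the
    # weight vector, and hence (for small lr) the angle the neuron turns.
    w_norm = np.linalg.norm(W, axis=1, keepdims=True)
    g_norm = np.linalg.norm(G, axis=1, keepdims=True)
    W_new = W - lr * w_norm * G / (g_norm + eps)
    # Projected gradient descent: map the result back onto the constraint set.
    return project_balanced(W_new)

rng = np.random.default_rng(0)
W = project_balanced(rng.normal(size=(4, 8)))  # 4 neurons, fan-in of 8
G = rng.normal(size=(4, 8))                    # stand-in gradient
W_next = nero_like_step(W, G, lr=0.01)

# Cosine of the angle each neuron's weight vector turned through.
cosines = np.sum(W * W_next, axis=1)
```

In this sketch the sine of the angle between a neuron's old and new weight vectors is at most `lr`, which is one way to read the abstract's statement that the step size sets the angle through which each neuron's hyperplane turns.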
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
Methods: Adam · Stochastic Gradient Descent · LAMB
