Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks

David Balduzzi; Brian McWilliams; Tony Butler-Yeoman

arXiv:1611.02345·cs.LG·June 7, 2018·5 cites

TL;DR

This paper proves a convergence guarantee for modern convolutional neural networks using neural Taylor approximations, and uses the same tool to study how adaptive optimizers explore the space of activation configurations during training.

Contribution

It provides the first convergence analysis for non-smooth, non-convex convnets using Taylor approximations, and investigates how optimizer exploration affects training.
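The reduction behind such an analysis fits in a few lines. What follows is a sketch under stated assumptions, not the paper's theorem statement. It assumes that one layer is linearized at a time, that each loss is convex in the network output, and that the optimizer is projected online gradient descent with step size D/(G·sqrt(t)) over a weight domain of diameter D whose Taylor losses are G-Lipschitz. The symbols W_t, f_t, G_t, \ell_t and \mathcal{W} are illustrative notation chosen for this sketch.

```latex
% Hypothetical notation: W_t = weights of one layer at step t; f_t = network output on minibatch t
% as a function of that layer's weights; \ell_t = a loss convex in the output; G_t = \nabla_W f_t(W_t).
T_t(W) = f_t(W_t) + \langle G_t,\, W - W_t \rangle
\qquad \text{(neural Taylor approximation, affine in } W\text{)}

\ell_t\big(T_t(W)\big) \ \text{is convex in } W,
\qquad
\nabla_W \ell_t\big(T_t(W)\big)\Big|_{W = W_t} = \nabla_W \ell_t\big(f_t(W)\big)\Big|_{W = W_t}

\frac{1}{T}\sum_{t=1}^{T} \ell_t\big(f_t(W_t)\big)
  - \min_{V \in \mathcal{W}} \frac{1}{T}\sum_{t=1}^{T} \ell_t\big(T_t(V)\big)
  = \frac{\mathrm{Regret}_T}{T}
  \le \frac{3\, G D}{2\sqrt{T}}
```

Because the Taylor approximation agrees with the network at W_t (assuming f_t is differentiable there), the Taylor loss and the true loss share both value and gradient at the current iterate. Each backpropagation step is therefore also a gradient step on a convex function, and the O(1/sqrt(T)) rate is the standard online convex optimization bound, which is known to be tight for convex nonsmooth losses.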

Findings

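1. The convergence guarantee matches a known lower bound for convex nonsmooth functions.

2. Experiments on a range of optimizers, layers, and tasks suggest that the neural Taylor approximation accurately captures the dynamics of neural optimization.

3. Adaptive optimizers explore activation configurations more thoroughly, which the paper investigates as the reason they reach better solutions.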
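One simple way to make "exploring activation configurations" concrete is to record the ReLU on/off pattern on a fixed probe batch along an optimizer's trajectory. The sketch below does this for a single hidden layer. The function names `activation_pattern` and `exploration_stats`, the probe batch, and both summary statistics are hypothetical proxies chosen for illustration. The paper's own measurements of exploration may be defined differently.

```python
import numpy as np


def activation_pattern(W, X):
    """Binary ReLU activation pattern of one hidden layer on probe inputs X.

    Entry (n, i) is True when unit i is active (pre-activation > 0) on example n.
    """
    return X @ W.T > 0


def exploration_stats(weight_iterates, X):
    """Hypothetical exploration summary for a trajectory of hidden-layer weights.

    Returns a tuple:
      - number of distinct activation configurations visited on the probe batch X,
      - mean fraction of (example, unit) entries that flip between consecutive iterates.
    """
    patterns = [activation_pattern(W, X) for W in weight_iterates]
    # Hash each boolean pattern by its raw bytes so identical configurations collapse.
    distinct = {p.tobytes() for p in patterns}
    flips = [np.mean(p != q) for p, q in zip(patterns[:-1], patterns[1:])]
    mean_flip = float(np.mean(flips)) if flips else 0.0
    return len(distinct), mean_flip
```

Applied to the iterates of different optimizers started from the same initialization, the two numbers give a rough, comparable signal: a trajectory that visits more distinct configurations, or flips more units per step, is exploring more of the activation space on that probe batch. Both counts depend on the choice of probe inputs.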

Abstract

Modern convolutional networks, incorporating rectifiers and max-pooling, are neither smooth nor convex; standard guarantees therefore do not apply. Nevertheless, methods from convex optimization such as gradient descent and Adam are widely used as building blocks for deep learning algorithms. This paper provides the first convergence guarantee applicable to modern convnets, which furthermore matches a lower bound for convex nonsmooth functions. The key technical tool is the neural Taylor approximation -- a straightforward application of Taylor expansions to neural networks -- and the associated Taylor loss. Experiments on a range of optimizers, layers, and tasks provide evidence that the analysis accurately captures the dynamics of neural optimization. The second half of the paper applies the Taylor approximation to isolate the main difficulty in training rectifier nets -- that…
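The neural Taylor approximation can be illustrated on a toy network. The sketch below assumes a one-hidden-layer rectifier net with scalar output, linearizes it in the first-layer weights around the current iterate, and composes the result with a squared-error loss to form a Taylor loss. The network, the loss, and the helper names (`relu_net`, `taylor_approx`, `taylor_loss`) are hypothetical choices for illustration, not the paper's code.

```python
import numpy as np


def relu_net(W1, w2, x):
    """Scalar output of a one-hidden-layer rectifier net: w2 . relu(W1 @ x)."""
    return float(w2 @ np.maximum(W1 @ x, 0.0))


def taylor_approx(W1_t, w2, x):
    """First-order Taylor expansion of the output in the first-layer weights around W1_t.

    Returns a function W -> f(W1_t) + <G, W - W1_t>, where G = df/dW1 evaluated at W1_t.
    """
    f_t = relu_net(W1_t, w2, x)
    active = (W1_t @ x > 0).astype(float)  # ReLU derivative, taken as 0 at the kink
    G = np.outer(w2 * active, x)           # gradient of the output w.r.t. W1
    return lambda W: f_t + float(np.sum(G * (W - W1_t)))


def taylor_loss(W1_t, w2, x, y):
    """Squared-error loss applied to the Taylor approximation; convex in W because T is affine."""
    T = taylor_approx(W1_t, w2, x)
    return lambda W: 0.5 * (T(W) - y) ** 2


# Usage: the approximation is exact at W1_t and anywhere the activation pattern is unchanged.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((5, 3))
w2 = rng.standard_normal(5)
x = rng.standard_normal(3)
T = taylor_approx(W1, w2, x)
print(T(W1) == relu_net(W1, w2, x))                          # True: agreement at the expansion point
print(abs(T(1.5 * W1) - relu_net(1.5 * W1, w2, x)) < 1e-9)   # True: scaling keeps every sign
```

Because a rectifier net is piecewise linear in the weights of a single layer, the linearization is exact inside the current activation region and only departs from the network when some unit switches on or off. That is why the Taylor loss, which is convex, can stand in for the nonconvex training loss in this toy setting.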

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices

Methods: Adam