Training Neural Networks Without Gradients: A Scalable ADMM Approach

Gavin Taylor; Ryan Burmeister; Zheng Xu; Bharat Singh; Ankit Patel; Tom Goldstein

arXiv:1605.02026 · cs.LG · May 9, 2016 · 146 cites

TL;DR

This paper trains neural networks with the alternating direction method of multipliers (ADMM) and Bregman iteration instead of gradient descent, and reports near-linear speedups on large distributed systems.

Contribution

It proposes a novel training approach that replaces gradient descent with an ADMM-based method, improving scalability and convergence on large neural networks.

Findings

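01. Achieves linear speedups with thousands of cores.

02. Avoids problems that affect gradient methods, such as saturation, poor conditioning, and saddle points.

03. Reduces training to a sequence of minimization sub-steps, each of which can be solved globally in closed form.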

Abstract

With the growing importance of large network models and enormous training datasets, GPUs have become increasingly necessary to train neural networks. This is largely because conventional optimization algorithms rely on stochastic gradient methods that don't scale well to large numbers of cores in a cluster setting. Furthermore, the convergence of all gradient methods, including batch methods, suffers from common problems like saturation effects, poor conditioning, and saddle points. This paper explores an unconventional training method that uses alternating direction methods and Bregman iteration to train networks without gradient descent steps. The proposed method reduces the network training problem to a sequence of minimization sub-steps that can each be solved globally in closed form. The proposed method is advantageous because it avoids many of the caveats that make gradient…
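To make "solved globally in closed form" concrete, here is a minimal sketch, assuming that one sub-step takes the form of a linear least-squares problem for a single layer's weights, with the layer inputs and target outputs held fixed. The names `solve_weights`, `a`, `z`, and `W_true` are illustrative and do not come from this page; the paper's actual sub-steps may differ in detail.

```python
import numpy as np


def solve_weights(z, a):
    """Return W minimizing ||z - W @ a||_F^2.

    The minimizer is given in closed form by the pseudoinverse of `a`,
    so no gradient steps or learning rate are involved.
    """
    return z @ np.linalg.pinv(a)


# Synthetic data (assumption: 4 input features, 3 outputs, 50 samples).
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 50))       # fixed layer inputs, one column per sample
W_true = rng.standard_normal((3, 4))   # hypothetical weights used to build targets
z = W_true @ a                         # fixed, noise-free target outputs

W = solve_weights(z, a)
residual = np.linalg.norm(z - W @ a)   # Frobenius norm of the fit error
```

Because the targets here are noise-free and `a` has full row rank, the solve recovers `W_true` up to floating-point error. With noisy targets the same call would return the least-squares fit instead.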

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications