An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks

Qianxiao Li; Shuji Hao

arXiv:1803.01299·cs.LG·June 5, 2018·39 cites

Open Access · 1 Repo

TL;DR

This paper formulates deep learning as a discrete-time optimal control problem, introduces a training algorithm based on Pontryagin's maximum principle that does not rely on gradients with respect to the trainable parameters, and applies it to train neural networks with discrete weights, including very sparse ternary networks.

Contribution

It develops a discrete-time optimal control framework for neural network training, together with a discrete-time method of successive approximations (MSA) that does not rely on gradients with respect to the trainable parameters, so the weights can be constrained to a discrete set.

Findings

01 Achieves competitive performance with weights constrained to a discrete set.

02 Produces very sparse weights in the case of ternary networks.

03 Provides a rigorous error estimate for the discrete MSA, which clarifies its dynamics and how to stabilize it.

Abstract

Deep learning is formulated as a discrete-time optimal control problem. This allows one to characterize necessary conditions for optimality and develop training algorithms that do not rely on gradients with respect to the trainable parameters. In particular, we introduce the discrete-time method of successive approximations (MSA), which is based on Pontryagin's maximum principle, for training neural networks. A rigorous error estimate for the discrete MSA is obtained, which sheds light on its dynamics and the means to stabilize the algorithm. The developed methods are applied to train, in a rather principled way, neural networks with weights that are constrained to take values in a discrete set. We obtain competitive performance and, interestingly, very sparse weights in the case of ternary networks, which may be useful for model deployment on low-memory devices.
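
To make the structure of an MSA iteration concrete, here is a minimal NumPy sketch of one forward pass, one backward (costate) pass, and one layer-wise Hamiltonian maximization over ternary weights. It is not the authors' implementation. The toy layer x_{t+1} = tanh(W_t x_t), the squared terminal loss, the absence of a running cost, the exhaustive row-wise search, and the proximity penalty `rho` (one simple way to damp updates) are all assumptions chosen for illustration, as are the function names.

```python
import itertools
import numpy as np

TERNARY = (-1.0, 0.0, 1.0)  # admissible weight values (assumed for this sketch)

def forward(weights, x0):
    """State equation x_{t+1} = tanh(x_t W_t^T); returns [x_0, ..., x_T]."""
    xs = [x0]
    for W in weights:
        xs.append(np.tanh(xs[-1] @ W.T))
    return xs

def backward(weights, xs, y):
    """Costate equation p_t = grad_x H_t(x_t, p_{t+1}, W_t), p_T = -grad Phi(x_T).

    Here H_t(x, p, W) = p . tanh(W x) and Phi(x) = 0.5 * ||x - y||^2.
    """
    ps = [None] * len(xs)
    ps[-1] = -(xs[-1] - y)
    for t in reversed(range(len(weights))):
        # xs[t + 1] already holds tanh(W_t x_t), so 1 - xs[t + 1]**2 is tanh'.
        ps[t] = (ps[t + 1] * (1.0 - xs[t + 1] ** 2)) @ weights[t]
    return ps

def maximize_hamiltonian(W_old, x, p_next, rho):
    """Maximize the batch-summed H_t over ternary W, minus rho * ||W - W_old||^2.

    H_t decouples across the rows of W, so each row is found by exhaustive
    search over all 3**d_in ternary vectors (only feasible for tiny layers).
    """
    d_out, d_in = W_old.shape
    candidates = np.array(list(itertools.product(TERNARY, repeat=d_in)))
    activations = np.tanh(x @ candidates.T)       # shape (samples, candidates)
    W_new = np.empty_like(W_old)
    for i in range(d_out):
        hamiltonian = p_next[:, i] @ activations  # one value per candidate row
        penalty = rho * np.sum((candidates - W_old[i]) ** 2, axis=1)
        W_new[i] = candidates[np.argmax(hamiltonian - penalty)]
    return W_new

def msa_step(weights, x0, y, rho=0.0):
    """One MSA iteration: forward pass, backward pass, layer-wise maximization."""
    xs = forward(weights, x0)
    ps = backward(weights, xs, y)
    return [maximize_hamiltonian(W, xs[t], ps[t + 1], rho)
            for t, W in enumerate(weights)]
```

The update never differentiates with respect to the weights: gradients are taken only with respect to the states, and the weights are chosen by direct maximization over the discrete set. With `rho=0.0` each layer jumps to its Hamiltonian maximizer, while a very large `rho` leaves ternary weights unchanged, so in this sketch `rho` trades update size against stability.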

Code & Models

Repositories

LiQianxiao/discrete-MSA · tf · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Model Reduction and Neural Networks