Convergence Analysis for Training Stochastic Neural Networks via Stochastic Gradient Descent
Richard Archibald, Feng Bao, Yanzhao Cao, Hui Sun

TL;DR
This paper uses stochastic optimal control theory to prove convergence of a novel sample-wise back-propagation method for stochastic neural networks (SNNs) formulated as discretizations of stochastic differential equations (SDEs), and validates the analysis with numerical experiments.
Contribution
It provides a new convergence analysis of a sample-wise back-propagation method for SNNs. The analysis is built on stochastic optimal control techniques and links the required number of training steps to network depth.
Findings
Number of training steps should be proportional to the square of the number of layers (convex case)
Convergence validated through numerical experiments
Performance demonstrated on benchmark machine learning tasks
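To make the depth-scaling finding concrete: if the number of training steps K must grow like N² for an SNN with N layers, then doubling the depth roughly quadruples the step budget. The sketch below is only illustrative; the function name `suggested_training_steps` and the constant `c` are hypothetical, since the paper states a proportionality, not a specific constant.

```python
import math


def suggested_training_steps(num_layers: int, c: float) -> int:
    """Step budget K = ceil(c * N**2).

    Assumption: c is a hypothetical, problem-dependent constant; the paper
    only says K should be proportional to N**2 in the convex case.
    """
    return math.ceil(c * num_layers ** 2)


for n_layers in (10, 20, 40):
    # With c = 5.0 this prints 500, 2000, 8000: each doubling of depth
    # multiplies the step budget by four.
    print(n_layers, suggested_training_steps(n_layers, c=5.0))
```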
Abstract
In this paper, we carry out a numerical analysis to prove convergence of a novel sample-wise back-propagation method for training a class of stochastic neural networks (SNNs). The SNN structure is formulated as a discretization of a stochastic differential equation (SDE). A stochastic optimal control framework is introduced to model the training procedure, and a sample-wise approximation scheme for the adjoint backward SDE is applied to improve the efficiency of the stochastic optimal control solver, which is equivalent to back-propagation for training the SNN. The convergence analysis is derived both with and without a convexity assumption for the optimization of the SNN parameters. In particular, our analysis indicates that, in the convex optimization case, the number of SNN training steps should be proportional to the square of the number of layers. Numerical experiments are carried out to…
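The following is a minimal NumPy sketch of the general idea of an SNN as an SDE discretization trained with single-sample gradients. It is not the authors' scheme, and every modeling choice here is our own assumption. Each layer is one Euler-Maruyama step X_{n+1} = X_n + h·tanh(W_n X_n + b_n) + σ·ΔW_n with additive noise of constant size σ. The loss is the terminal squared error 0.5·‖X_N − y‖². The "sample-wise" gradient draws one Brownian path and runs the discrete adjoint recursion λ_n = λ_{n+1} + h·W_nᵀ(diag(1 − a_n²) λ_{n+1}) backward along that path, which is ordinary back-propagation through the sampled network. The paper instead approximates the adjoint backward SDE. All function names (`forward`, `samplewise_grad`, `sgd_train`) are hypothetical.

```python
import numpy as np


def forward(x0, Ws, bs, h, sigma, dW):
    """Euler-Maruyama forward pass; one layer = one time step of size h.

    Assumed drift tanh(W x + b) and additive noise sigma * dW (hypothetical).
    Returns all states X_0..X_N and the activations a_n = tanh(W_n X_n + b_n).
    """
    xs, acts, x = [x0], [], x0
    for W, b, dw in zip(Ws, bs, dW):
        a = np.tanh(W @ x + b)
        x = x + h * a + sigma * dw
        xs.append(x)
        acts.append(a)
    return xs, acts


def loss(xN, y):
    """Assumed terminal cost: 0.5 * ||X_N - y||^2."""
    return 0.5 * np.sum((xN - y) ** 2)


def samplewise_grad(x0, y, Ws, bs, h, sigma, dW):
    """Gradient of the loss along ONE sampled Brownian path dW."""
    xs, acts = forward(x0, Ws, bs, h, sigma, dW)
    lam = xs[-1] - y  # terminal adjoint: gradient of the loss w.r.t. X_N
    gWs, gbs = [None] * len(Ws), [None] * len(bs)
    for n in reversed(range(len(Ws))):
        delta = (1.0 - acts[n] ** 2) * lam  # uses lambda_{n+1}
        gWs[n] = h * np.outer(delta, xs[n])
        gbs[n] = h * delta
        lam = lam + h * (Ws[n].T @ delta)  # lambda_n (noise is additive, so no x-dependence)
    return gWs, gbs, loss(xs[-1], y)


def sgd_train(Ws, bs, data, h, sigma, lr, steps, rng):
    """Plain SGD: each step uses one data pair and one fresh Brownian path."""
    d = Ws[0].shape[1]
    for _ in range(steps):
        x0, y = data[rng.integers(len(data))]
        dW = rng.normal(0.0, np.sqrt(h), size=(len(Ws), d))  # increments ~ N(0, h)
        gWs, gbs, _ = samplewise_grad(x0, y, Ws, bs, h, sigma, dW)
        for n in range(len(Ws)):
            Ws[n] -= lr * gWs[n]  # in-place parameter update
            bs[n] -= lr * gbs[n]
    return Ws, bs
```

Drawing a fresh path per step keeps each update as cheap as one deterministic forward/backward pass, which is the efficiency motivation the abstract gives for the sample-wise approach. The cost is noisier gradients, and that noise is what a convergence analysis of this kind has to control.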
Taxonomy
Topics: Machine Learning and ELM · Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks
