TL;DR
This paper introduces partitioned thermodynamic sampling algorithms for neural network training, which improve convergence speed, accuracy, and robustness compared to traditional optimization methods like SGD and ADAM.
Contribution
The paper proposes hybrid partitioned algorithms based on discretized stochastic differential equations, adapted to feed-forward neural networks, and reports advantages over standard optimizers such as SGD and ADAM.
Findings
Partitioned thermodynamic methods converge faster than SGD and ADAM.
They also reach higher accuracy than SGD and ADAM.
Thermodynamic approaches are more robust on data sets with complicated loss landscapes.
Abstract
Traditionally, neural networks are parameterized using optimization procedures such as stochastic gradient descent, RMSProp and ADAM. These procedures tend to drive the parameters of the network toward a local minimum. In this article, we employ alternative "sampling" algorithms (referred to here as "thermodynamic parameterization methods") which rely on discretized stochastic differential equations for a defined target distribution on parameter space. We show that the thermodynamic perspective already improves neural network training. Moreover, by partitioning the parameters based on natural layer structure we obtain schemes with very rapid convergence for data sets with complicated loss landscapes. We describe easy-to-implement hybrid partitioned numerical algorithms, based on discretized stochastic differential equations, which are adapted to feed-forward neural networks, including…
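To make the idea of a discretized stochastic differential equation applied to layer-wise parameter partitions concrete, here is a minimal sketch. It is not the paper's scheme: it assumes plain overdamped Langevin dynamics with an Euler-Maruyama discretization, a toy quadratic loss, and a hypothetical per-layer step size as the only form of partitioning. The names `langevin_step`, `quadratic_loss_grads`, `run_demo`, `layer1`, and `layer2` are illustrative and do not come from the paper.

```python
import numpy as np


def langevin_step(params, grads, step_sizes, temperature, rng):
    """One Euler-Maruyama step of overdamped Langevin dynamics per partition.

    Update (an assumption of this sketch, not the paper's scheme):
        theta <- theta - h * grad + sqrt(2 * h * T) * xi,   xi ~ N(0, I)
    Each partition (here: each layer) uses its own step size h.
    """
    new_params = {}
    for name, theta in params.items():
        h = step_sizes[name]
        noise = rng.standard_normal(theta.shape)
        new_params[name] = (
            theta - h * grads[name] + np.sqrt(2.0 * h * temperature) * noise
        )
    return new_params


def quadratic_loss_grads(params):
    """Gradient of the toy loss L = 0.5 * sum(theta**2), i.e. grad = theta."""
    return {name: theta.copy() for name, theta in params.items()}


def run_demo(n_steps=2000, temperature=0.1, seed=0):
    """Sample two hypothetical 'layers' with different step sizes."""
    rng = np.random.default_rng(seed)
    params = {"layer1": np.full(50, 5.0), "layer2": np.full(50, 5.0)}
    step_sizes = {"layer1": 0.05, "layer2": 0.01}  # hypothetical values
    for _ in range(n_steps):
        grads = quadratic_loss_grads(params)
        params = langevin_step(params, grads, step_sizes, temperature, rng)
    return params


if __name__ == "__main__":
    final = run_demo()
    for name, theta in final.items():
        print(name, "mean square:", float(np.mean(theta**2)))
```

With the temperature set to zero the noise term vanishes and the update reduces to ordinary gradient descent. For a positive temperature the iterates do not settle at the minimum but fluctuate around it, approximately sampling a distribution proportional to exp(-L/T) over the parameters.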
Taxonomy
Methods: RMSProp
