Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function
Dong-Young Lim, Ariel Neufeld, Sotirios Sabanis, Ying Zhang

TL;DR
This paper provides non-asymptotic error bounds for the TUSLA algorithm in non-convex stochastic optimization with super-linearly growing and discontinuous stochastic gradients, with applications to neural networks with ReLU activation, and shows settings where TUSLA succeeds while other algorithms may fail.
Contribution
The paper provides a non-asymptotic analysis, with error bounds, for TUSLA in non-convex settings where the stochastic gradients grow super-linearly and are discontinuous, and applies it to neural networks with ReLU activation.
Findings
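TUSLA converges rapidly in settings where other optimizers may fail.
Theoretical error bounds are established in Wasserstein-1 and Wasserstein-2 distances; the latter yield non-asymptotic estimates for the expected excess risk.
Numerical experiments on a transfer-learning example with ReLU neural networks support the theoretical findings.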
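The main results are stated in Wasserstein-1 and Wasserstein-2 distances. As a small illustration of these metrics (not code from the paper), the helper below computes W_p between two equal-size samples on the real line. It relies on the standard fact that in one dimension the optimal coupling for the cost |a − b|^p, p ≥ 1, pairs the sorted samples. The function name is hypothetical, and the 1-D restriction is an assumption made to keep the sketch short.

```python
def empirical_wasserstein_1d(xs, ys, p=1):
    """W_p distance between two equal-size empirical measures on the real line.

    In one dimension the optimal coupling for cost |a - b|^p (p >= 1) pairs
    order statistics, so W_p^p = mean(|x_(i) - y_(i)|^p).
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must be >= 1")
    # Match the i-th smallest x with the i-th smallest y (monotone coupling).
    total = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (total / len(xs)) ** (1.0 / p)


if __name__ == "__main__":
    xs, ys = [0.0, 0.0], [0.0, 2.0]
    print(empirical_wasserstein_1d(xs, ys, p=1))  # 1.0
    print(empirical_wasserstein_1d(xs, ys, p=2))  # sqrt(2), about 1.414
```

The example also shows that W_1 ≤ W_2, which holds in general, so a bound in Wasserstein-2 distance is the stronger of the two.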
Abstract
We consider non-convex stochastic optimization problems where the objective functions have super-linearly growing and discontinuous stochastic gradients. In such a setting, we provide a non-asymptotic analysis for the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. (2020). In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. The latter result enables us to further derive non-asymptotic estimates for the expected excess risk. To illustrate the applicability of the main results, we consider an example from transfer learning with ReLU neural networks, which represents a key paradigm in machine learning. Numerical experiments are presented for the aforementioned example which support our theoretical findings. Hence, in this setting, we demonstrate both theoretically and numerically…
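A tamed Langevin recursion of the TUSLA type can be sketched in a few lines. The sketch below is illustrative only. It assumes the update θ_{n+1} = θ_n − λ H(θ_n, X_{n+1}) / (1 + √λ |θ_n|^{2r}) + √(2λ/β) ξ_{n+1}, with step size λ, inverse temperature β, taming exponent r and standard Gaussian ξ_{n+1}, in the spirit of Lovas et al. (2020). The exact taming used in this paper's analysis may differ. The objective, its stochastic gradient `stoch_grad`, the helper names and all parameter values are hypothetical. They were chosen only to produce a gradient that grows cubically and jumps at θ = 0. For simplicity θ is scalar; in R^d, |θ| would be the Euclidean norm.

```python
import math
import random


def tusla_step(theta, grad, lam, beta, r, rng):
    """One tamed Langevin step for a scalar parameter (illustrative sketch).

    Assumed taming factor: 1 + sqrt(lam) * |theta|**(2r). In R^d, |theta|
    would be the Euclidean norm of the parameter vector.
    """
    tamed_grad = grad / (1.0 + math.sqrt(lam) * abs(theta) ** (2 * r))
    noise = rng.gauss(0.0, 1.0)  # standard Gaussian xi_{n+1}
    return theta - lam * tamed_grad + math.sqrt(2.0 * lam / beta) * noise


def stoch_grad(theta, x):
    """Hypothetical stochastic gradient H(theta, x) = (theta - x)^3 + sign(theta).

    It grows cubically in theta and jumps at theta = 0; it corresponds to
    U(theta) = E[(theta - X)^4 / 4] + |theta| with X ~ N(0, 1), minimized at 0.
    """
    sign = (theta > 0) - (theta < 0)
    return (theta - x) ** 3 + sign


def run(theta0, n_steps, lam, beta, r, tamed, seed=0):
    """Run tamed (TUSLA-style) or untamed (plain SGD) iterations.

    Returns the final iterate, or inf if |theta| exceeds 1e6 (divergence).
    """
    rng = random.Random(seed)
    theta = theta0
    for _ in range(n_steps):
        x = rng.gauss(0.0, 1.0)  # fresh data sample X_{n+1}
        g = stoch_grad(theta, x)
        if tamed:
            theta = tusla_step(theta, g, lam, beta, r, rng)
        else:
            theta = theta - lam * g  # no taming, no injected noise
        if abs(theta) > 1e6:
            return float("inf")
    return theta


if __name__ == "__main__":
    # Hypothetical step size, inverse temperature and taming exponent.
    lam, beta, r = 0.01, 1e8, 1.0
    print("tamed:  ", run(20.0, 5000, lam, beta, r, tamed=True))
    print("untamed:", run(20.0, 5000, lam, beta, r, tamed=False))
```

With these hypothetical settings and θ_0 = 20, the untamed SGD iterates overshoot and blow up, because each step scales like λθ³. The tamed iterates contract toward the minimizer θ* = 0. The taming only shrinks large steps: when √λ|θ|^{2r} is small, the update is essentially a stochastic gradient Langevin step.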
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Statistical Methods and Inference · Stochastic Gradient Optimization Techniques
Methods: AMSGrad · RMSProp · Stochastic Gradient Descent
