Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function

Dong-Young Lim; Ariel Neufeld; Sotirios Sabanis; Ying Zhang

arXiv:2107.08649 · math.OC · May 3, 2023 · 1 citation

TL;DR

This paper provides non-asymptotic error bounds for the TUSLA algorithm in non-convex stochastic optimization with applications to neural networks with ReLU activation, demonstrating its effectiveness where other algorithms may fail.

Contribution

The paper provides a non-asymptotic analysis of TUSLA, with explicit error bounds, for non-convex stochastic optimization problems whose stochastic gradients are super-linearly growing and discontinuous, and applies the results to neural networks with ReLU activation.

Findings

01 TUSLA converges rapidly in settings where other optimizers may fail.

02 Non-asymptotic error bounds are established in Wasserstein-1 and Wasserstein-2 distances.

03 Numerical experiments on a transfer learning example with ReLU neural networks support the theoretical findings.
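
For reference, the Wasserstein distance of order p between two probability measures on R^d is usually defined as below. The notation (mu, nu, zeta, and the coupling set C(mu, nu)) is assumed here for illustration and may differ from the paper's.

```latex
W_p(\mu, \nu)
  = \left( \inf_{\zeta \in \mathcal{C}(\mu, \nu)}
      \int_{\mathbb{R}^d \times \mathbb{R}^d}
        |\theta - \theta'|^p \, \mathrm{d}\zeta(\theta, \theta')
    \right)^{1/p},
  \qquad p \in \{1, 2\},
```

where C(mu, nu) denotes the set of probability measures on R^d x R^d whose marginals are mu and nu. Since W_1 is bounded above by W_2, a Wasserstein-2 bound is the stronger of the two statements.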

Abstract

We consider non-convex stochastic optimization problems where the objective functions have super-linearly growing and discontinuous stochastic gradients. In such a setting, we provide a non-asymptotic analysis for the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. (2020). In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. The latter result enables us to further derive non-asymptotic estimates for the expected excess risk. To illustrate the applicability of the main results, we consider an example from transfer learning with ReLU neural networks, which represents a key paradigm in machine learning. Numerical experiments are presented for the aforementioned example which support our theoretical findings. Hence, in this setting, we demonstrate both theoretically and numerically…
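
The tamed update that the abstract refers to can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the taming factor 1 + sqrt(lam) * |theta|^(2r) commonly associated with TUSLA, and the toy gradient, the parameter values (lam, beta, r), and all function names are hypothetical choices made for this example.

```python
import math
import random


def tamed_gradient(theta, grad, lam, r):
    """Divide grad by 1 + sqrt(lam) * |theta|^(2r) (assumed taming form)."""
    norm = math.sqrt(sum(t * t for t in theta))
    scale = 1.0 + math.sqrt(lam) * norm ** (2 * r)
    return [g / scale for g in grad]


def tusla_step(theta, grad, lam, beta, r, rng):
    """One tamed Langevin update: tamed gradient step plus Gaussian noise."""
    tamed = tamed_gradient(theta, grad, lam, r)
    noise_scale = math.sqrt(2.0 * lam / beta)  # Langevin noise level
    return [
        t - lam * g + noise_scale * rng.gauss(0.0, 1.0)
        for t, g in zip(theta, tamed)
    ]


def toy_gradient(theta, x):
    """Hypothetical stochastic gradient (theta - x)^3 with cubic growth."""
    return [(t - x) ** 3 for t in theta]


def run_tusla(n_steps=2000, lam=0.1, beta=1e8, r=1, seed=0):
    """Run the sketch on the toy problem, starting far from the minimizer."""
    rng = random.Random(seed)
    theta = [10.0]
    for _ in range(n_steps):
        x = rng.gauss(1.0, 0.1)  # toy data sample centered at 1.0
        theta = tusla_step(theta, toy_gradient(theta, x), lam, beta, r, rng)
    return theta


if __name__ == "__main__":
    print(run_tusla())
```

In this toy setup the taming factor shrinks the cubic gradient when |theta| is large, so the first steps stay moderate even with a step size of 0.1, whereas an untamed step of the same size from theta = 10 would overshoot badly. The large beta keeps the injected noise small, so the iterates drift toward the toy minimizer near 1.0.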

Code & Models

Repositories

DongyoungLim/TUSLA_RELU (PyTorch, official)

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Statistical Methods and Inference · Stochastic Gradient Optimization Techniques

Methods: AMSGrad · RMSProp · Stochastic Gradient Descent