Non-asymptotic convergence analysis of the stochastic gradient Hamiltonian Monte Carlo algorithm with discontinuous stochastic gradient with applications to training of ReLU neural networks

Luxu Liang; Ariel Neufeld; Ying Zhang

arXiv:2409.17107 · math.OC · May 27, 2025

TL;DR

This paper analyzes the convergence of the stochastic gradient Hamiltonian Monte Carlo algorithm with discontinuous stochastic gradients, providing explicit bounds applicable to training neural networks with ReLU activations.

Contribution

The paper offers the first non-asymptotic convergence analysis of SGHMC with discontinuous stochastic gradients, applicable to non-convex stochastic optimization and to the training of ReLU neural networks.

Findings

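01. Explicit upper bounds on the expected excess risk are derived, and they can be made arbitrarily small.

02. The analysis covers the training of neural networks with ReLU activation.

03. Numerical experiments on quantile estimation and on ReLU-network optimization problems from finance and artificial intelligence illustrate the applicability of the results.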

Abstract

In this paper, we provide a non-asymptotic analysis of the convergence of the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm to a target measure in Wasserstein-1 and Wasserstein-2 distance. Crucially, compared to the existing literature on SGHMC, we allow its stochastic gradient to be discontinuous. This allows us to provide explicit upper bounds, which can be controlled to be arbitrarily small, for the expected excess risk of non-convex stochastic optimization problems with discontinuous stochastic gradients, including, among others, the training of neural networks with ReLU activation function. To illustrate the applicability of our main results, we consider numerical experiments on quantile estimation and on several optimization problems involving ReLU neural networks relevant in finance and artificial intelligence.
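As a rough illustration of the setting described in the abstract, the following Python sketch runs an SGHMC-style recursion on a quantile estimation problem, where the stochastic gradient contains an indicator and is therefore discontinuous in the parameter. Everything here is an assumption made for illustration: the update rule is a commonly used Euler-type SGHMC discretization (position updated with the old velocity, velocity updated with friction, stochastic gradient, and Gaussian noise), not necessarily the exact scheme analyzed in the paper, and the function names, the standard normal data, and the values of eta, gamma, and beta are hypothetical.

```python
import numpy as np


def quantile_stochastic_gradient(theta, x, q):
    """Stochastic (sub)gradient in theta of the pinball loss for the q-quantile.

    The indicator makes this discontinuous in theta (jump at theta = x).
    """
    return float(x <= theta) - q


def sghmc_quantile(q=0.95, eta=0.005, gamma=1.0, beta=1e4, n_iter=40_000, seed=0):
    """Estimate the q-quantile of N(0, 1) with an SGHMC-style recursion.

    Assumed update (illustrative, not taken from the paper):
        theta_{n+1} = theta_n + eta * v_n
        v_{n+1}     = v_n - eta * (gamma * v_n + H(theta_n, x_{n+1}))
                      + sqrt(2 * gamma * eta / beta) * xi_{n+1}
    eta: step size, gamma: friction, beta: inverse temperature.
    """
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal(n_iter)   # i.i.d. data stream x_1, x_2, ...
    noise = rng.standard_normal(n_iter)     # injected Gaussian noise xi_1, xi_2, ...
    noise_scale = np.sqrt(2.0 * gamma * eta / beta)

    theta, v = 0.0, 0.0
    for x, xi in zip(samples, noise):
        h = quantile_stochastic_gradient(theta, x, q)
        theta_next = theta + eta * v                       # position uses old velocity
        v = v - eta * (gamma * v + h) + noise_scale * xi   # velocity update
        theta = theta_next
    return theta


if __name__ == "__main__":
    print(sghmc_quantile(q=0.95))
```

With these hypothetical settings, the last iterate should land near the 0.95-quantile of the standard normal distribution (about 1.645), up to fluctuations from the injected noise and the gradient noise. A larger beta or a smaller eta would be expected to tighten those fluctuations at the cost of more iterations.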

Code & Models

Repositories

LuxLiang/SGHMC_discontinuous
PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Neural Networks and Applications