Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses

Eloi Tanguy

arXiv:2307.11714 · cs.LG · March 19, 2024 · 2 citations

Open Access

TL;DR

This paper provides what it describes as the first theoretical convergence guarantees for stochastic gradient descent (SGD) when training neural networks with Sliced Wasserstein losses, a setting in which convergence had been observed in practice but not proven.

Contribution

The paper establishes convergence results for fixed-step SGD trajectories when training neural networks with Sliced Wasserstein losses, under assumptions it argues are realistic.
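For reference, here is one standard way to write the objects named above. The notation is ours and may differ from the paper's; $T_u$ stands for a network with parameters $u$, and the paper's exact assumptions on $T_u$ and on the sampling are not reproduced. $P_\theta(x) = \langle \theta, x \rangle$ projects onto a direction $\theta$, $\sigma$ is the uniform measure on the unit sphere $\mathbb{S}^{d-1}$, $a_{(k)}$ denotes the $k$-th smallest value, and at each step $t$ fresh inputs $z_1, \dots, z_n$, target samples $y_1, \dots, y_n$ and directions $\theta_1, \dots, \theta_L$ are drawn, with $\alpha > 0$ the fixed step size. The per-step loss $\ell_t$ can fail to be differentiable where projections tie, which is why a non-smooth analysis is needed.

```latex
\begin{aligned}
\mathrm{SW}_2^2(\mu,\nu) &= \int_{\mathbb{S}^{d-1}} \mathrm{W}_2^2\big((P_\theta)_\#\mu,\,(P_\theta)_\#\nu\big)\,\mathrm{d}\sigma(\theta),\\
\mathrm{W}_2^2\Big(\frac{1}{n}\sum_{i=1}^{n}\delta_{a_i},\ \frac{1}{n}\sum_{i=1}^{n}\delta_{b_i}\Big) &= \frac{1}{n}\sum_{k=1}^{n}\big(a_{(k)} - b_{(k)}\big)^2,\\
\ell_t(u) &= \frac{1}{L}\sum_{l=1}^{L}\frac{1}{n}\sum_{k=1}^{n}\big(\langle\theta_l, T_u(z)\rangle_{(k)} - \langle\theta_l, y\rangle_{(k)}\big)^2,\\
u_{t+1} &= u_t - \alpha\, g_t, \qquad g_t = \nabla \ell_t(u_t) \text{ where } \ell_t \text{ is differentiable.}
\end{aligned}
```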

Findings

01

As the step size decreases, SGD trajectories approach solutions of a sub-gradient flow equation

02

Under stricter conditions, the long-run limits of the trajectories approach a set of generalized critical points

03

Provides theoretical guarantees for SGD convergence in Sliced Wasserstein neural network training
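Finding 01 can be illustrated, very loosely, on a one-dimensional toy problem that is not taken from the paper: fixed-step subgradient descent on the non-smooth function f(x) = |x|, compared with the exact solution of the subgradient flow x'(t) ∈ -∂|x(t)|, which moves toward 0 at unit speed and then stays there. The sketch is deterministic and ignores both the stochastic noise and the Sliced Wasserstein structure; all function names are hypothetical.

```python
import numpy as np

def subgradient_descent_abs(x0, step, horizon):
    """Fixed-step subgradient descent on f(x) = |x|; iterate k sits at time k * step."""
    xs = [x0]
    for _ in range(int(horizon / step)):
        # np.sign(0.0) == 0.0, which is a valid subgradient of |x| at 0
        xs.append(xs[-1] - step * np.sign(xs[-1]))
    return np.array(xs)

def subgradient_flow_abs(x0, t):
    """Exact solution of x'(t) ∈ -∂|x(t)|: unit-speed motion toward 0, then rest."""
    return np.sign(x0) * np.maximum(abs(x0) - t, 0.0)

def max_gap(x0, step, horizon):
    """Largest distance between the iterates and the flow on the time grid k * step."""
    xs = subgradient_descent_abs(x0, step, horizon)
    ts = step * np.arange(len(xs))
    return float(np.max(np.abs(xs - subgradient_flow_abs(x0, ts))))

steps = (0.3, 0.03, 0.003)
gaps = [max_gap(1.0, s, horizon=3.0) for s in steps]
```

Before reaching 0 the iterates coincide with the flow; afterwards they oscillate around 0 within one step, so in this toy the gap is at most the step size (up to rounding) and shrinks as the step decreases.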

Abstract

Optimal Transport has sparked vivid interest in recent years, in particular thanks to the Wasserstein distance, which provides a geometrically sensible and intuitive way of comparing probability measures. For computational reasons, the Sliced Wasserstein (SW) distance was introduced as an alternative to the Wasserstein distance, and has seen uses for training generative Neural Networks (NNs). While convergence of Stochastic Gradient Descent (SGD) has been observed practically in such a setting, there is to our knowledge no theoretical guarantee for this observation. Leveraging recent works on convergence of SGD on non-smooth and non-convex functions by Bianchi et al. (2022), we aim to bridge that knowledge gap, and provide a realistic context under which fixed-step SGD trajectories for the SW loss on NN parameters converge. More precisely, we show that the trajectories approach the set…
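The setting described in the abstract can be sketched in NumPy under loudly stated assumptions: uniform empirical measures with equal sample sizes, a Monte Carlo estimate of SW over random directions, and a hypothetical two-parameter generator x = a·z + b standing in for a neural network. This is an illustration, not the paper's code; all function names are hypothetical.

```python
import numpy as np

def random_directions(n_proj, d, rng):
    """Directions drawn uniformly on the unit sphere S^{d-1}."""
    theta = rng.normal(size=(n_proj, d))
    return theta / np.linalg.norm(theta, axis=1, keepdims=True)

def sw2_and_grad(x, y, theta):
    """Monte Carlo SW_2^2 between the uniform empirical measures on the rows of
    x and y (both of shape (n, d)), and its gradient with respect to x."""
    n, L = x.shape[0], theta.shape[0]
    px = x @ theta.T                     # (n, L) projections of the source points
    py = np.sort(y @ theta.T, axis=0)    # (n, L) sorted projections of the targets
    order = np.argsort(px, axis=0)       # order[k, l]: index of the k-th smallest px[:, l]
    matched = np.empty_like(px)
    # 1-D optimal matching: the k-th smallest source meets the k-th smallest target
    np.put_along_axis(matched, order, py, axis=0)
    diff = px - matched
    loss = np.mean(diff ** 2)            # (1/L) * sum over l of W_2^2 along direction l
    grad_x = (2.0 / (n * L)) * diff @ theta
    return loss, grad_x

def train_toy_generator(mu, s, steps=2000, lr=0.05, n=64, n_proj=20, seed=0):
    """Fixed-step SGD on the hypothetical generator x = a * z + b, z ~ N(0, I),
    fitted to target samples y ~ N(mu, s^2 I)."""
    rng = np.random.default_rng(seed)
    d = mu.shape[0]
    a, b = 1.0, np.zeros(d)
    for _ in range(steps):
        z = rng.normal(size=(n, d))                # fresh generator inputs
        y = mu + s * rng.normal(size=(n, d))       # fresh target mini-batch
        theta = random_directions(n_proj, d, rng)  # fresh projection directions
        _, gx = sw2_and_grad(a * z + b, y, theta)
        a -= lr * np.sum(gx * z)    # chain rule: dx_i/da = z_i
        b -= lr * gx.sum(axis=0)    # chain rule: dx_i/db = identity
    return a, b

a, b = train_toy_generator(np.array([3.0, -1.0]), s=2.0)
```

Each step draws fresh inputs, targets and directions, so the update is a stochastic estimate with a fixed step size. In this toy one would expect a to end up near s and b near mu, up to mini-batch noise and bias; nothing here reproduces the paper's assumptions or guarantees. Where two projections tie, the sort-based matching is not unique and the loss is not differentiable; the code silently picks one matching, which is the kind of non-smoothness the abstract's non-smooth, non-convex framework is meant to handle.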

Taxonomy

Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Advanced Neuroimaging Techniques and Applications

Methods: Stochastic Gradient Descent