Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses
Eloi Tanguy

TL;DR
This paper provides what is, to the author's knowledge, the first theoretical convergence guarantee for stochastic gradient descent (SGD) when training neural networks with Sliced Wasserstein losses, giving a mathematical basis for convergence that had previously only been observed in practice.
Contribution
It establishes convergence results for fixed-step SGD trajectories in training neural networks with Sliced Wasserstein losses, under realistic assumptions.
Findings
Fixed-step SGD trajectories approach the solutions of (sub-)gradient flow equations as the step size decreases.
Under stricter conditions, the long-run limits of the trajectories approach generalized critical points of the loss.
Provides theoretical guarantees for SGD convergence in Sliced Wasserstein neural network training
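The objects named in these findings can be written down as a rough sketch. The notation here is an assumption made for illustration and is not taken from the paper: $F$ is the loss as a function of the network parameters $\theta$, $\alpha > 0$ is the fixed step size, $g^{(t)}$ is a stochastic (sub-)gradient estimate, and $\partial F$ stands for a generalized sub-differential (the Clarke sub-differential is a common choice for non-smooth, non-convex losses, but this summary does not state which notion the paper uses).

```latex
% Fixed-step SGD recursion (notation assumed for illustration)
\theta^{(t+1)} = \theta^{(t)} - \alpha \, g^{(t)}

% Sub-gradient flow that the interpolated trajectories are compared to as \alpha \to 0
\dot{\theta}(s) \in -\partial F\bigl(\theta(s)\bigr)

% Generalized critical points of the loss
0 \in \partial F(\theta^{\ast})
```

The flow is a differential inclusion rather than an ordinary differential equation because a non-smooth loss may have several admissible sub-gradients at a given point.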
Abstract
Optimal Transport has sparked vivid interest in recent years, in particular thanks to the Wasserstein distance, which provides a geometrically sensible and intuitive way of comparing probability measures. For computational reasons, the Sliced Wasserstein (SW) distance was introduced as an alternative to the Wasserstein distance, and has seen uses for training generative Neural Networks (NNs). While convergence of Stochastic Gradient Descent (SGD) has been observed practically in such a setting, there is to our knowledge no theoretical guarantee for this observation. Leveraging recent works on convergence of SGD on non-smooth and non-convex functions by Bianchi et al. (2022), we aim to bridge that knowledge gap, and provide a realistic context under which fixed-step SGD trajectories for the SW loss on NN parameters converge. More precisely, we show that the trajectories approach the set…
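For readers unfamiliar with the Sliced Wasserstein distance, the following is a minimal NumPy sketch of a Monte Carlo estimate of the squared SW distance of order 2 between two point clouds of equal size. It is an illustration under stated assumptions, not the paper's implementation: the function name `sliced_wasserstein_sq`, the parameter `n_projections`, and the restriction to uniformly weighted samples of equal size are all hypothetical choices made here.

```python
import numpy as np

def sliced_wasserstein_sq(x, y, n_projections=64, rng=None):
    """Monte Carlo estimate of the squared SW_2 distance between two
    equal-size, uniformly weighted point clouds x and y of shape (n, d).
    Hypothetical helper written for illustration only."""
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    # Draw random directions and normalize them onto the unit sphere.
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both clouds onto every direction: shape (n, n_projections).
    x_proj = x @ theta.T
    y_proj = y @ theta.T
    # In 1D, optimal transport between equal-size uniform samples
    # matches the sorted points of one cloud to those of the other.
    x_sorted = np.sort(x_proj, axis=0)
    y_sorted = np.sort(y_proj, axis=0)
    # Average squared transport cost over points and directions.
    return float(np.mean((x_sorted - y_sorted) ** 2))
```

The sorting step is what makes the loss non-smooth in the network parameters when `x` is the output of a generator, which is why results for non-smooth, non-convex SGD are the relevant tool. In a training loop, `x` would be a batch of generated samples and `y` a batch of data samples, with fresh random directions drawn at each step.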
Taxonomy
Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Advanced Neuroimaging Techniques and Applications
Methods: Stochastic Gradient Descent
