Stochastic Difference-of-Convex Optimization with Momentum

El Mahdi Chayti; Martin Jaggi

arXiv:2510.17503·cs.LG·October 21, 2025

TL;DR

This paper introduces a momentum-based method for stochastic difference-of-convex (DC) optimization that provably converges for any batch size under standard smoothness and bounded-variance assumptions, whereas existing methods require large batches or strong noise assumptions.

Contribution

The work shows that momentum enables convergence in stochastic DC optimization even with small batches, whereas prior methods require large batches or strong noise assumptions.

Findings

01. Momentum ensures convergence under standard assumptions for any batch size.

02. Without momentum, convergence can fail regardless of stepsize.

03. Empirical results show strong performance of the proposed method.

Abstract

Stochastic difference-of-convex (DC) optimization is prevalent in numerous machine learning applications, yet its convergence properties under small batch sizes remain poorly understood. Existing methods typically require large batches or strong noise assumptions, which limit their practical use. In this work, we show that momentum enables convergence under standard smoothness and bounded variance assumptions (of the concave part) for any batch size. We prove that without momentum, convergence may fail regardless of stepsize, highlighting its necessity. Our momentum-based algorithm achieves provable convergence and demonstrates strong empirical performance.
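To make the idea in the abstract concrete, here is a minimal sketch of a momentum-averaged stochastic DC step on a toy one-dimensional problem. It is an illustration under stated assumptions, not the authors' algorithm: the objective f(x) = g(x) - h(x) with g(x) = x^2/2 and h(x) = 2*sqrt(1 + x^2), the Gaussian noise model, the proximal form of the update, and the names `grad_h`, `momentum_dc`, `gamma`, `beta`, and `sigma` are all hypothetical choices made for this example. The gradient of h is observed with noise (batch size one), averaged with momentum, and then used in a proximal step on g.

```python
import math
import random


def grad_h(x):
    """Exact gradient of the convex toy function h(x) = 2 * sqrt(1 + x^2)."""
    return 2.0 * x / math.sqrt(1.0 + x * x)


def momentum_dc(x0, steps=2000, gamma=0.1, beta=0.1, sigma=0.1, seed=0):
    """Hypothetical momentum-based stochastic DC iteration on f = g - h.

    g(x) = x^2 / 2 is handled exactly through a proximal step;
    the gradient of h is only seen through noisy single-sample estimates.
    """
    rng = random.Random(seed)
    x = x0
    # Initialize the momentum estimate with one noisy gradient sample.
    d = grad_h(x) + rng.gauss(0.0, sigma)
    for _ in range(steps):
        # Proximal step: argmin_y g(y) - d*y + (y - x)^2 / (2*gamma),
        # which has the closed form below for g(y) = y^2 / 2.
        x = (x + gamma * d) / (1.0 + gamma)
        # Momentum: exponential moving average of noisy gradients of h.
        noisy_grad = grad_h(x) + rng.gauss(0.0, sigma)
        d = (1.0 - beta) * d + beta * noisy_grad
    return x


x_final = momentum_dc(x0=0.5)
# Stationary points of this toy f are x = 0 and x = +/- sqrt(3).
print(x_final, math.sqrt(3.0))
```

In this sketch, setting `beta=1.0` discards the averaging and uses only the latest noisy gradient, which is the momentum-free variant on this toy problem. The toy problem is not claimed to reproduce the failure case proved in the paper.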

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques