TL;DR
This paper introduces STIMULUS, a new stochastic multi-objective optimization algorithm that achieves fast convergence and low sample complexity, with enhanced versions using adaptive batching for practical efficiency.
Contribution
The paper proposes STIMULUS and STIMULUS-M algorithms with recursive gradient estimation, achieving state-of-the-art convergence rates and sample complexities in multi-objective optimization.
Findings
Achieves $O(1/T)$ convergence in non-convex settings.
Attains $O(n+\frac{\sqrt{n}}{\epsilon})$ sample complexity.
Enhanced versions with adaptive batching reduce full gradient evaluations.
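To get a feel for how the stated $O(n+\frac{\sqrt{n}}{\epsilon})$ sample complexity scales, the short sketch below evaluates the expression $n+\sqrt{n}/\epsilon$ for a few accuracy levels. The function name `sample_complexity_scale` and the chosen values of `n` and `eps` are illustrative assumptions; the constants hidden by the big-O notation are ignored, so the numbers indicate growth only, not actual sample counts.

```python
import math


def sample_complexity_scale(n: int, eps: float) -> float:
    """Evaluate n + sqrt(n) / eps, ignoring constants hidden by big-O.

    n   : number of samples in the finite-sum objective (assumed > 0)
    eps : target accuracy (assumed > 0)
    """
    if n <= 0 or eps <= 0:
        raise ValueError("n and eps must be positive")
    return n + math.sqrt(n) / eps


# Illustrative values only: how the bound grows as the accuracy tightens.
for eps in (1e-1, 1e-2, 1e-3):
    print(f"eps={eps:g}: n + sqrt(n)/eps = {sample_complexity_scale(10_000, eps):.0f}")
```

For a fixed `n`, the second term dominates once `eps` falls below $1/\sqrt{n}$, which is where the $\sqrt{n}$ dependence (rather than $n$) matters.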
Abstract
Recently, multi-objective optimization (MOO) has gained attention for its broad applications in ML, operations research, and engineering. However, MOO algorithm design remains in its infancy and many existing MOO methods suffer from unsatisfactory convergence rate and sample complexity performance. To address this challenge, in this paper, we propose an algorithm called STIMULUS (stochastic path-integrated multi-gradient recursive estimator), a new and robust approach for solving MOO problems. Different from the traditional methods, STIMULUS introduces a simple yet powerful recursive framework for updating stochastic gradient estimates to improve convergence performance with low sample complexity. In addition, we introduce an enhanced version of STIMULUS, termed STIMULUS-M, which incorporates a momentum term to further expedite convergence. We establish convergence rates of…
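The abstract describes a recursive update of stochastic gradient estimates combined with a multi-gradient step. The sketch below is not the authors' algorithm: it is a minimal illustration of that general idea, assuming a SARAH/SPIDER-style recursion (refresh with a full gradient every `q` steps, otherwise correct the previous estimate with a mini-batch gradient difference) and the closed-form min-norm weighting for two objectives. The least-squares objectives, all function names, and the values of `q`, `batch`, and `lr` are hypothetical choices made for the example.

```python
import numpy as np


def batch_grad(A, b, x, idx):
    """Gradient of 0.5 * mean((A x - b)^2) restricted to the rows in idx."""
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)


def min_norm_weights(g1, g2):
    """Weights (lam, 1 - lam) minimizing ||lam * g1 + (1 - lam) * g2||."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:
        return np.array([0.5, 0.5])
    lam = np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return np.array([lam, 1.0 - lam])


def stationarity(A, targets, x):
    """Norm of the min-norm combination of the two full gradients at x."""
    idx = np.arange(A.shape[0])
    g = [batch_grad(A, b, x, idx) for b in targets]
    lam = min_norm_weights(g[0], g[1])
    return float(np.linalg.norm(lam[0] * g[0] + lam[1] * g[1]))


def recursive_multi_gradient(A, targets, x0, steps=200, q=10, batch=8,
                             lr=0.05, seed=0):
    """Two-objective descent with a recursive gradient estimator (sketch).

    Assumption: full gradients every q steps, recursive mini-batch
    corrections in between; this is illustrative, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    all_idx = np.arange(n)
    x, x_prev = x0.copy(), x0.copy()
    u = [None, None]
    for t in range(steps):
        if t % q == 0:
            # Periodic refresh: exact gradient of each objective.
            u = [batch_grad(A, b, x, all_idx) for b in targets]
        else:
            # Recursive correction using the same mini-batch at x and x_prev.
            idx = rng.choice(n, size=batch, replace=False)
            u = [u_s + batch_grad(A, b, x, idx) - batch_grad(A, b, x_prev, idx)
                 for u_s, b in zip(u, targets)]
        lam = min_norm_weights(u[0], u[1])
        d = lam[0] * u[0] + lam[1] * u[1]  # common descent direction estimate
        x_prev = x.copy()
        x = x - lr * d
    return x


# Usage on synthetic data (hypothetical problem instance).
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 5))
targets = [A @ rng.standard_normal(5), A @ rng.standard_normal(5)]
x0 = np.full(5, 5.0)
x_final = recursive_multi_gradient(A, targets, x0)
print("stationarity before:", stationarity(A, targets, x0))
print("stationarity after: ", stationarity(A, targets, x_final))
```

The `stationarity` helper measures progress with the true full gradients rather than the stochastic estimates, so the printed values reflect the objectives themselves and not the estimator.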
Peer Reviews
Decision·UAI 2025 Poster
This paper proposes STIMULUS, which can achieve lower sample complexities than existing algorithms.
There are many typos in this paper, and some of the proofs are unclear. 1. Eq. (23) sums both sides of Eq. (22) weighted with $\lambda_t^s$ over $s\in S$. But why is $\frac{1}{2\delta} \|\nabla f_s(x_t) - u_t^s\|^2$ not weighted with $\lambda_t^s$? 2. Why does it hold that $\|\nabla f_s(x_t) - u_t^s\|^2 = \sum_{...} \|x_{i+1} - x_i\|^2 + \| \nabla f_s(x_{(n_t-1)q}) - u^s_{(n_t-1)q}\|^2 $ in Eq. (23)? 3. In Definition 3, why should $\mathbb{E} [\sum_{s} \lambda_i^s (f_s(x_t) - f_s(x_*))
The paper introduces several stochastic algorithms for multi-objective optimization problems, which are more challenging than single-objective problems. The algorithms have better convergence rates and sample complexity than the existing results. The paper is clearly written and the main results are clearly presented.
As far as I can see, the theoretical analysis seems to be problematic. For example, Theorem 1 gives convergence rates on $\frac{1}{T}\sum_{t=0}^{T-1}\|d_t\|^2$. However, the terms $d_t$ are just common descent directions built from stochastic gradients (similar to the stochastic gradient in SGD). According to Definition 3 and the paragraph above it, the quantity of interest is $d=\lambda^\top\nabla F(\mathbf{x})$. Note that $F(\mathbf{x})$ are the true objective functions, instead of t
1. This paper conducts a systematic study of the VR-aided multi-gradient method. Various versions of VR-based algorithms are proposed and supported by theoretical analysis, which may inspire future research in this field. 2. This paper is technically sound. The convergence analysis is comprehensive and non-trivial. 3. This paper is well-written in general and easy to follow.
1. The presentation of adaptive-batching versions is a bit ambiguous. I am not sure whether the adaptive batch is applied to the $q$-periodic full gradient or to each step. Adding more background knowledge on adaptive batch technique or a diagram for STIMULUS$^+$ would be helpful. In addition, it is unclear how to decide the batch size in experiments. 2. Besides SMGD and MOCO, CR-MOGM (Zhou et al., 2022b) should also be considered in experiments as a SOTA method.
