Variance Reduced Halpern Iteration for Finite-Sum Monotone Inclusions
Xufeng Cai, Ahmet Alacaoglu, Jelena Diakonikolas

TL;DR
This paper introduces variance-reduced Halpern iteration methods for finite-sum monotone inclusion problems, achieving improved complexity guarantees and near-optimal performance in solving broad classes of equilibrium problems in machine learning.
Contribution
It presents the first variance reduction-type guarantees for general finite-sum monotone inclusions with the operator norm residual as the optimality measure, improving oracle complexity over existing methods.
Findings
Achieves $\tilde{O}(n + \frac{\sqrt{n}L}{\varepsilon})$ oracle complexity.
Provides guarantees for last iterate and operator norm residual.
Shows near-optimality of the complexity bounds in the Lipschitz setting.
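To give a feel for the size of the stated bound, the sketch below compares $n + \sqrt{n}L/\varepsilon$ with a deterministic baseline of $nL/\varepsilon$ component evaluations. The baseline is an assumption made here for illustration (a full-operator method running about $L/\varepsilon$ iterations at $n$ component evaluations each). Constants and logarithmic factors are ignored, and the function names are hypothetical.

```python
import math


def vr_halpern_calls(n, L, eps):
    """Stated bound n + sqrt(n) * L / eps, ignoring constants and log factors."""
    return n + math.sqrt(n) * L / eps


def deterministic_calls(n, L, eps):
    """Assumed baseline: about L / eps iterations, each evaluating all n components."""
    return n * L / eps


# Illustrative values only; they are not taken from the paper.
n, L, eps = 10_000, 1.0, 1e-3
ratio = deterministic_calls(n, L, eps) / vr_halpern_calls(n, L, eps)
print(f"baseline / variance-reduced ratio: {ratio:.1f} (sqrt(n) = {math.sqrt(n):.0f})")
```

When $\sqrt{n}L/\varepsilon$ dominates $n$, the ratio approaches $\sqrt{n}$, which matches the $\sqrt{n}$ improvement noted in the reviews.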
Abstract
Machine learning approaches relying on such criteria as adversarial robustness or multi-agent settings have raised the need for solving game-theoretic equilibrium problems. Of particular relevance to these applications are methods targeting finite-sum structure, which generically arises in empirical variants of learning problems in these contexts. Further, methods with computable approximation errors are highly desirable, as they provide verifiable exit criteria. Motivated by these applications, we study finite-sum monotone inclusion problems, which model broad classes of equilibrium problems. Our main contributions are variants of the classical Halpern iteration that employ variance reduction to obtain improved complexity guarantees in which $n$ component operators in the finite sum are "on average" either cocoercive or Lipschitz continuous and monotone, with parameter $L$. The…
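For orientation, here is a minimal sketch of the classical (deterministic) Halpern iteration that the abstract builds on. It is not the paper's variance-reduced algorithm. It assumes a cocoercive operator $F$ with parameter $1/L$, the map $T(x) = x - F(x)/L$, and the standard anchoring weights $\lambda_k = 1/(k+2)$. The toy operator `F_toy` and all function names are hypothetical and chosen only for illustration.

```python
import math


def halpern(F, x0, L, num_iters):
    """Classical Halpern iteration: x_{k+1} = lam_k * x0 + (1 - lam_k) * T(x_k),
    with T(x) = x - F(x) / L and lam_k = 1 / (k + 2)."""
    x = list(x0)
    for k in range(num_iters):
        lam = 1.0 / (k + 2)
        Fx = F(x)
        # Forward step with step size 1/L (nonexpansive when F is 1/L-cocoercive).
        Tx = [xi - fi / L for xi, fi in zip(x, Fx)]
        # Pull the new iterate back toward the anchor point x0.
        x = [lam * ai + (1.0 - lam) * ti for ai, ti in zip(x0, Tx)]
    return x


def residual_norm(F, x):
    """Operator norm residual ||F(x)||, a computable exit criterion."""
    return math.sqrt(sum(fi * fi for fi in F(x)))


def F_toy(x):
    """Toy operator: gradient of x1^2 - 2*x1 + 0.5*x2^2 - x2, with zero at (1, 1).
    It is 1/L-cocoercive with L = 2 (illustrative assumption, not from the paper)."""
    return [2.0 * x[0] - 2.0, x[1] - 1.0]


x_final = halpern(F_toy, [0.0, 0.0], L=2.0, num_iters=2000)
res = residual_norm(F_toy, x_final)
print("last iterate:", x_final, "residual:", res)
```

The residual is evaluated at the last iterate, which is the type of guarantee the summary highlights. The variance-reduced variants described in the paper replace the full evaluation of $F$ with stochastic estimates, which this sketch does not attempt.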
Peer Reviews
Decision·ICLR 2024 poster
1. The paper proposes two algorithms, one for the cocoercive case and one for the monotone Lipschitz case. 2. The new algorithms improve oracle complexity by a factor of $\sqrt{n}$ over existing methods under some conditions. 3. Numerical experiments are presented to further demonstrate the improvement of the new algorithms. 4. The paper is clearly written and easy to follow.
1. Some concepts, e.g., monotonicity and maximal monotonicity, are not explicitly defined in the paper, which slightly impairs its completeness. 2. Both Algorithms 1 and 3 are variants of existing algorithms. Algorithm 1 is a simpler version of Cai et al. (2022a), but there is not enough comparison to show the novelty and advantage of the new algorithm. Algorithm 2 is a combination of inexact Halpern iteration and VR-FoRB (Alacaoglu & Malitsky (2022)), which still doesn’t pres
In the paper's context, the authors claim an oracle complexity of $\mathcal{O}(n + \sqrt{n}L/\varepsilon)$ under their studied conditions, providing a considerable (theoretical) improvement over prior methodologies.
*Due to the confusing presentation, there is doubt about whether the improvements stem from an innovative analysis approach or are predominantly artifacts of the specific assumptions employed in their (restricted) settings.* **Assumptions and Implications:** - Considering the decomposition, $\mathbb{E}_{q \sim Q} \left[ \| F_q(u) - F_q(v) \|^2 \right] = \text{Var}_{q \sim Q} \left[ \| F_q(u) - F_q(v) \| \right] + \left( \mathbb{E}_{q \sim Q} \left[ \| F_q(u) - F_q(v) \| \right] \right)^2$, assumption
The paper is well-written and explains every detail of the algorithms and their contributions. The discussions and comparisons to previous works clearly reflect their differences and improvements. Although the algorithms and techniques are based on previous works, the obtained last-iterate guarantees on the operator norm for finite-sum monotone inclusion problems are new in the related literature.
1. PAGE was originally designed for nonconvex minimization problems and SVRG/SAGA is a common choice for convex problems. Although the problem to be solved is monotone, Algorithm 1 chooses PAGE as the base algorithm. Could the authors explain why? What happens if SVRG is used? 2. I don't see any dependence and requirement on $L_F$ for both Algorithms 1 and 2. Is the assumption that $F$ is $L_F$-Lipschitz used anywhere in the analysis? Why is it required other than allowing easier comparisons wi
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning
