Variance Reduced Halpern Iteration for Finite-Sum Monotone Inclusions

Xufeng Cai; Ahmet Alacaoglu; Jelena Diakonikolas

arXiv:2310.02987 · cs.LG · October 27, 2023 · 1 citation

Open Access 3 Reviews

TL;DR

This paper introduces variance-reduced Halpern iteration methods for finite-sum monotone inclusion problems, achieving improved complexity guarantees and near-optimal performance in solving broad classes of equilibrium problems in machine learning.

Contribution

It presents variance-reduced variants of Halpern iteration for finite-sum monotone inclusions, with last-iterate guarantees on the operator norm residual and improved oracle complexity over existing methods.

Findings

01

Achieves $\tilde{O}\left(n + \frac{\sqrt{n}L}{\epsilon}\right)$ oracle complexity.

02

Provides guarantees for last iterate and operator norm residual.

03

Shows near-optimality of the complexity bounds in the Lipschitz setting.
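
To make the size of the improvement concrete, the sketch below compares the stated bound $n + \sqrt{n}L/\epsilon$ with a deterministic baseline of $nL/\epsilon$ component evaluations. The baseline (on the order of $L/\epsilon$ iterations, each evaluating all $n$ components), the dropped constants and logarithmic factors, and the sample values of `n`, `L`, and `eps` are assumptions made for illustration only.

```python
import math

def vr_complexity(n, L, eps):
    # Stated bound n + sqrt(n) * L / eps, ignoring constants and log factors.
    return n + math.sqrt(n) * L / eps

def deterministic_complexity(n, L, eps):
    # Assumed baseline: about L / eps iterations, each costing n component evaluations.
    return n * L / eps

# Hypothetical problem sizes chosen only for the illustration.
n, L, eps = 10_000, 1.0, 1e-3
vr = vr_complexity(n, L, eps)
det = deterministic_complexity(n, L, eps)
speedup = det / vr  # ratio of the two (constant-free) bounds
```

In this simplified comparison the ratio approaches $\sqrt{n}$ when $L/\epsilon$ is much larger than $\sqrt{n}$, and shrinks toward 1 when the additive $n$ term dominates.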

Abstract

Machine learning approaches relying on such criteria as adversarial robustness or multi-agent settings have raised the need for solving game-theoretic equilibrium problems. Of particular relevance to these applications are methods targeting finite-sum structure, which generically arises in empirical variants of learning problems in these contexts. Further, methods with computable approximation errors are highly desirable, as they provide verifiable exit criteria. Motivated by these applications, we study finite-sum monotone inclusion problems, which model broad classes of equilibrium problems. Our main contributions are variants of the classical Halpern iteration that employ variance reduction to obtain improved complexity guarantees in which $n$ component operators in the finite sum are "on average" either cocoercive or Lipschitz continuous and monotone, with parameter $L$. The…
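
For orientation, the classical deterministic Halpern iteration that the paper's variants build on can be sketched as follows. This is a minimal illustration and not the paper's variance-reduced algorithm: the toy operator `F(u) = A u - b`, the schedule `lam = 1 / (k + 2)`, and the names `halpern`, `F`, `T`, and `residual` are assumptions chosen for the example. It relies on the standard fact that for a $1/L$-cocoercive $F$, the map $T = \mathrm{Id} - (2/L)F$ is nonexpansive; each Halpern step then averages $T(u_k)$ with the fixed anchor $u_0$.

```python
import math

L = 4.0  # cocoercivity constant of the toy operator (largest eigenvalue of A)

def F(u):
    # Toy operator F(u) = A u - b with A = diag(1, 4), b = (1, 2); its zero is (1, 0.5).
    return [1.0 * u[0] - 1.0, 4.0 * u[1] - 2.0]

def T(u):
    # T = Id - (2/L) F is nonexpansive when F is (1/L)-cocoercive.
    Fu = F(u)
    return [ui - (2.0 / L) * fi for ui, fi in zip(u, Fu)]

def halpern(T, u0, num_iters):
    # Classical Halpern iteration: u_{k+1} = lam_k * u0 + (1 - lam_k) * T(u_k).
    u = list(u0)
    for k in range(num_iters):
        lam = 1.0 / (k + 2)
        Tu = T(u)
        u = [lam * a + (1.0 - lam) * t for a, t in zip(u0, Tu)]
    return u

def residual(u):
    # Operator norm residual ||F(u)||, usable as a computable exit criterion.
    return math.sqrt(sum(f * f for f in F(u)))

u = halpern(T, [0.0, 0.0], 1000)
```

Here every step evaluates the full operator. The setting studied in the paper instead has $F$ given as a finite sum of $n$ components, which is where variance-reduced stochastic estimates of $F$ come in.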

Peer Reviews

Decision: ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

1. The paper proposes two algorithms, one for the cocoercive case and one for the monotone Lipschitz case.
2. The new algorithms improve oracle complexity by a factor of $\sqrt{n}$ over existing methods under some conditions.
3. Numerical experiments are presented to further show the improvement of the new algorithms.
4. The paper is clearly written and easy to follow.

Weaknesses

1. Some concepts, e.g., monotonicity and maximal monotonicity, are not explicitly defined in the paper, which slightly impairs its completeness.
2. Both Algorithms 1 and 3 are variants of existing algorithms. Algorithm 1 is a simpler version of Cai et al. (2022a), but there is not enough comparison to establish the novelty and advantage of the new algorithm. Algorithm 2 is a combination of inexact Halpern iteration and VR-FoRB (Alacaoglu & Malitsky (2022)), which still doesn’t pres

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

In the paper's context, the authors claim an oracle complexity of $\mathcal{O}(n + \sqrt{n}L/\varepsilon)$ under their studied conditions, providing a considerable (theoretical) improvement over prior methodologies.

Weaknesses

*Due to confusing presentation, there is doubt as to whether the improvements stem from an innovative analysis approach or are predominantly artifacts of the specific assumptions employed in their (restricted) settings.*
**Assumptions and Implications:**
- Considering the decomposition, $\mathbb{E}_{q \sim Q} \left[ \| F_q(u) - F_q(v) \|^2 \right] = \text{Var}_{q \sim Q} \left[ \| F_q(u) - F_q(v) \| \right] + \left( \mathbb{E}_{q \sim Q} \left[ \| F_q(u) - F_q(v) \| \right] \right)^2$, assumption

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

The paper is well-written and explains every detail of the algorithms and their contributions. The discussions and comparisons to previous works clearly reflect their differences and improvements. Although the algorithms and techniques are based on previous works, the obtained last-iterate guarantees on the operator norm for finite-sum monotone inclusion problems are new in the related literature.

Weaknesses

1. PAGE was originally designed for nonconvex minimization problems, while SVRG and SAGA are the common choices for convex problems. Although the problem to be solved is monotone, Algorithm 1 uses PAGE as the base algorithm. Could the authors explain why? What happens if SVRG is used?
2. I don't see any dependence on, or requirement for, $L_F$ in either Algorithm 1 or Algorithm 2. Is the assumption that $F$ is $L_F$-Lipschitz used anywhere in the analysis? Why is it required other than allowing easier comparisons wi

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning