Efficient Continual Finite-Sum Minimization
Ioannis Mavrothalassitis, Stratis Skoulakis, Leello Tadesse Dadi, Volkan Cevher

TL;DR
This paper introduces the continual finite-sum minimization problem and proposes CSVRG, an efficient first-order stochastic variance reduction method whose gradient complexity is nearly tight and improves on existing algorithms.
Contribution
It formulates the continual finite-sum minimization problem and develops a new algorithm with significantly improved gradient complexity bounds.
Findings
The proposed CSVRG method achieves $\tilde{O}(n/\epsilon^{1/3} + 1/\sqrt{\epsilon})$ gradient complexity.
It outperforms traditional SGD and state-of-the-art variance reduction methods like Katyusha.
The method's complexity is nearly tight, with lower bounds established for first-order methods.
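As a rough reading of the bound $n/\epsilon^{1/3} + 1/\sqrt{\epsilon}$, the comparison below shows when each term dominates. It ignores the constants and logarithmic factors hidden in $\tilde{O}$, so it is only an order-of-magnitude illustration, and the value $\epsilon = 10^{-6}$ is an arbitrary example:

$$
\frac{n}{\epsilon^{1/3}} \ge \frac{1}{\sqrt{\epsilon}} \iff n \ge \epsilon^{-1/6}, \qquad \text{e.g. } \epsilon = 10^{-6}: \quad \frac{n}{\epsilon^{1/3}} = 100\,n, \quad \frac{1}{\sqrt{\epsilon}} = 1000, \quad \epsilon^{-1/6} = 10.
$$

Under these simplifications, the first term dominates once the sequence has more than about $\epsilon^{-1/6}$ functions.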
Abstract
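Given a sequence of functions $f_1, \ldots, f_n$ with $f_i : \mathcal{D} \mapsto \mathbb{R}$, finite-sum minimization seeks a point $x^\star \in \mathcal{D}$ minimizing $\sum_{j=1}^n f_j(x)/n$. In this work, we propose a key twist into the finite-sum minimization, dubbed as continual finite-sum minimization, that asks for a sequence of points $x_1^\star, \ldots, x_n^\star \in \mathcal{D}$ such that each $x_i^\star \in \mathcal{D}$ minimizes the prefix-sum $\sum_{j=1}^i f_j(x)/i$. Assuming that each prefix-sum is strongly convex, we develop a first-order continual stochastic variance reduction gradient method (CSVRG) producing an $\epsilon$-optimal sequence with $\tilde{O}(n/\epsilon^{1/3} + 1/\sqrt{\epsilon})$ overall first-order oracles (FO). An FO corresponds to the computation of a single gradient $\nabla f_j(x)$ at a given $x \in \mathcal{D}$ for some $j \in…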
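To make the problem statement concrete, here is a minimal sketch of a naive baseline. It is not CSVRG and not the authors' code: it simply runs warm-started full gradient descent on each prefix-sum and counts first-order oracle calls. The function name `naive_continual_minimizers`, the one-dimensional quadratics $f_j(x) = (x - a_j)^2/2$, and the step size are all assumptions chosen for illustration. With these quadratics, each prefix-sum is 1-strongly convex and its minimizer is the running mean of the $a_j$.

```python
import numpy as np


def naive_continual_minimizers(a, eps=1e-8, step=0.5):
    """Naive baseline (NOT CSVRG): warm-started gradient descent per prefix.

    Hypothetical setup: f_j(x) = 0.5 * (x - a[j])**2, so the prefix-sum
    g_i(x) = (1/i) * sum_{j<=i} f_j(x) is minimized at mean(a[:i]).
    Returns the eps-optimal points and the number of single-gradient
    computations (first-order oracle calls) that were used.
    """
    a = np.asarray(a, dtype=float)
    x = 0.0
    points, fo_calls = [], 0
    for i in range(1, len(a) + 1):
        while True:
            # Full gradient of g_i at x: costs i oracle calls.
            grad = np.mean(x - a[:i])
            fo_calls += i
            # For these quadratics, g_i(x) - min g_i = 0.5 * grad**2.
            if 0.5 * grad**2 <= eps:
                break
            x -= step * grad
        points.append(x)  # warm start: reuse x for the next prefix
    return np.array(points), fo_calls


points, fo_calls = naive_continual_minimizers([1.0, 3.0, 5.0, 7.0])
```

Here `points` stays close to the running means 1, 2, 3, 4. Because every gradient step on prefix $i$ costs $i$ oracle calls, the total grows at least quadratically in $n$, which is the kind of cost a continual method aims to avoid.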
Peer Reviews
Decision: ICLR 2024 poster
A clearly written paper with sound results.
I have the following concerns about the paper:
- I can’t connect Problem 2 with its motivation. For example, you say that “it is important that a model is constantly updated so as to perform equally well both on the past and the new data”, but: 1) Problem 1 achieves precisely that; 2) in Problem 2, you train *multiple* models, with later models performing well on all data, and with older models not taking into account new data at all. To conclude, I don’t see a motivation for the problem.
The introduction of the problem is a nice conceptual contribution. The new algorithm they proposed could also have other applications. The notion of a "natural algorithm" is also a nice contribution since it allows the analysis of algorithms and lower bounds.
A weakness is the lack of intuition about their algorithm. I mostly follow the math; however, conceptually I do not know why exactly they can get an improvement in the epsilon power. It seems like the high-level idea is only to compute a gradient update if we have not had an update for a long time. Establishing that the gradient is unbiased seems fairly straightforward: it just uses the fact that the gradient at the next step is a linear combination of the new function and the previous gradient.
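The linear-combination fact the reviewer refers to can be written out as follows. The notation $g_i$ for the $i$-th prefix-sum is an assumption introduced here for illustration, with $f_j$ denoting the $j$-th function in the sequence:

$$
g_i(x) = \frac{1}{i}\sum_{j=1}^{i} f_j(x) \quad\Longrightarrow\quad \nabla g_{i+1}(x) = \frac{i}{i+1}\,\nabla g_i(x) + \frac{1}{i+1}\,\nabla f_{i+1}(x).
$$

By linearity of expectation, combining an unbiased estimate of $\nabla g_i(x)$ with the exact $\nabla f_{i+1}(x)$ in these proportions gives an unbiased estimate of $\nabla g_{i+1}(x)$.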
- The studied problem is well-motivated, both from the literature review on incremental learning and from empirical estimation.
- The theoretical analysis is complete: an upper bound and a lower bound are provided for this problem and compared with the state of the art in Table 1.
- The logic of this paper is easy to follow, and the assumptions and notation are presented in a clear way.
- The paper has additional experiments on the ridge regression problem.
- The upper bound provided by the algorithm is not tight compared to the lower bound.
- The algorithm only works in the strongly convex case.
Minor issues:
- Algorithm 2 has no input.
- The value of $\alpha$ needs to be in line 2 of Algorithm 1.
- There is an additional "the" in the second line of the first paragraph of Section 3.1.
- VR is never defined. Suggestion: write "variance reduction (VR)" at first use and then use VR afterwards.
Taxonomy
Topics: Digital Filter Design and Implementation · Advanced Numerical Analysis Techniques · Numerical Methods and Algorithms
