Step-size Optimization for Continual Learning
Thomas Degris, Khurram Javed, Arsalan Sharifnassab, Yuxin Liu, Richard S. Sutton

TL;DR
This paper investigates step-size adaptation in continual learning, highlighting limitations of heuristic methods like Adam, and demonstrates the benefits of meta-gradient approaches such as IDBD for optimizing learning rates.
Contribution
It reveals the shortcomings of heuristic step-size adaptation and advocates for meta-gradient methods, proposing a combined approach for improved continual learning performance.
Findings
IDBD improves step-size vectors on simple problems
Heuristic methods like Adam can move the step-size vector away from better step-size vectors
Combining heuristic and meta-gradient approaches is promising
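To make the "heuristic" side concrete, here is a minimal sketch of Adam (Kingma & Ba, 2015) that exposes its per-weight step-size vector, lr / (sqrt(v_hat) + eps), where v_hat is a bias-corrected running average of squared gradients. It is not code from the paper. The class name AdamStepSizes and the example values are illustrative assumptions. The point it shows is that the step-size vector comes from normalizing by gradient magnitude, with no term measuring how the step-size itself affects the loss.

```python
import numpy as np


class AdamStepSizes:
    """Sketch of Adam that exposes its effective per-weight step-size vector."""

    def __init__(self, n, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(n)  # running mean of gradients (first moment)
        self.v = np.zeros(n)  # running mean of squared gradients (second moment)
        self.t = 0            # step counter used for bias correction

    def step_sizes(self):
        # Effective step-size for each weight: a normalization heuristic,
        # lr divided by the root of the bias-corrected second moment.
        assert self.t > 0, "call update() at least once first"
        v_hat = self.v / (1.0 - self.beta2 ** self.t)
        return self.lr / (np.sqrt(v_hat) + self.eps)

    def update(self, w, grad):
        grad = np.asarray(grad, dtype=float)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1.0 - self.beta1 ** self.t)
        # Descent step: each weight moves by its own step-size times m_hat.
        return w - self.step_sizes() * m_hat


opt = AdamStepSizes(2, lr=0.01)
w = np.zeros(2)
for _ in range(5):
    # Constant gradients whose scales differ by a factor of 100.
    w = opt.update(w, [1.0, 100.0])
```

With constant gradients, the weight whose gradient is 100 times larger receives a step-size about 100 times smaller, so both weights move by roughly lr per step. Normalization fixes the step-size from gradient statistics alone, which is the behavior the paper contrasts with meta-gradient methods.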
Abstract
In continual learning, a learner has to keep learning from the data over its whole lifetime. A key issue is to decide what knowledge to keep and what knowledge to let go. In a neural network, this can be implemented by using a step-size vector to scale how much gradient samples change network weights. Common algorithms, like RMSProp and Adam, use heuristics, specifically normalization, to adapt this step-size vector. In this paper, we show that those heuristics ignore the effect of their adaptation on the overall objective function, for example by moving the step-size vector away from better step-size vectors. On the other hand, stochastic meta-gradient descent algorithms, like IDBD (Sutton, 1992), explicitly optimize the step-size vector with respect to the overall objective function. On simple problems, we show that IDBD is able to consistently improve step-size vectors, where…
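For contrast with normalization, here is a minimal sketch of IDBD for a linear predictor, following the per-weight rules in Sutton (1992). Each weight w_i has its own step-size alpha_i = exp(beta_i), and beta_i takes a meta-gradient step theta * delta * x_i * h_i, where delta is the prediction error and h_i is a decaying trace of recent changes to w_i. This is the original linear IDBD, not the paper's experimental code. The class and parameter names (IDBD, init_step_size, meta_step_size) are hypothetical.

```python
import numpy as np


class IDBD:
    """Sketch of IDBD (Sutton, 1992) for a linear predictor y_hat = w . x."""

    def __init__(self, n_inputs, init_step_size=0.05, meta_step_size=0.01):
        self.w = np.zeros(n_inputs)                            # weights
        self.beta = np.full(n_inputs, np.log(init_step_size))  # log step-sizes
        self.h = np.zeros(n_inputs)                            # trace of recent weight changes
        self.theta = meta_step_size                            # meta step-size

    @property
    def alpha(self):
        # Per-weight step-size vector, kept positive by the exponential.
        return np.exp(self.beta)

    def predict(self, x):
        return float(self.w @ np.asarray(x, dtype=float))

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        delta = y - self.predict(x)                    # prediction error
        # Meta-gradient step on the log step-sizes, using the trace from the
        # previous step: step-sizes grow when successive updates to w_i agree.
        self.beta += self.theta * delta * x * self.h
        alpha = np.exp(self.beta)
        self.w += alpha * delta * x                    # LMS step, per-weight step-sizes
        # Decay the trace by [1 - alpha_i x_i^2]^+ and add the latest weight change.
        self.h = self.h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
        return delta


model = IDBD(2, init_step_size=0.1, meta_step_size=0.1)
d1 = model.update([1.0, 0.0], 1.0)
d2 = model.update([1.0, 0.0], 1.0)
```

Calling update(x, y) once per sample performs one step. In the two updates above, the error on the active input keeps the same sign, so its step-size grows slightly, while the step-size of the inactive input (x_1 = 0) stays unchanged. Unlike the normalization sketch, the step-size change here is driven by the error signal, that is, by the objective itself.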
Taxonomy
Methods: RMSProp · Adam
