Step-size Optimization for Continual Learning

Thomas Degris; Khurram Javed; Arsalan Sharifnassab; Yuxin Liu; Richard Sutton

arXiv:2401.17401·cs.LG·February 1, 2024·1 cites

TL;DR

This paper investigates step-size adaptation in continual learning, highlighting limitations of heuristic methods like Adam, and demonstrates the benefits of meta-gradient approaches such as IDBD for optimizing learning rates.

Contribution

The paper shows where heuristic step-size adaptation falls short and advocates meta-gradient methods, proposing a combined approach for improved continual learning performance.

Findings

1. IDBD improves step-size vectors on simple problems
2. Heuristic methods like Adam can move away from optimal step-sizes
3. Combining heuristic and meta-gradient approaches is promising

Abstract

In continual learning, a learner has to keep learning from the data over its whole lifetime. A key issue is to decide what knowledge to keep and what knowledge to let go. In a neural network, this can be implemented by using a step-size vector to scale how much gradient samples change network weights. Common algorithms, like RMSProp and Adam, use heuristics, specifically normalization, to adapt this step-size vector. In this paper, we show that those heuristics ignore the effect of their adaptation on the overall objective function, for example by moving the step-size vector away from better step-size vectors. On the other hand, stochastic meta-gradient descent algorithms, like IDBD (Sutton, 1992), explicitly optimize the step-size vector with respect to the overall objective function. On simple problems, we show that IDBD is able to consistently improve step-size vectors, where…
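To make the idea of a meta-gradient step-size vector concrete, here is a minimal sketch of the IDBD update rule (Sutton, 1992) for a linear predictor, written with NumPy. The function names `idbd_update` and `run_demo`, the toy tracking task, and all settings (20 inputs of which 5 are relevant, initial step-size 0.05, meta step-size `theta = 0.01`) are assumptions chosen for illustration. They are not taken from the paper summarized here and do not reproduce its experiments.

```python
import numpy as np


def idbd_update(w, beta, h, x, target, theta):
    """One IDBD step for a linear predictor; w, beta and h are updated in place.

    w     : weight vector
    beta  : log step-sizes, one per weight (step-size alpha = exp(beta))
    h     : per-weight trace of recent weight updates
    theta : meta step-size (illustrative setting, not from the paper)
    """
    delta = target - w @ x                 # prediction error
    beta += theta * delta * x * h          # meta-gradient step on the log step-sizes
    alpha = np.exp(beta)                   # per-weight step-size vector
    w += alpha * delta * x                 # LMS update, scaled per weight
    h *= np.maximum(0.0, 1.0 - alpha * x * x)  # decay the trace
    h += alpha * delta * x                 # add the latest weight change
    return alpha


def run_demo(steps=20000, n_inputs=20, n_relevant=5, theta=0.01, seed=0):
    """Hypothetical tracking task: only the first n_relevant inputs affect the target."""
    rng = np.random.default_rng(seed)
    signs = np.ones(n_relevant)            # target weights, flipped now and then
    w = np.zeros(n_inputs)
    beta = np.full(n_inputs, np.log(0.05)) # assumed initial step-size of 0.05
    h = np.zeros(n_inputs)
    alpha = np.exp(beta)
    for t in range(steps):
        if t % 20 == 0:                    # non-stationarity: flip one target weight
            signs[rng.integers(n_relevant)] *= -1.0
        x = rng.standard_normal(n_inputs)
        target = signs @ x[:n_relevant]
        alpha = idbd_update(w, beta, h, x, target, theta)
    return alpha
```

Calling `run_demo()` returns the final step-size vector. Comparing its first `n_relevant` entries with the remaining ones shows how the meta-gradient treated inputs that matter for the target versus inputs that do not. Unlike a normalization heuristic, each step-size here moves in the direction that the error signal says would have reduced the prediction error.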

Taxonomy

Topics: Image Processing Techniques and Applications · IoT-based Smart Home Systems · Indoor and Outdoor Localization Technologies

Methods: RMSProp · Adam