Improving information retention in large scale online continual learning

Zhipeng Cai; Vladlen Koltun; Ozan Sener

arXiv:2210.06401·cs.CV·October 13, 2022·1 cites

Open Access

TL;DR

This paper investigates the challenge of information retention in large-scale online continual learning, revealing limitations of naive SGD and proposing an adaptive moving average optimizer with a new learning rate schedule to improve performance.

Contribution

It introduces an adaptive moving average optimizer and a novel learning rate schedule specifically designed to enhance information retention in large-scale online continual learning.
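This summary does not give the update rule of the adaptive moving average optimizer or the form of the learning rate schedule, so neither is reproduced here. As a rough illustration of the general idea of keeping a moving average of model weights next to the SGD iterate, the sketch below uses a plain exponential moving average with a fixed `decay`. The function name `ema_update` and the fixed decay are assumptions made for illustration, not the paper's method.

```python
def ema_update(avg_weights, weights, decay):
    """Blend the running average with the current weights.

    Hypothetical helper: a plain exponential moving average with a
    fixed decay, NOT the adaptive rule proposed in the paper.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    return [decay * a + (1.0 - decay) * w
            for a, w in zip(avg_weights, weights)]


# Usage: after each SGD step, fold the new weights into the average.
avg = [0.0, 1.0]
current = [1.0, 1.0]
avg = ema_update(avg, current, decay=0.9)
```

A larger `decay` makes the averaged weights change more slowly, which is the usual motivation for evaluating the average rather than the latest iterate when older information should not be overwritten too quickly.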

Findings

01. AMA+MALR improves retention on benchmarks

02. Naive SGD fails to retain information long-term

03. Proposed methods outperform existing approaches

Abstract

Given a stream of data sampled from non-stationary distributions, online continual learning (OCL) aims to adapt efficiently to new data while retaining existing knowledge. The typical approach to information retention (the ability to retain previous knowledge) is to keep a replay buffer of fixed size and compute gradients using a mixture of new data and the replay buffer. Surprisingly, recent work (Cai et al., 2021) suggests that information retention remains a problem in large-scale OCL even when the replay buffer is unlimited, i.e., when the gradients are computed using all past data. This paper focuses on this peculiarity to understand and address information retention. To pinpoint the source of this problem, we theoretically show that, given limited computation budgets at each time step, even without a strict storage limit, naively applying SGD with constant or constantly…
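The replay approach described in the abstract can be sketched as follows. This is a minimal illustration under several assumptions: the buffer is filled by reservoir sampling, the model is a one-parameter linear regressor, and the names `ReplayBuffer`, `mixed_batch`, and `sgd_step` are hypothetical. None of these choices is taken from the paper.

```python
import random


class ReplayBuffer:
    """Fixed-size buffer; reservoir sampling is an assumed fill policy."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Keep each seen example with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_examples, buffer, replay_size):
    """Combine new stream data with examples replayed from the buffer."""
    replayed = buffer.sample(replay_size)
    for example in new_examples:
        buffer.add(example)
    return list(new_examples) + replayed


def sgd_step(w, batch, lr):
    """One SGD step on mean squared error for the toy model y = w * x."""
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


# Usage: one update per time step of the stream.
buffer = ReplayBuffer(capacity=4)
w = 0.0
for step_data in ([(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]):
    batch = mixed_batch(step_data, buffer, replay_size=2)
    w = sgd_step(w, batch, lr=0.1)
```

The batch size per step is bounded regardless of how much data the buffer holds, which corresponds to the limited per-step computation budget that the abstract identifies as the setting of interest.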

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI

Methods: Stochastic Gradient Descent