Natural Gradient Descent for Online Continual Learning

Joe Khawand; David Colliaux

arXiv:2603.20898·cs.LG·March 24, 2026

TL;DR

This paper introduces a Natural Gradient Descent optimizer with Fisher Information Matrix approximation to improve online continual learning, significantly reducing catastrophic forgetting and enhancing convergence on image classification tasks.

Contribution

The paper proposes a novel optimization approach for online continual learning: Natural Gradient Descent with the Fisher Information Matrix approximated by Kronecker Factored Approximate Curvature (KFAC).

Findings

01 Enhanced accuracy across multiple datasets

02 Significant reduction in catastrophic forgetting

03 Improved convergence speed in OCL models

Abstract

Online Continual Learning (OCL) for image classification is a challenging subset of Continual Learning that focuses on classifying images from a stream without assuming that the data are independent and identically distributed (i.i.d.). The primary challenge in this setting is to prevent catastrophic forgetting, where the model's performance on previous tasks deteriorates as it learns new ones. Although various strategies have been proposed to address this issue, achieving rapid convergence remains a significant challenge in the online setting. In this work, we introduce a novel approach to training OCL models that uses the Natural Gradient Descent optimizer, incorporating an approximation of the Fisher Information Matrix (FIM) through Kronecker Factored Approximate Curvature (KFAC). This method demonstrates substantial improvements in performance across all OCL methods, particularly when…
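To make the KFAC idea in the abstract concrete, here is a minimal NumPy sketch of a Kronecker-factored natural-gradient direction for a single linear layer. It is an illustration under stated assumptions, not the authors' implementation: the function name `kfac_natural_gradient`, the `damping` parameter, and the single-layer setup are all hypothetical, and the output-gradient factor is built from whatever per-sample gradients are passed in (with gradients taken at the true labels this corresponds to the empirical Fisher rather than the exact one).

```python
import numpy as np

def kfac_natural_gradient(x, g, damping=1e-3):
    """Sketch of a KFAC-preconditioned gradient for a linear layer y = W x.

    x: (batch, n_in)  layer inputs.
    g: (batch, n_out) per-sample gradients of the loss w.r.t. the layer outputs.
    damping: hypothetical Tikhonov term added to both factors for stability.
    Returns (grad, nat_grad), both of shape (n_out, n_in).
    """
    batch = x.shape[0]
    grad = g.T @ x / batch            # ordinary mean gradient w.r.t. W

    # Kronecker factors: the layer's Fisher block is approximated by A (x) G.
    A = x.T @ x / batch               # input second moment, (n_in, n_in)
    G = g.T @ g / batch               # output-gradient second moment, (n_out, n_out)
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])

    # Applying the inverse of the Kronecker product only needs two small solves:
    # nat_grad = G_d^{-1} @ grad @ A_d^{-1}
    left = np.linalg.solve(G_d, grad)            # G_d^{-1} @ grad
    nat_grad = np.linalg.solve(A_d, left.T).T    # (...) @ A_d^{-1}, using symmetry of A_d
    return grad, nat_grad
```

A single update would then be `W -= lr * nat_grad` in place of `W -= lr * grad`. The appeal of the factorization is cost: the two solves involve matrices of size `n_in` and `n_out`, rather than a full curvature matrix of size `n_in * n_out`.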

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Data Stream Mining Techniques · Machine Learning and ELM