Hard ASH: Sparsity and the right optimizer make a continual learner

Santtu Keskinen

arXiv:2404.17651 · cs.LG · April 30, 2024

TL;DR

This paper shows that an MLP combining a sparse activation function with an adaptive learning rate optimizer retains knowledge well enough in class-incremental learning to compete with established regularization techniques on Split-MNIST.

Contribution

The paper introduces Hard ASH (Hard Adaptive SwisH), a novel sparse activation function derived from Adaptive SwisH (ASH), and shows that pairing it with an adaptive learning rate optimizer improves learning retention in continual learning.
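This summary does not give the formula for ASH or Hard ASH, so the sketch below is not the paper's definition. It only illustrates the general idea of a sparse activation: a gate whose threshold adapts to the statistics of the layer's pre-activations, so that most units output exactly zero. The function names, the threshold rule (mean plus `z` standard deviations), the slope `alpha`, and the particular hard-sigmoid convention are all assumptions chosen for illustration.

```python
import statistics


def hard_sigmoid(x):
    """Piecewise-linear sigmoid approximation (one of several common conventions)."""
    return min(1.0, max(0.0, 0.2 * x + 0.5))


def sparse_gate(xs, z=1.0, alpha=4.0):
    """Illustrative adaptive-threshold sparse activation (hypothetical, not Hard ASH).

    Units well below mean + z * std are multiplied by a gate of exactly 0,
    units well above it pass through unchanged.
    """
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    threshold = mu + z * sigma
    return [x * hard_sigmoid(alpha * (x - threshold)) for x in xs]


pre_activations = [0.0, 0.0, 0.0, 0.0, 10.0]
activations = sparse_gate(pre_activations)
print(activations)
```

Because the hard gate saturates at exactly 0 and 1, inactive units contribute nothing to the forward pass, which is the sparsity property the contribution relies on.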

Findings

01

Hard ASH further improves learning retention in class-incremental learning.

02

An MLP with a sparse activation function can compete with established regularization techniques on Split-MNIST.

03

Adaptive optimizers complement sparse activations for continual learning.

Abstract

In class-incremental learning, neural networks typically suffer from catastrophic forgetting. We show that an MLP featuring a sparse activation function and an adaptive learning rate optimizer can compete with established regularization techniques on the Split-MNIST task. We highlight the effectiveness of the Adaptive SwisH (ASH) activation function in this context and introduce a novel variant, Hard Adaptive SwisH (Hard ASH), to further enhance learning retention.
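To make the class-incremental setting concrete, here is a minimal sketch of how a labeled dataset can be partitioned into sequential tasks. It assumes the commonly used Split-MNIST layout of five tasks with two digit classes each; the abstract does not state the split, so `classes_per_task=2` and the helper name `split_into_tasks` are assumptions. The toy label list stands in for real MNIST labels.

```python
from collections import defaultdict


def split_into_tasks(labels, classes_per_task=2, num_classes=10):
    """Group sample indices into sequential tasks by class label (hypothetical helper)."""
    buckets = defaultdict(list)
    for idx, label in enumerate(labels):
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside 0..{num_classes - 1}")
        buckets[label // classes_per_task].append(idx)
    num_tasks = -(-num_classes // classes_per_task)  # ceiling division
    return [buckets[t] for t in range(num_tasks)]


# Toy labels standing in for MNIST digit labels.
labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 3]
tasks = split_into_tasks(labels)

for task_id, indices in enumerate(tasks):
    classes = sorted({labels[i] for i in indices})
    print(f"task {task_id}: classes {classes}, {len(indices)} samples")
```

In the class-incremental setting the model is trained on these tasks one after another and then evaluated on all classes at once, without being told which task a test sample came from. That is what makes forgetting of earlier classes visible.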

Taxonomy

Topics: Brain Tumor Detection and Classification · EEG and Brain-Computer Interfaces · Ferroelectric and Negative Capacitance Devices

Methods: Sigmoid Activation