Hard ASH: Sparsity and the right optimizer make a continual learner
Santtu Keskinen

TL;DR
This paper shows that pairing a sparse activation function with an adaptive learning rate optimizer helps an MLP retain knowledge in class-incremental learning, performing competitively with established regularization techniques on Split-MNIST.
Contribution
The paper introduces Hard ASH, a novel sparse activation function, and shows that combining it with an adaptive optimizer improves continual learning performance.
Findings
Hard ASH enhances learning retention in class-incremental tasks.
Sparse activation functions can compete with established regularization techniques on Split-MNIST.
Adaptive optimizers complement sparse activations for continual learning.
Abstract
In class-incremental learning, neural networks typically suffer from catastrophic forgetting. We show that an MLP featuring a sparse activation function and an adaptive learning rate optimizer can compete with established regularization techniques in the Split-MNIST task. We highlight the effectiveness of the Adaptive SwisH (ASH) activation function in this context and introduce a novel variant, Hard Adaptive SwisH (Hard ASH), to further enhance learning retention.
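The abstract does not give formulas for ASH or Hard ASH, so the following is only a rough sketch of the general idea of a sparse, gated activation, not the paper's definition. It assumes a gate driven by a per-sample threshold (mean plus z standard deviations over the feature axis) and contrasts a smooth sigmoid gate with a piecewise-linear "hard" gate. The function names, the threshold form, the hard-sigmoid variant, and the parameters `z` and `alpha` are all illustrative assumptions.

```python
import numpy as np


def _threshold(x, z):
    # Assumed form: per-sample threshold = mean + z * std over the feature axis.
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return mu + z * sigma


def soft_gate_activation(x, z=1.0, alpha=2.0):
    # Smooth gate: a sigmoid never reaches exactly 0, so suppressed units
    # are shrunk toward zero but stay (slightly) active.
    gate = 1.0 / (1.0 + np.exp(-alpha * (x - _threshold(x, z))))
    return x * gate


def hard_gate_activation(x, z=1.0, alpha=2.0):
    # Piecewise-linear gate clip(t / 6 + 0.5, 0, 1) (one common hard-sigmoid
    # convention, assumed here): exactly 0 for t <= -3 and exactly 1 for t >= 3,
    # so units well below the threshold are switched off completely.
    t = alpha * (x - _threshold(x, z))
    gate = np.clip(t / 6.0 + 0.5, 0.0, 1.0)
    return x * gate


# Hypothetical pre-activations for one sample with six hidden units.
x = np.array([[-2.0, -1.0, 0.0, 1.0, 2.0, 10.0]])
soft = soft_gate_activation(x)
hard = hard_gate_activation(x)
print("non-zero units (soft gate):", np.count_nonzero(soft))
print("non-zero units (hard gate):", np.count_nonzero(hard))
```

In this toy input only the one unit far above the threshold survives the hard gate, while the smooth gate leaves small non-zero values on most units. Under these assumptions, the hard gate yields exact zeros, which is the kind of sparsity the abstract associates with better retention.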
Taxonomy
Topics: Brain Tumor Detection and Classification · EEG and Brain-Computer Interfaces · Ferroelectric and Negative Capacitance Devices
Methods: Sigmoid Activation
