Empirical investigations on WVA structural issues
Alexey Kutalev, Alisa Lapina

TL;DR
This paper empirically investigates the WVA method's effectiveness and limitations in addressing catastrophic forgetting in neural networks, focusing on hyper-parameter tuning and application to gradients.
Contribution
It provides empirical insights into the WVA method's performance, limitations, and optimal configurations for mitigating catastrophic forgetting.
Findings
WVA's effectiveness varies with hyper-parameter choices
Application to gradients presents specific challenges
Optimal attenuation functions depend on task sequence
Abstract
This paper presents the results of an empirical study of several questions concerning methods for overcoming catastrophic forgetting in neural networks. The introduction describes in detail the problem of catastrophic forgetting and the methods for overcoming it, for readers who are not yet familiar with the topic. We then discuss the essence and limitations of the WVA method, which we presented in previous papers. Finally, we address three issues: applying the WVA method to gradients or to the optimization steps of the weights, choosing the optimal attenuation function for the method, and choosing the optimal hyper-parameters of the method depending on the number of tasks in sequential training of neural networks.
Taxonomy
Topics: Neural Networks and Applications · Fuzzy Logic and Control Systems
Methods: Elastic Weight Consolidation
