On Quadratic Penalties in Elastic Weight Consolidation
Ferenc Huszár

TL;DR
This paper critically examines the quadratic penalties used in Elastic Weight Consolidation (EWC), revealing potential inconsistencies and issues like double-counting in multi-task scenarios, and extends the theoretical derivation to more than two tasks.
Contribution
It provides an extended derivation of EWC for multiple tasks and highlights potential flaws in the quadratic penalty formulation.
Findings
Quadratic penalties in EWC may cause double-counting of data.
The derivation is extended to scenarios with more than two tasks.
Potential inconsistencies in EWC's theoretical foundation are identified.
Abstract
Elastic weight consolidation (EWC, Kirkpatrick et al., 2017) is a novel algorithm designed to safeguard against catastrophic forgetting in neural networks. EWC can be seen as an approximation to Laplace propagation (Eskin et al., 2004), and this view is consistent with the motivation given by Kirkpatrick et al. (2017). In this note, I present an extended derivation that covers the case when there are more than two tasks. I show that the quadratic penalties in EWC are inconsistent with this derivation and might lead to double-counting data from earlier tasks.
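The double-counting concern can be illustrated with a toy sketch. This is not taken from the note itself: it assumes three hypothetical one-dimensional tasks whose losses are exactly quadratic, so the Laplace approximation is exact and every minimiser has a closed form. The task values and the helper `argmin_quadratics` are illustrative assumptions. The sketch compares a single penalty centred at the latest optimum with accumulated precision (the Laplace-propagation view) against one penalty per earlier task, each centred at that task's own optimum.

```python
def argmin_quadratics(terms):
    """Minimise sum_k 0.5 * p_k * (theta - m_k)**2 over theta.

    terms is a list of (precision p_k, centre m_k) pairs; the minimiser
    is the precision-weighted mean of the centres.
    """
    total_precision = sum(p for p, _ in terms)
    return sum(p * m for p, m in terms) / total_precision


# Hypothetical toy tasks (assumption): (curvature, loss minimiser).
task_a = (1.0, 0.0)
task_b = (1.0, 1.0)
task_c = (1.0, 2.0)

# Reference: minimiser of the three task losses trained jointly.
joint = argmin_quadratics([task_a, task_b, task_c])

# Sequential training. After task A the optimum is A's own minimiser.
theta_a = argmin_quadratics([task_a])
# Task B is trained with a penalty around theta_a with precision a.
theta_ab = argmin_quadratics([(task_a[0], theta_a), task_b])

# Laplace-propagation view: one penalty at theta_ab, precision a + b.
single_penalty = argmin_quadratics([(task_a[0] + task_b[0], theta_ab), task_c])

# One penalty per earlier task: around theta_a (precision a) and around
# theta_ab (precision b). theta_ab was already pulled towards task A,
# so task A influences the result twice.
multi_penalty = argmin_quadratics(
    [(task_a[0], theta_a), (task_b[0], theta_ab), task_c]
)

print(f"joint optimum:        {joint:.4f}")
print(f"single penalty:       {single_penalty:.4f}")
print(f"one penalty per task: {multi_penalty:.4f}")
```

In this toy setting the single-penalty solution coincides with the joint optimum, while the per-task penalties shift the solution towards task A's minimiser. Real networks are not quadratic, so this only shows the direction of the effect under these assumptions.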
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Neural Networks and Applications · Model Reduction and Neural Networks
Methods: Elastic Weight Consolidation
