# Continual learning with hypernetworks

**Authors:** Johannes von Oswald, Christian Henning, Benjamin F. Grewe, and João Sacramento

arXiv: 1906.00695 · 2022-04-12

## TL;DR

This paper introduces task-conditioned hypernetworks for continual learning: networks that generate the weights of a target model from the task identity. The approach mitigates catastrophic forgetting and reports state-of-the-art results on standard continual learning benchmarks.

## Contribution

The paper proposes task-conditioned hypernetworks, which retain earlier tasks by rehearsing task-specific weight realizations with a simple regularizer, rather than by recalling the input-output relations of all previously seen data.

## Key findings

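- State-of-the-art performance on standard continual learning (CL) benchmarks.
- Very large capacity to retain previous memories on long task sequences, even in a compressive regime where the number of trainable hypernetwork weights is comparable to or smaller than the target network size.
- Evidence of transfer learning, with forward information transfer further supported on a CL benchmark based on the CIFAR-10/100 image datasets.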

## Abstract

Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. Continual learning (CL) is less difficult for this class of models thanks to a simple key feature: instead of recalling the input-output relations of all previously seen data, task-conditioned hypernetworks only require rehearsing task-specific weight realizations, which can be maintained in memory using a simple regularizer. Besides achieving state-of-the-art performance on standard CL benchmarks, additional experiments on long task sequences reveal that task-conditioned hypernetworks display a very large capacity to retain previous memories. Notably, such long memory lifetimes are achieved in a compressive regime, when the number of trainable hypernetwork weights is comparable to or smaller than the target network size. We provide insight into the structure of low-dimensional task embedding spaces (the input space of the hypernetwork) and show that task-conditioned hypernetworks demonstrate transfer learning. Finally, forward information transfer is further supported by empirical results on a challenging CL benchmark based on the CIFAR-10/100 image datasets.
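
The mechanism described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a purely linear hypernetwork, a flat target-weight vector, and a mean squared-distance penalty as the "simple regularizer"; the names `hypernet`, `output_regularizer`, `theta`, and `snapshots` are hypothetical and chosen only for illustration. The idea shown is that the weights generated for each earlier task embedding are stored before training on a new task, and any drift of the hypernetwork away from those stored outputs is penalized.

```python
import random


def hypernet(theta, embedding):
    """Toy linear hypernetwork: map a task embedding to a flat target-weight vector."""
    W, b = theta
    return [
        sum(w_ij * e_j for w_ij, e_j in zip(row, embedding)) + b_i
        for row, b_i in zip(W, b)
    ]


def output_regularizer(theta, embeddings, stored_outputs):
    """Mean squared distance between current and stored generated weights.

    Assumed form of the regularizer: it rehearses weight realizations of
    past tasks instead of replaying their input-output data.
    """
    if not embeddings:
        return 0.0
    total = 0.0
    for e, target in zip(embeddings, stored_outputs):
        current = hypernet(theta, e)
        total += sum((c - t) ** 2 for c, t in zip(current, target))
    return total / len(embeddings)


rng = random.Random(0)
emb_dim, n_target_weights = 2, 6  # illustrative sizes only

# Hypernetwork parameters (the only trainable weights in this sketch).
W = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)] for _ in range(n_target_weights)]
b = [0.0] * n_target_weights
theta = (W, b)

# One low-dimensional embedding per previously learned task.
task_embeddings = [[1.0, 0.0], [0.0, 1.0]]

# Snapshot the generated target weights before training on a new task.
snapshots = [hypernet(theta, e) for e in task_embeddings]

# With unchanged hypernetwork parameters, nothing has been forgotten.
penalty_before = output_regularizer(theta, task_embeddings, snapshots)

# Simulate drift caused by training on a new task without protection.
drifted_theta = ([[w + 0.5 for w in row] for row in W], b)
penalty_after = output_regularizer(drifted_theta, task_embeddings, snapshots)

print(penalty_before, penalty_after)
```

In a training loop, a penalty of this kind would be added to the new task's loss with a weighting coefficient, so that the hypernetwork can change to fit the new task only in ways that leave the weights generated for earlier embeddings nearly unchanged.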

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.00695/full.md

## Figures

43 figures with captions in the complete paper: https://tomesphere.com/paper/1906.00695/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/1906.00695/full.md

---
Source: https://tomesphere.com/paper/1906.00695