Continual learning with the neural tangent ensemble
Ari S. Benjamin, Christian Pehle, Kyle Daruwalla

TL;DR
This paper presents a novel interpretation of neural networks as Bayesian ensembles of fixed and adaptive classifiers, offering insights into continual learning and mitigating forgetting.
Contribution
It introduces the neural tangent ensemble framework, connecting neural network training to Bayesian ensemble methods and providing a new perspective on continual learning.
Findings
Neural networks can be viewed as ensembles of neural tangent experts.
Posterior updates of experts are equivalent to scaled SGD steps.
Networks outside the lazy regime act as adaptive ensembles improving over time.
Abstract
A natural strategy for continual learning is to weigh a Bayesian ensemble of fixed functions. This suggests that if a (single) neural network could be interpreted as an ensemble, one could design effective algorithms that learn without forgetting. To realize this possibility, we observe that a neural network classifier with N parameters can be interpreted as a weighted ensemble of N classifiers, and that in the lazy regime limit these classifiers are fixed throughout learning. We call these classifiers the neural tangent experts and show they output valid probability distributions over the labels. We then derive the likelihood and posterior probability of each expert given past data. Surprisingly, the posterior updates for these experts are equivalent to a scaled and projected form of stochastic gradient descent (SGD) over the network weights. Away from the lazy regime, networks can be…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsNeural Networks and Applications
