Mixture-of-Variational-Experts for Continual Learning
Heinke Hihn, Daniel A. Braun

TL;DR
This paper introduces a hierarchical information-theoretic optimality principle for continual learning, leading to a Mixture-of-Variational-Experts layer that mitigates forgetting across various learning paradigms.
Contribution
It proposes a novel optimality principle and a neural network layer that reduces forgetting in continual learning across multiple problem types.
Findings
Competitive performance in continual supervised learning
Effective in continual reinforcement learning
General formulation applicable to diverse learning tasks
Abstract
One weakness of machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning (CL) paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle that facilitates a trade-off between learning and forgetting. We discuss this principle from a Bayesian perspective and show its connections to previous approaches to CL. Based on this principle, we propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths through the network which is governed by a gating policy. Due to the general…
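To make the idea of gated information-processing paths concrete, here is a minimal NumPy sketch of a layer in that spirit. It is an illustration under stated assumptions, not the authors' implementation: the class name `ToyMoVELayer`, the top-1 gating rule, the factorized Gaussian weight posteriors, and the choice of prior (a snapshot of the posterior, standing in for parameters kept from an earlier task) are all hypothetical choices made for this example. The KL term shows one common way a penalty can discourage experts from drifting away from previously learned parameters.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    # KL(N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2)), summed over all weights.
    return np.sum(
        np.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p**2)
        - 0.5
    )


class ToyMoVELayer:
    """Hypothetical gated mixture of experts with Gaussian weight posteriors."""

    def __init__(self, d_in, d_out, n_experts, rng):
        self.rng = rng
        # Gating policy: a linear map followed by a softmax (assumption).
        self.gate_w = rng.normal(0.0, 0.1, (d_in, n_experts))
        # Per-expert posterior mean and standard deviation over weights.
        self.mu = rng.normal(0.0, 0.1, (n_experts, d_in, d_out))
        self.sigma = np.full((n_experts, d_in, d_out), 0.05)
        # Prior: snapshot of the posterior, e.g. taken after an earlier task.
        self.prior_mu = self.mu.copy()
        self.prior_sigma = self.sigma.copy()

    def forward(self, x):
        probs = softmax(x @ self.gate_w)      # (batch, n_experts)
        choice = probs.argmax(axis=-1)        # top-1 expert per input
        eps = self.rng.standard_normal(self.mu.shape)
        w = self.mu + self.sigma * eps        # reparameterized weight sample
        # Route each input through the weights of its selected expert.
        y = np.einsum("bi,bio->bo", x, w[choice])
        return y, probs, choice

    def kl_penalty(self):
        # Penalizes expert posteriors that move away from the stored prior.
        return gaussian_kl(self.mu, self.sigma, self.prior_mu, self.prior_sigma)


rng = np.random.default_rng(0)
layer = ToyMoVELayer(d_in=4, d_out=3, n_experts=2, rng=rng)
x = rng.normal(size=(5, 4))
y, probs, choice = layer.forward(x)
kl_before = layer.kl_penalty()
layer.mu += 0.1                               # simulate drift on a new task
kl_after = layer.kl_penalty()
```

In a training loop, a term like `kl_penalty()` would typically be added to the task loss with a weighting coefficient, so that the coefficient controls the trade-off between fitting new data and staying close to earlier parameters.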
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
Methods: Convolution · Gated Linear Unit · 1x1 Convolution · Gated Convolution · Gated Convolution Network · Variational Inference · Deep Ensembles · Soft Actor Critic
