Mixture-of-Variational-Experts for Continual Learning

Heinke Hihn; Daniel A. Braun

arXiv:2110.12667·cs.LG·March 2, 2022

TL;DR

This paper introduces a hierarchical information-theoretic optimality principle for continual learning, leading to a Mixture-of-Variational-Experts layer that mitigates forgetting across various learning paradigms.

Contribution

The paper proposes a novel optimality principle and a neural network layer, derived from that principle, that reduces forgetting in continual learning across multiple problem types.

Findings

01 Competitive performance in continual supervised learning

02 Effective in continual reinforcement learning

03 General formulation applicable to diverse learning tasks

Abstract

One weakness of machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning (CL) paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle that facilitates a trade-off between learning and forgetting. We discuss this principle from a Bayesian perspective and show its connections to previous approaches to CL. Based on this principle, we propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths through the network which is governed by a gating policy. Due to the general…
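The gated information-processing paths described in the abstract can be illustrated with a minimal sketch of a gated mixture-of-experts forward pass. This is not the authors' implementation: the class name `GatedExpertLayer`, the linear experts, the softmax gate, and the top-k routing rule are all assumptions made for illustration, and the variational part of the method (distributions over weights and the information-theoretic regularizers) is left out entirely.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class GatedExpertLayer:
    """Hypothetical gated expert layer (illustrative only, deterministic weights)."""

    def __init__(self, in_dim, out_dim, num_experts, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Gating policy: one score per expert, computed from the input.
        self.gate_w = init(num_experts, in_dim)
        self.gate_b = [0.0] * num_experts
        # Each expert is a single linear map (an assumption of this sketch).
        self.experts = [(init(out_dim, in_dim), [0.0] * out_dim)
                        for _ in range(num_experts)]

    def gate(self, x):
        """Return a probability distribution over experts for input x."""
        return softmax(linear(self.gate_w, self.gate_b, x))

    def forward(self, x, top_k=1):
        """Route x through the top_k most probable experts and mix their outputs."""
        probs = self.gate(x)
        chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        norm = sum(probs[i] for i in chosen)
        out = [0.0] * len(self.experts[0][1])
        for i in chosen:
            w, b = self.experts[i]
            y = linear(w, b, x)
            # Weight each selected expert by its renormalized gate probability.
            out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
        return out, chosen, probs


layer = GatedExpertLayer(in_dim=4, out_dim=3, num_experts=5)
output, chosen, probs = layer.forward([0.5, -1.0, 0.25, 2.0], top_k=1)
```

In this sketch, different inputs can activate different experts, so updating one expert's parameters leaves the others untouched; that is the intuition behind using separate paths to limit forgetting, though how the paper trains and regularizes the gate and experts is not shown here.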

Code & Models

Repositories

hhihn/HVCL · PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: Convolution · Gated Linear Unit · 1x1 Convolution · Gated Convolution · Gated Convolution Network · Variational Inference · Deep Ensembles · Soft Actor Critic