Continuous Learning in a Hierarchical Multiscale Neural Network

Thomas Wolf; Julien Chaumond; Clement Delangue

arXiv:1805.05758·cs.CL·May 16, 2018

TL;DR

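This paper introduces a hierarchical multi-scale language model that treats multi-scale sequence encoding as a continuous learning problem: a lower-level recurrent network holds short time-scale dependencies in its hidden state, a meta-learner captures longer time-scale dependencies by updating that network's weights online, and elastic weights consolidation guards against catastrophic forgetting.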

Contribution

It presents a novel hierarchical neural network architecture that integrates meta-learning and elastic weights consolidation for continuous multi-scale sequence encoding.

Findings

01

Effective encoding of multi-scale dependencies demonstrated.

02

Prevents catastrophic forgetting in continuous learning.

03

Hierarchical model outperforms baseline methods.

Abstract

We reformulate the problem of encoding a multi-scale representation of a sequence in a language model by casting it in a continuous learning framework. We propose a hierarchical multi-scale language model in which short time-scale dependencies are encoded in the hidden state of a lower-level recurrent neural network while longer time-scale dependencies are encoded in the dynamic of the lower-level network by having a meta-learner update the weights of the lower-level neural network in an online meta-learning fashion. We use elastic weights consolidation as a higher-level to prevent catastrophic forgetting in our continuous learning framework.
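
To make the role of elastic weights consolidation more concrete, here is a minimal sketch of the standard quadratic penalty (as usually formulated for elastic weight consolidation), which pulls the current weights back toward a set of anchor weights in proportion to a per-parameter importance estimate. This is an illustration under stated assumptions, not the paper's exact formulation: the function names, the diagonal `fisher` importance values, the strength `lam`, and the learning rate `lr` are all hypothetical, and the plain gradient step stands in for whatever update the meta-learner actually produces.

```python
from typing import List, Sequence


def ewc_penalty(theta: Sequence[float], anchor: Sequence[float],
                fisher: Sequence[float], lam: float) -> float:
    """Quadratic penalty (lam / 2) * sum_i F_i * (theta_i - anchor_i)^2."""
    return 0.5 * lam * sum(
        f * (t - a) ** 2 for t, a, f in zip(theta, anchor, fisher)
    )


def ewc_gradient(theta: Sequence[float], anchor: Sequence[float],
                 fisher: Sequence[float], lam: float) -> List[float]:
    """Gradient of the penalty with respect to theta."""
    return [lam * f * (t - a) for t, a, f in zip(theta, anchor, fisher)]


def online_step(theta: Sequence[float], task_grad: Sequence[float],
                anchor: Sequence[float], fisher: Sequence[float],
                lam: float, lr: float) -> List[float]:
    """One online update (assumed here to be plain gradient descent):
    follow the task gradient while being pulled back toward the anchor."""
    pull = ewc_gradient(theta, anchor, fisher, lam)
    return [t - lr * (g + p) for t, g, p in zip(theta, task_grad, pull)]


if __name__ == "__main__":
    theta = [1.0, 2.0]    # fast-changing, lower-level weights
    anchor = [0.0, 2.0]   # consolidated weights to stay close to
    fisher = [2.0, 5.0]   # hypothetical per-parameter importance
    print(ewc_penalty(theta, anchor, fisher, lam=1.0))
    print(online_step(theta, [0.0, 0.0], anchor, fisher, lam=1.0, lr=0.1))
```

With a zero task gradient, the update moves only the first weight, since the second already matches its anchor. Larger `fisher` values make a parameter harder to move away from its anchor, which is how the penalty limits forgetting.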
