Continuous Learning in a Hierarchical Multiscale Neural Network
Thomas Wolf, Julien Chaumond, Clement Delangue

TL;DR
This paper introduces a hierarchical multi-scale language model that encodes dependencies at multiple time scales within a continuous learning framework, combining online meta-learning with elastic weight consolidation to prevent catastrophic forgetting.
Contribution
It presents a novel hierarchical neural network architecture that combines online meta-learning with elastic weight consolidation for continuous multi-scale sequence encoding.
Findings
Effective encoding of multi-scale dependencies demonstrated.
Prevents catastrophic forgetting in continuous learning.
Hierarchical model outperforms baseline methods.
Abstract
We reformulate the problem of encoding a multi-scale representation of a sequence in a language model by casting it in a continuous learning framework. We propose a hierarchical multi-scale language model in which short time-scale dependencies are encoded in the hidden state of a lower-level recurrent neural network, while longer time-scale dependencies are encoded in the dynamics of the lower-level network by having a meta-learner update the weights of the lower-level neural network in an online meta-learning fashion. We use elastic weight consolidation as a higher-level mechanism to prevent catastrophic forgetting in our continuous learning framework.
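The consolidation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard quadratic form of elastic weight consolidation, in which each weight is pulled back toward an anchor value in proportion to an importance estimate, and it uses a plain fixed-rate gradient step as a stand-in for the learned meta-learner. The names `ewc_penalty`, `online_update`, `anchor`, `importance`, and `strength` are hypothetical and chosen only for this illustration.

```python
from typing import List


def ewc_penalty(weights: List[float], anchor: List[float],
                importance: List[float], strength: float) -> float:
    """Quadratic consolidation penalty: (strength / 2) * sum_i F_i * (w_i - a_i)^2."""
    return 0.5 * strength * sum(
        f * (w - a) ** 2 for w, a, f in zip(weights, anchor, importance)
    )


def online_update(weights: List[float], task_grad: List[float],
                  anchor: List[float], importance: List[float],
                  lr: float = 0.1, strength: float = 1.0) -> List[float]:
    """One online step on (task loss + consolidation penalty).

    Assumption: a fixed-rate gradient step replaces the learned
    meta-learner described in the abstract.
    """
    return [
        w - lr * (g + strength * f * (w - a))
        for w, g, a, f in zip(weights, task_grad, anchor, importance)
    ]


# Toy run: two weights receive the same constant task gradient, but only
# the first one is marked as important, so only it is held near its anchor.
anchor = [0.0, 0.0]
importance = [10.0, 0.0]
weights = list(anchor)
for _ in range(20):
    weights = online_update(weights, [1.0, 1.0], anchor, importance)

print("consolidated weight:", weights[0])
print("free weight:", weights[1])
```

In this toy run the weight with high importance settles close to its anchor, while the unconstrained weight keeps drifting with the task gradient. That is the behaviour consolidation is meant to provide when the lower-level weights are updated continuously.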
