Decoupling Hierarchical Recurrent Neural Networks With Locally Computable Losses

Asier Mujika; Felix Weissenberger; Angelika Steger

arXiv:1910.05245·cs.LG·October 14, 2019

Open Access

TL;DR

This paper decouples the levels of a hierarchical RNN during training by replacing the gradients passed from higher to lower levels with locally computable losses. This cuts training memory by a factor exponential in the hierarchy depth without harming learning performance over a wide range of tasks.

Contribution

The paper shows empirically that, in deep HRNNs, propagating gradients back from higher to lower levels can be replaced with locally computable losses, which makes training on very long sequences feasible.

Findings

1. Local losses match the performance of TBPTT over a wide range of tasks.

2. Training memory is reduced by a factor exponential in the depth of the hierarchy, compared with standard TBPTT.

3. Decoupling the levels does not harm the network's ability to learn long-term dependencies.

Abstract

Learning long-term dependencies is a key long-standing challenge of recurrent neural networks (RNNs). Hierarchical recurrent neural networks (HRNNs) have been considered a promising approach as long-term dependencies are resolved through shortcuts up and down the hierarchy. Yet, the memory requirements of Truncated Backpropagation Through Time (TBPTT) still prevent training them on very long sequences. In this paper, we empirically show that in (deep) HRNNs, propagating gradients back from higher to lower levels can be replaced by locally computable losses, without harming the learning capability of the network, over a wide range of tasks. This decoupling by local losses reduces the memory requirements of training by a factor exponential in the depth of the hierarchy in comparison to standard TBPTT.
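The exponential memory saving claimed in the abstract can be illustrated with a rough count of stored hidden states. The sketch below is not the paper's own accounting; it assumes a toy hierarchy in which level l updates once every k^l base time steps, that standard TBPTT must keep every state inside the window spanned by a fixed number of top-level updates, and that with local losses each level keeps only a fixed number of its own states. The function names and parameters are hypothetical.

```python
def states_stored_tbptt(k: int, depth: int, top_ticks: int) -> int:
    """Hidden states kept when gradients flow through the whole hierarchy.

    Assumption: level `l` (0 = lowest) updates once every k**l base steps.
    """
    # Backpropagating through `top_ticks` updates of the top level spans
    # top_ticks * k**(depth - 1) base time steps, and every state that any
    # level produced inside that window has to be kept.
    window = top_ticks * k ** (depth - 1)
    return sum(window // k ** level for level in range(depth))


def states_stored_local(depth: int, ticks_per_level: int) -> int:
    """Hidden states kept when each level backpropagates only through itself.

    Assumption: every level truncates after `ticks_per_level` of its own updates.
    """
    return depth * ticks_per_level


if __name__ == "__main__":
    # Compare both counts for a binary hierarchy (k = 2) of growing depth.
    for depth in (2, 4, 8):
        full = states_stored_tbptt(k=2, depth=depth, top_ticks=10)
        local = states_stored_local(depth=depth, ticks_per_level=10)
        print(f"depth={depth}: tbptt={full}, local={local}, ratio={full / local:.1f}")
```

Under these assumptions the ratio of the two counts is (k^D - 1) / ((k - 1) D) for depth D, which grows exponentially in D, in line with the abstract's claim. Real memory use also depends on state sizes and on the auxiliary networks that compute the local losses, which this count ignores.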

Taxonomy

Topics: Neural Networks and Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning