Decoupling Hierarchical Recurrent Neural Networks With Locally Computable Losses
Asier Mujika, Felix Weissenberger, Angelika Steger

TL;DR
This paper introduces a method to decouple hierarchical RNN training by using locally computable losses, significantly reducing memory needs while maintaining learning performance across various tasks.
Contribution
The paper shows empirically that, in deep HRNNs, propagating gradients back from higher to lower levels can be replaced with locally computable losses, which makes training on very long sequences feasible.
Findings
Local losses match the performance of TBPTT in various tasks.
Compared with standard TBPTT, the memory required for training is reduced by a factor exponential in the depth of the hierarchy.
Decoupling does not harm the network's ability to learn long-term dependencies.
Abstract
Learning long-term dependencies is a key long-standing challenge of recurrent neural networks (RNNs). Hierarchical recurrent neural networks (HRNNs) have been considered a promising approach as long-term dependencies are resolved through shortcuts up and down the hierarchy. Yet, the memory requirements of Truncated Backpropagation Through Time (TBPTT) still prevent training them on very long sequences. In this paper, we empirically show that in (deep) HRNNs, propagating gradients back from higher to lower levels can be replaced by locally computable losses, without harming the learning capability of the network, over a wide range of tasks. This decoupling by local losses reduces the memory requirements of training by a factor exponential in the depth of the hierarchy in comparison to standard TBPTT.
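The memory claim can be made concrete with a rough counting sketch. The model below is an assumption for illustration, not the paper's own analysis: level `l` of the hierarchy (0 is the lowest) is assumed to update once every `k**l` input steps, and training is assumed to need `window` stored updates of context at a level. With coupled TBPTT, the window is set by the top level, so every lower level must store all of its states inside that span. With decoupled local losses, each level is assumed to store only `window` of its own states. The function names and parameters are hypothetical.

```python
def coupled_states(levels: int, k: int, window: int) -> int:
    """Hidden states stored when gradients flow through the whole hierarchy.

    Assumption: the top level needs `window` of its own updates, which spans
    window * k**(levels - 1) input steps, and level l updates every k**l steps.
    """
    return sum(window * k ** (levels - 1 - level) for level in range(levels))


def decoupled_states(levels: int, window: int) -> int:
    """Hidden states stored when each level trains on its own local loss.

    Assumption: every level keeps only `window` of its own updates.
    """
    return levels * window


if __name__ == "__main__":
    k, window = 4, 10  # illustrative values only
    for depth in range(1, 6):
        coupled = coupled_states(depth, k, window)
        decoupled = decoupled_states(depth, window)
        print(f"depth={depth} coupled={coupled} decoupled={decoupled} "
              f"ratio={coupled / decoupled:.1f}")
```

Under these assumptions the coupled count grows like `k**levels` while the decoupled count grows linearly in `levels`, so the ratio between them grows exponentially with depth, in line with the abstract's statement.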
Taxonomy
Topics: Neural Networks and Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
