Hierarchically Gated Recurrent Neural Network for Sequence Modeling
Zhen Qin, Songlin Yang, Yiran Zhong

TL;DR
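This paper introduces the Hierarchically Gated Recurrent Neural Network (HGRN), a gated linear RNN whose forget gates are lower bounded by a learnable value that increases with layer depth, so that upper layers model long-term dependencies and lower layers model local, short-term ones. The model is evaluated on language modeling, image classification, and long-range arena benchmarks.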
Contribution
The paper proposes forget gates with a learnable lower bound that increases monotonically from the lower to the upper layers of a linear RNN, so that different layers model dependencies at different temporal scales.
Findings
HGRN outperforms traditional RNNs and transformers in language modeling.
HGRN achieves competitive results in image classification.
HGRN demonstrates strong performance on long-range dependency benchmarks.
Abstract
Transformers have surpassed RNNs in popularity due to their superior abilities in parallel training and long-term dependency modeling. Recently, there has been a renewed interest in using linear RNNs for efficient sequence modeling. These linear RNNs often employ gating mechanisms in the output of the linear recurrence layer while ignoring the significance of using forget gates within the recurrence. In this paper, we propose a gated linear RNN model dubbed Hierarchically Gated Recurrent Neural Network (HGRN), which includes forget gates that are lower bounded by a learnable value. The lower bound increases monotonically when moving up layers. This allows the upper layers to model long-term dependencies and the lower layers to model more local, short-term dependencies. Experiments on language modeling, image classification, and long-range arena benchmarks showcase the efficiency and…
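The gating idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the real-valued hidden state, the exclusive cumulative softmax used to obtain monotonically increasing lower bounds, and the specific update `h_t = lam_t * h_{t-1} + (1 - lam_t) * c_t` are all assumptions chosen for illustration. The sketch only shows that each layer's forget gate stays above a per-layer bound and that the bound does not decrease when moving up the stack.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_lower_bounds(raw):
    """Map unconstrained parameters of shape (num_layers, dim) to lower bounds.

    Assumption: a softmax over the layer axis followed by an exclusive
    cumulative sum. Layer 0 gets a bound of 0, and each higher layer gets
    a bound at least as large as the layer below it.
    """
    e = np.exp(raw - raw.max(axis=0, keepdims=True))
    p = e / e.sum(axis=0, keepdims=True)
    zeros = np.zeros((1, raw.shape[1]))
    return np.concatenate([zeros, np.cumsum(p[:-1], axis=0)], axis=0)

def gated_linear_recurrence(x, W_f, W_c, gamma):
    """One layer. x has shape (T, dim); gamma has shape (dim,)."""
    mu = sigmoid(x @ W_f)              # unbounded-below gate in (0, 1)
    lam = gamma + (1.0 - gamma) * mu   # forget gate, never below gamma
    c = x @ W_c                        # candidate input (hypothetical form)
    h = np.zeros_like(x)
    prev = np.zeros(x.shape[1])
    for t in range(x.shape[0]):
        # Element-wise linear recurrence: no nonlinearity on the state.
        prev = lam[t] * prev + (1.0 - lam[t]) * c[t]
        h[t] = prev
    return h, lam

def run_stack(x, params, raw_bounds):
    """Run a stack of layers; params is a list of (W_f, W_c) pairs."""
    gammas = layer_lower_bounds(raw_bounds)
    gates = []
    for (W_f, W_c), gamma in zip(params, gammas):
        x, lam = gated_linear_recurrence(x, W_f, W_c, gamma)
        gates.append(lam)
    return x, gammas, gates

# Usage with random, untrained parameters (illustration only).
rng = np.random.default_rng(0)
T, dim, num_layers = 8, 4, 3
params = [(rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim)))
          for _ in range(num_layers)]
raw_bounds = rng.normal(size=(num_layers, dim))
out, gammas, gates = run_stack(rng.normal(size=(T, dim)), params, raw_bounds)
```

Under these assumptions, a higher lower bound forces the forget gate closer to 1, so upper layers retain their state over more time steps, while lower layers are free to forget quickly.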
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Bioinformatics
