HGRN2: Gated Linear RNNs with State Expansion
Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong

TL;DR
HGRN2 adds an outer-product-based state expansion to the gated linear RNN HGRN, enlarging its recurrent state without extra parameters. The expanded form admits a linear attention interpretation that enables hardware-efficient training, and it improves language modeling performance over HGRN.
Contribution
The paper proposes an outer-product-based state expansion mechanism for HGRN that enlarges the recurrent state, and with it the model's expressiveness, and gives the model a linear attention interpretation, all without adding parameters.
Findings
HGRN2 outperforms original HGRN across various benchmarks.
The state expansion gives HGRN2 a linear attention interpretation, which enables hardware-efficient training.
HGRN2 achieves competitive results with other recurrent models.
Abstract
Hierarchically gated linear RNN (HGRN) has demonstrated competitive training speed and performance in language modeling while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, limiting its expressiveness. To address this issue, we introduce a simple outer product-based state expansion mechanism, which significantly enlarges the recurrent state size without introducing any additional parameters. This enhancement also provides a linear attention interpretation for HGRN2, enabling hardware-efficient training. Our extensive experiments verify that HGRN2 consistently outperforms HGRN across different settings and is competitive with other recurrent models.
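The state expansion and its linear attention reading can be illustrated with a small sketch. This is not the authors' implementation: it assumes a gated recurrence in which the elementwise update is replaced by an outer product, H_t = diag(f_t) H_{t-1} + (1 - f_t) ⊗ i_t, with an assumed read-out y_t = H_t^T q_t. The function names, the read-out, and the random toy inputs are all hypothetical, and the exact parameterization in the paper may differ.

```python
import random

def outer(u, v):
    """Outer product of two vectors as a nested list (len(u) x len(v))."""
    return [[ui * vj for vj in v] for ui in u]

def expanded_state_rnn(forget, inputs, queries):
    """Recurrent form with a matrix state (assumed form, see lead-in).

    H_t = diag(f_t) H_{t-1} + outer(1 - f_t, i_t), read out as y_t = H_t^T q_t.
    The "key" 1 - f_t reuses the forget gate, so the state grows from d to
    d x d entries without any new projection.
    """
    d_k, d_v = len(forget[0]), len(inputs[0])
    H = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for f, x, q in zip(forget, inputs, queries):
        update = outer([1.0 - fi for fi in f], x)
        H = [[f[a] * H[a][b] + update[a][b] for b in range(d_v)]
             for a in range(d_k)]
        outputs.append([sum(q[a] * H[a][b] for a in range(d_k))
                        for b in range(d_v)])
    return outputs

def linear_attention_view(forget, inputs, queries):
    """Same outputs written as decayed, unnormalized causal linear attention."""
    d_k, d_v = len(forget[0]), len(inputs[0])
    outputs = []
    for t in range(len(inputs)):
        y = [0.0] * d_v
        for s in range(t + 1):
            # score(t, s) = sum_a q_t[a] * (prod_{r=s+1..t} f_r[a]) * (1 - f_s[a])
            score = 0.0
            for a in range(d_k):
                decay = 1.0
                for r in range(s + 1, t + 1):
                    decay *= forget[r][a]
                score += queries[t][a] * decay * (1.0 - forget[s][a])
            for b in range(d_v):
                y[b] += score * inputs[s][b]
        outputs.append(y)
    return outputs

# Toy data: sequence length 5, hidden size 3 (arbitrary illustrative values).
random.seed(0)
T, d = 5, 3
forget = [[random.uniform(0.1, 0.9) for _ in range(d)] for _ in range(T)]
inputs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
queries = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]

y_rec = expanded_state_rnn(forget, inputs, queries)
y_att = linear_attention_view(forget, inputs, queries)
```

Under these assumptions the two functions compute the same outputs up to floating-point error. The recurrent form is what makes inference cheap, while the attention-style form is the kind of expression that chunked, matrix-multiply-based training kernels can exploit.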
Taxonomy
Topics: Topic Modeling · Advanced Neural Network Applications · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Dense Connections · Label Smoothing · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Residual Connection · Multi-Head Attention · Adam · Dropout
