TL;DR
This paper introduces a lightweight recurrent network (LRN) that moves all parameter-related computation outside the recurrence, improving running efficiency with little or no loss in quality across six NLP tasks.
Contribution
The paper proposes the LRN architecture, which factors all parameter-related calculations outside the recurrence. The recurrence itself only manipulates the weight assigned to each token, which ties LRN closely to self-attention networks.
Findings
LRN yields the best running efficiency among the recurrent units it is compared against.
LRN does so with little or no loss in quality on six NLP tasks.
LRN works as a drop-in replacement for existing recurrent units in several neural sequential models.
Abstract
Recurrent networks have achieved great success on various sequential tasks with the assistance of complex recurrent units, but suffer from severe computational inefficiency due to weak parallelization. One direction to alleviate this issue is to shift heavy computations outside the recurrence. In this paper, we propose a lightweight recurrent network, or LRN. LRN uses input and forget gates to handle long-range dependencies as well as gradient vanishing and explosion, with all parameter-related calculations factored outside the recurrence. The recurrence in LRN only manipulates the weight assigned to each token, tightly connecting LRN with self-attention networks. We apply LRN as a drop-in replacement for existing recurrent units in several neural sequential models. Extensive experiments on six NLP tasks show that LRN yields the best running efficiency with little or no loss in model…
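The idea of keeping all parameters outside the recurrence can be illustrated with a minimal sketch. The abstract does not state the update equations, so the gate formulas used here (input gate sigmoid(k_t + h_{t-1}), forget gate sigmoid(q_t - h_{t-1}), and a tanh output) are an assumption for illustration, and the names `lrn_forward`, `project`, `w_q`, `w_k`, and `w_v` are hypothetical. What the sketch is meant to show is the split into two stages: one batched projection over the whole sequence, then a loop that contains only element-wise operations.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def project(xs, w):
    """Multiply every input vector in xs (T x d_in) by w (d_in x d_out)."""
    d_in, d_out = len(w), len(w[0])
    return [[sum(x[i] * w[i][j] for i in range(d_in)) for j in range(d_out)]
            for x in xs]


def lrn_forward(xs, w_q, w_k, w_v):
    """Sketch of an LRN-style layer (gate formulas are an assumption)."""
    # Stage 1: all parameter-related work. It does not depend on the hidden
    # state, so it can be computed for every time step at once.
    q = project(xs, w_q)
    k = project(xs, w_k)
    v = project(xs, w_v)

    # Stage 2: the recurrence itself. No weight matrices appear here, only
    # element-wise gates that reweight the current token against the past.
    h = [0.0] * len(w_v[0])
    states = []
    for q_t, k_t, v_t in zip(q, k, v):
        i_t = [sigmoid(k_j + h_j) for k_j, h_j in zip(k_t, h)]  # input gate
        f_t = [sigmoid(q_j - h_j) for q_j, h_j in zip(q_t, h)]  # forget gate
        h = [math.tanh(i * val + f * prev)
             for i, val, f, prev in zip(i_t, v_t, f_t, h)]
        states.append(h)
    return states


if __name__ == "__main__":
    # Toy usage: 5 tokens, input size 3, hidden size 4, random weights.
    rng = random.Random(0)
    rand_matrix = lambda rows, cols: [[rng.uniform(-1, 1) for _ in range(cols)]
                                      for _ in range(rows)]
    tokens = rand_matrix(5, 3)
    hidden = lrn_forward(tokens, rand_matrix(3, 4), rand_matrix(3, 4),
                         rand_matrix(3, 4))
    print(len(hidden), len(hidden[0]))
```

In a conventional LSTM or GRU, the hidden state is multiplied by a weight matrix at every step, so that matrix product sits on the sequential critical path. In this sketch the only matrix products are in stage 1, which is why the sequential part stays cheap.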
Taxonomy
Topics: Topic Modeling · Time Series Analysis and Forecasting · Natural Language Processing Techniques
