Recurrent Additive Networks
Kenton Lee, Omer Levy, Luke Zettlemoyer

TL;DR
Recurrent Additive Networks (RANs) are a gated RNN variant whose latent state updates are purely additive. They perform on par with LSTMs on benchmark language modeling problems, which challenges the assumption that non-linear transition dynamics are necessary.
Contribution
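This paper introduces RANs, a simple gated RNN architecture with purely additive state updates, and shows experimentally that they match LSTMs on benchmark language modeling. The result suggests that many of the non-linear computations in LSTMs are not essential, at least for the problems considered.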
Findings
RAN states are weighted sums of input vectors.
RANs perform on par with LSTMs on benchmark language modeling problems.
The gates contribute only to computing the weights of these sums.
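
The first and third findings can be made concrete by unrolling a gated additive recurrence. The notation here is an assumption for illustration and does not appear in this summary: $\tilde{c}_t$ stands for a linear projection of the input $x_t$, $i_t$ and $f_t$ for input and forget gates with entries in $(0, 1)$, $\circ$ for the component-wise product, and the initial state $c_0$ is taken to be zero.

```latex
\begin{align*}
c_t &= i_t \circ \tilde{c}_t + f_t \circ c_{t-1} \\
    &= i_t \circ \tilde{c}_t
       + f_t \circ \bigl( i_{t-1} \circ \tilde{c}_{t-1} + f_{t-1} \circ c_{t-2} \bigr) \\
    &= \sum_{j=1}^{t} \Bigl( i_j \circ \prod_{k=j+1}^{t} f_k \Bigr) \circ \tilde{c}_j ,
    \qquad c_0 = 0 .
\end{align*}
```

Under these assumptions each state is a component-wise weighted sum of the projected inputs, and the gates enter only through the weights $i_j \circ \prod_{k=j+1}^{t} f_k$.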
Abstract
We introduce recurrent additive networks (RANs), a new gated RNN which is distinguished by the use of purely additive latent state updates. At every time step, the new state is computed as a gated component-wise sum of the input and the previous state, without any of the non-linearities commonly used in RNN transition dynamics. We formally show that RAN states are weighted sums of the input vectors, and that the gates only contribute to computing the weights of these sums. Despite this relatively simple functional form, experiments demonstrate that RANs perform on par with LSTMs on benchmark language modeling problems. This result shows that many of the non-linear computations in LSTMs and related networks are not essential, at least for the problems we consider, and suggests that the gates are doing more of the computational work than previously understood.
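
The additive update described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the gate parameterization (sigmoid gates reading the current input and the previous output), the omission of bias terms, the tanh on the output, the zero initial state, and all names such as `ran_forward` and `W_cx` are assumptions made for the example. The sketch runs the recurrence and then rebuilds each state directly as a gate-weighted sum of the projected inputs, so the two can be compared.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def ran_forward(xs, W_cx, W_ix, W_ih, W_fx, W_fh):
    """Run the simplified recurrence; return states, gates, and projected inputs."""
    d = len(W_cx)
    c = [0.0] * d  # latent state, assumed to start at zero
    h = [0.0] * d  # output fed back into the gates
    states, in_gates, forget_gates, contents = [], [], [], []
    for x in xs:
        content = matvec(W_cx, x)  # linear projection of the input, no tanh
        i = [sigmoid(a + b) for a, b in zip(matvec(W_ix, x), matvec(W_ih, h))]
        f = [sigmoid(a + b) for a, b in zip(matvec(W_fx, x), matvec(W_fh, h))]
        # Purely additive update: gated component-wise sum of input and previous state.
        c = [ig * u + fg * prev for ig, u, fg, prev in zip(i, content, f, c)]
        h = [math.tanh(v) for v in c]  # output non-linearity is an assumption here
        states.append(c)
        in_gates.append(i)
        forget_gates.append(f)
        contents.append(content)
    return states, in_gates, forget_gates, contents


def weighted_sum_state(t, in_gates, forget_gates, contents):
    """Rebuild state t as sum over j <= t of (i_j * prod_{k=j+1..t} f_k) * content_j."""
    total = [0.0] * len(contents[0])
    for j in range(t + 1):
        weight = list(in_gates[j])
        for k in range(j + 1, t + 1):
            weight = [w * fk for w, fk in zip(weight, forget_gates[k])]
        total = [s + w * u for s, w, u in zip(total, weight, contents[j])]
    return total


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def check_weighted_sum(seed=0, d_in=3, d_state=4, steps=5):
    """Return the largest gap between the recurrent and the unrolled states."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(d_in)] for _ in range(steps)]
    W_cx, W_ix, W_fx = (random_matrix(rng, d_state, d_in) for _ in range(3))
    W_ih, W_fh = (random_matrix(rng, d_state, d_state) for _ in range(2))
    states, i_g, f_g, contents = ran_forward(xs, W_cx, W_ix, W_ih, W_fx, W_fh)
    gaps = [
        abs(a - b)
        for t in range(steps)
        for a, b in zip(states[t], weighted_sum_state(t, i_g, f_g, contents))
    ]
    return max(gaps)
```

Calling `check_weighted_sum()` returns the largest absolute difference between the two computations, which should sit at floating-point rounding level. The gates are computed from non-linear functions, but they only scale the projected inputs; no non-linearity is applied to the state itself.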
Taxonomy
Topics: Topic Modeling · Machine Learning and Algorithms · Ferroelectric and Negative Capacitance Devices
