Regularization and nonlinearities for neural language models: when are they needed?
Marius Pachitariu, Maneesh Sahani

TL;DR
This paper investigates when regularization and nonlinearities are necessary in neural language models. It shows that a simpler linear model can match or beat RNNs when training data is scarce and the model is well regularized, while the nonlinearities of RNNs pay off on large corpora. It also emphasizes the importance of the models' internal representations.
Contribution
It introduces the impulse-response LM (IRLM), a simplified linear RNN whose recurrent matrix has no off-diagonal entries, and uses it to show how regularization and nonlinearities each contribute to the performance of neural LMs.
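Because the recurrent matrix has no off-diagonal entries and there is no nonlinearity, each hidden unit of an IRLM-style model evolves on its own as h_t = d * h_{t-1} + x_t. Unrolled from a zero state, this is a sum of past inputs weighted by the exponential impulse response d^k. The minimal plain-Python sketch below checks that equivalence. It assumes the inputs are already-projected word embeddings and omits the output softmax and training; the function names and decay values are hypothetical illustrations, not taken from the paper.

```python
import math
import random

def diagonal_linear_rnn(inputs, decay):
    """h_t = decay * h_{t-1} + x_t, elementwise: a linear RNN with a diagonal recurrent matrix."""
    hidden = [0.0] * len(decay)  # start from a zero state
    states = []
    for x in inputs:
        # Each unit sees only its own previous value: no off-diagonal coupling, no nonlinearity.
        hidden = [d * h + xi for d, h, xi in zip(decay, hidden, x)]
        states.append(hidden)
    return states

def impulse_response(inputs, decay, t):
    """Closed form of the same state: h_t = sum_k decay**k * x_{t-k} (an exponential filter per unit)."""
    return [
        sum(decay[j] ** k * inputs[t - k][j] for k in range(t + 1))
        for j in range(len(decay))
    ]

# Illustrative (hypothetical) values: a short-, a medium- and a long-memory unit.
rng = random.Random(0)
decay = [0.5, 0.9, 0.99]
inputs = [[rng.gauss(0.0, 1.0) for _ in decay] for _ in range(20)]  # stand-ins for word embeddings
states = diagonal_linear_rnn(inputs, decay)
closed_form = impulse_response(inputs, decay, t=len(inputs) - 1)
```

Units whose decay is close to 1 retain information about words far in the past, so even a diagonal linear model can carry long context; the specific decay values here are only for illustration.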
Findings
With regularization, IRLM achieves state-of-the-art performance on small datasets such as Penn Treebank.
On large datasets, RNNs outperform IRLM, owing to the higher expressivity that their nonlinearities provide.
Long-context units significantly improve performance on sentence completion tasks.
Abstract
Neural language models (LMs) based on recurrent neural networks (RNN) are some of the most successful word and character-level LMs. Why do they work so well, in particular better than linear neural LMs? Possible explanations are that RNNs have an implicitly better regularization or that RNNs have a higher capacity for storing patterns due to their nonlinearities or both. Here we argue for the first explanation in the limit of little training data and the second explanation for large amounts of text data. We show state-of-the-art performance on the popular and small Penn dataset when RNN LMs are regularized with random dropout. Nonetheless, we show even better performance from a simplified, much less expressive linear RNN model without off-diagonal entries in the recurrent matrix. We call this model an impulse-response LM (IRLM). Using random dropout, column normalization and annealed…
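The abstract names random dropout and column normalization as regularizers. A minimal plain-Python sketch of both follows, under stated assumptions: dropout is taken to be the standard inverted variant that rescales surviving units by 1/(1 - p), and "column normalization" is taken to mean rescaling each column of a weight matrix to a fixed L2 norm. The excerpt does not say which matrices the paper normalizes or how its annealing works, so the function names, the target norm, and the example matrix are hypothetical.

```python
import math
import random

def dropout(vector, p, rng, train=True):
    """Inverted dropout (assumed variant): zero each entry with probability p, scale survivors by 1/(1 - p).

    p must lie in [0, 1); at test time (train=False) the input is returned unchanged.
    """
    if not train or p == 0.0:
        return list(vector)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in vector]

def normalize_columns(matrix, target_norm=1.0):
    """Rescale each column of a row-major matrix to L2 norm target_norm (assumed meaning of 'column normalization')."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_rows))) for j in range(n_cols)]
    return [
        [matrix[i][j] * target_norm / norms[j] if norms[j] > 0 else 0.0 for j in range(n_cols)]
        for i in range(n_rows)
    ]

rng = random.Random(0)
W = [[3.0, 0.0], [4.0, 2.0]]  # hypothetical weight matrix
W_normalized = normalize_columns(W)  # columns become [0.6, 0.8] and [0.0, 1.0]
hidden = [1.0, 2.0, 3.0, 4.0]
hidden_dropped = dropout(hidden, p=0.5, rng=rng)  # each entry is either 0.0 or doubled
```

Inverted dropout keeps the expected activation unchanged during training, so nothing needs rescaling at test time; normalizing columns bounds the norm of each weight vector rather than penalizing it, which is one simple way to limit capacity.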
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning
