RNN Architecture Learning with Sparse Regularization
Jesse Dodge, Roy Schwartz, Hao Peng, Noah A. Smith

TL;DR
This paper introduces a sparse regularization method for rational RNNs that reduces model size significantly while maintaining performance, enhancing interpretability and efficiency in NLP tasks.
Contribution
The authors develop a group lasso-based structure learning approach for rational RNNs, enabling substantial parameter reduction without performance loss.
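As a rough illustration of the group lasso idea, the penalty is the sum of the L2 norms of each parameter group, scaled by a regularization strength. The sketch below is a minimal pure-Python version, not the authors' implementation: the function name `group_lasso_penalty`, the `strength` parameter, and the toy weights are all assumptions made for illustration, with each inner list standing in for the weights tied to one WFSA state.

```python
import math

def group_lasso_penalty(groups, strength):
    """Return strength times the sum of the L2 norms of the groups."""
    return strength * sum(
        math.sqrt(sum(w * w for w in group)) for group in groups
    )

# Hypothetical toy weights: one inner list per WFSA state (an assumption).
state_groups = [
    [3.0, 4.0],        # L2 norm 5
    [0.0, 0.0],        # L2 norm 0, contributes nothing to the penalty
    [1.0, 2.0, 2.0],   # L2 norm 3
]

penalty = group_lasso_penalty(state_groups, strength=0.1)
print(penalty)
```

Because the norm is taken per group rather than per weight, the penalty tends to push entire groups to zero together, which is what lets a whole state be dropped instead of scattered individual weights.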
Findings
Models with over 90% of their weights pruned perform comparably to dense models.
Sparse rational RNNs are more interpretable and easier to visualize.
The method is effective across multiple sentiment analysis datasets with different embeddings.
Abstract
Neural models for NLP typically use large numbers of parameters to reach state-of-the-art performance, which can lead to excessive memory usage and increased runtime. We present a structure learning method for learning sparse, parameter-efficient NLP models. Our method applies group lasso to rational RNNs (Peng et al., 2018), a family of models that is closely connected to weighted finite-state automata (WFSAs). We take advantage of rational RNNs' natural grouping of the weights, so the group lasso penalty directly removes WFSA states, substantially reducing the number of parameters in the model. Our experiments on a number of sentiment analysis datasets, using both GloVe and BERT embeddings, show that our approach learns neural structures which have fewer parameters without sacrificing performance relative to parameter-rich baselines. Our method also highlights the interpretable…
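To make "the group lasso penalty directly removes WFSA states" concrete, one standard way to obtain exactly-zero groups is group soft-thresholding, the proximal operator of the group lasso penalty. The sketch below is an assumption-laden illustration and may differ from the training and pruning procedure the paper actually uses. The names `group_soft_threshold` and `prune_states`, the threshold value, and the toy weights are hypothetical.

```python
import math

def group_soft_threshold(group, threshold):
    """Shrink a group's L2 norm by `threshold`; zero it if the norm is smaller."""
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= threshold:
        return [0.0] * len(group)
    scale = 1.0 - threshold / norm
    return [scale * w for w in group]

def prune_states(groups, threshold):
    """Apply group soft-thresholding, then drop groups that became all zeros."""
    shrunk = [group_soft_threshold(g, threshold) for g in groups]
    return [g for g in shrunk if any(w != 0.0 for w in g)]

# Hypothetical toy weights: one inner list per WFSA state (an assumption).
state_groups = [
    [3.0, 4.0],   # L2 norm 5.0, survives and is shrunk
    [0.1, 0.1],   # L2 norm about 0.14, removed
    [0.3, 0.4],   # L2 norm 0.5, removed
]

kept = prune_states(state_groups, threshold=1.0)
print(len(state_groups), "states before,", len(kept), "after")
```

In this toy setting, every removed group takes all of its weights with it, so the parameter count falls in proportion to the number of states dropped.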
Taxonomy
Topics: Machine Learning and Algorithms · Topic Modeling · Natural Language Processing Techniques
Methods: Pruning · Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece
