RNN Architecture Learning with Sparse Regularization

Jesse Dodge; Roy Schwartz; Hao Peng; Noah A. Smith

arXiv:1909.03011·cs.CL·September 9, 2019

TL;DR

This paper introduces a sparse regularization method for rational RNNs that substantially reduces model size while maintaining performance, yielding more interpretable and more efficient models for NLP tasks.

Contribution

The authors develop a group lasso-based structure learning approach for rational RNNs, enabling substantial parameter reduction without performance loss.

Findings

01

Models with more than 90% of their weights pruned perform comparably to dense models.

02

Sparse rational RNNs are more interpretable and easier to visualize.

03

The method is effective across multiple sentiment analysis datasets with different embeddings.

Abstract

Neural models for NLP typically use large numbers of parameters to reach state-of-the-art performance, which can lead to excessive memory usage and increased runtime. We present a structure learning method for learning sparse, parameter-efficient NLP models. Our method applies group lasso to rational RNNs (Peng et al., 2018), a family of models that is closely connected to weighted finite-state automata (WFSAs). We take advantage of rational RNNs' natural grouping of the weights, so the group lasso penalty directly removes WFSA states, substantially reducing the number of parameters in the model. Our experiments on a number of sentiment analysis datasets, using both GloVe and BERT embeddings, show that our approach learns neural structures which have fewer parameters without sacrificing performance relative to parameter-rich baselines. Our method also highlights the interpretable…
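
To make the group lasso idea in the abstract concrete, here is a minimal sketch in plain Python. It assumes each group is the set of weights tied to one WFSA state and uses an unweighted sum of per-group L2 norms; the paper's exact group weighting, optimizer, and pruning threshold are not given in the abstract, so the function names, the toy weights, `lam`, and `threshold` below are all hypothetical.

```python
import math


def l2_norm(group):
    """Euclidean norm of one group of weights."""
    return math.sqrt(sum(w * w for w in group))


def group_lasso_penalty(groups, lam):
    """Unweighted group lasso: lam times the sum of per-group L2 norms."""
    return lam * sum(l2_norm(g) for g in groups)


def prune_groups(groups, threshold):
    """Keep only the groups whose norm exceeds the threshold."""
    return [g for g in groups if l2_norm(g) > threshold]


# Hypothetical toy weights: each inner list stands for the
# parameters associated with a single WFSA state.
state_weights = [
    [3.0, 4.0],  # norm 5.0
    [0.0, 0.0],  # norm 0.0, a group the penalty has driven to zero
    [0.6, 0.8],  # norm 1.0
]

penalty = group_lasso_penalty(state_weights, lam=0.1)
kept = prune_groups(state_weights, threshold=1e-3)
print(len(kept), "of", len(state_weights), "states kept")
```

Because the penalty acts on whole-group norms rather than individual weights, it tends to zero out entire groups at once, which is what allows a whole state to be dropped instead of leaving scattered zero entries.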

Code & Models

Repositories

dodgejesse/sparsifying_regularizers_for_RRNNs
PyTorch · Official

Taxonomy

Topics: Machine Learning and Algorithms · Topic Modeling · Natural Language Processing Techniques

Methods: Pruning · Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece