Structured Sparsification of Gated Recurrent Neural Networks

Ekaterina Lobacheva; Nadezhda Chirkova; Alexander Markovich; Dmitry Vetrov

arXiv:1911.05585·cs.LG·November 14, 2019

TL;DR

This paper adapts existing sparsification methods to gated recurrent neural networks such as LSTMs. In addition to weights and neurons, it sparsifies gate preactivations, which makes some gates constant, yields task-dependent gate structures, and improves neuron-wise compression on most tasks.

Contribution

The paper extends existing weight and neuron sparsification techniques to gated RNNs by also sparsifying gate preactivations, producing simpler LSTMs whose gate structure depends on the task.

Findings

01

Gate sparsity varies with the task.

02

The method improves neuron-wise compression on most of the tasks.

03

Sparsifying gate preactivations makes some gates constant, which simplifies the LSTM structure without significant performance loss.

Abstract

Recently, a lot of techniques were developed to sparsify the weights of neural networks and to remove networks' structure units, e.g. neurons. We adjust the existing sparsification approaches to the gated recurrent architectures. Specifically, in addition to the sparsification of weights and neurons, we propose sparsifying the preactivations of gates. This makes some gates constant and simplifies LSTM structure. We test our approach on the text classification and language modeling tasks. We observe that the resulting structure of gate sparsity depends on the task and connect the learned structure to the specifics of the particular tasks. Our method also improves neuron-wise compression of the model in most of the tasks.
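
To make the statement "this makes some gates constant" concrete, the following is a minimal sketch of one LSTM step with a multiplicative mask on each gate preactivation. It is an illustration under stated assumptions, not the authors' implementation: the names `lstm_step`, `params`, and `z` are hypothetical, the masks are fixed by hand rather than learned, and the bias is assumed to sit outside the mask so that a zeroed preactivation leaves a gate equal to the activation of its bias.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_step(x, h, c, params, z):
    """One LSTM step with per-unit masks on gate preactivations.

    params[k] = (W, U, b) for k in "i", "f", "o", "g" (input, forget,
    output gates and candidate). z[k] is a hypothetical per-unit mask:
    z[k][j] == 0 removes the data-dependent part of unit j's
    preactivation, so only the bias remains (assumption: bias is unmasked).
    """
    pre = {}
    for k in "ifog":
        W, U, b = params[k]
        wx, uh = matvec(W, x), matvec(U, h)
        pre[k] = [z_j * (a + r) + b_j
                  for z_j, a, r, b_j in zip(z[k], wx, uh, b)]
    i = [sigmoid(a) for a in pre["i"]]
    f = [sigmoid(a) for a in pre["f"]]
    o = [sigmoid(a) for a in pre["o"]]
    g = [math.tanh(a) for a in pre["g"]]
    c_new = [f_j * c_j + i_j * g_j for f_j, c_j, i_j, g_j in zip(f, c, i, g)]
    h_new = [o_j * math.tanh(c_j) for o_j, c_j in zip(o, c_new)]
    return h_new, c_new, {"i": i, "f": f, "o": o, "g": g}

# Toy weights shared by all gates; only the biases differ (illustrative values).
W = [[0.5, -0.3], [0.1, 0.8]]
U = [[0.2, 0.0], [0.0, 0.2]]
params = {k: (W, U, [b, b]) for k, b in zip("ifog", [0.0, 1.0, 0.0, 0.0])}

# Zero the whole forget-gate preactivation; keep the other gates intact.
z = {"i": [1, 1], "f": [0, 0], "o": [1, 1], "g": [1, 1]}

_, _, gates_a = lstm_step([1.0, 2.0], [0.0, 0.0], [0.0, 0.0], params, z)
_, _, gates_b = lstm_step([-3.0, 0.5], [0.1, -0.2], [0.3, 0.3], params, z)
```

With the forget-gate mask set to zero, `gates_a["f"]` and `gates_b["f"]` are identical and equal to the sigmoid of the forget bias, whatever the input and hidden state are, while the input gate still varies with the data. In this sketch a constant gate needs no weight matrices at inference time, which is the sense in which the structure is simplified.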

Taxonomy

Methods: Test · Sigmoid Activation · Tanh Activation · Long Short-Term Memory