Bayesian Sparsification of Gated Recurrent Neural Networks

Ekaterina Lobacheva; Nadezhda Chirkova; Dmitry Vetrov

arXiv:1812.05692 · cs.LG · December 17, 2018 · 1 citation

TL;DR

This paper introduces a Bayesian sparsification method for gated recurrent neural networks, including LSTMs, which reduces complexity, speeds up computation, and enhances interpretability by sparsifying weights, neurons, and gate preactivations.

Contribution

It extends Bayesian sparsification to gate preactivations in LSTMs, leading to more efficient, interpretable, and task-dependent sparse recurrent architectures.

Findings

01. Sparsification speeds up forward passes.

02. Gate preactivation sparsification improves model compression.

03. The resulting sparsity structure is interpretable and task-specific.

Abstract

Bayesian methods have been successfully applied to sparsify weights of neural networks and to remove structure units from the networks, e.g., neurons. We apply and further develop this approach for gated recurrent architectures. Specifically, in addition to sparsification of individual weights and neurons, we propose to sparsify preactivations of gates and information flow in LSTM. This makes some gates and information flow components constant, speeds up the forward pass, and improves compression. Moreover, the resulting structure of gate sparsity is interpretable and depends on the task. Code is available on GitHub: https://github.com/tipt0p/SparseBayesianRNN
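
To make the claim about constant gates concrete, here is a minimal NumPy sketch of a single LSTM step in which each gate preactivation carries a scalar multiplier. It is an illustration under stated assumptions, not the authors' implementation: the function `lstm_step`, the multiplier dictionary `z`, and the gate names are hypothetical, and the multipliers are set by hand rather than learned with a Bayesian procedure. When a multiplier is zero, the gate reduces to a nonlinearity applied to its bias, so it no longer depends on the input or the hidden state and its matrix products can be skipped.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

GATES = ("input", "forget", "output", "cand")  # hypothetical gate names

def lstm_step(x, h, c, W, U, b, z):
    """One LSTM step with a scalar multiplier z[name] on each preactivation.

    Assumed parametrization (for illustration only):
        pre = z[name] * (W[name] @ x + U[name] @ h) + b[name]
    z[name] == 0.0 stands for a pruned preactivation.
    """
    act = {}
    for name in GATES:
        if z[name] == 0.0:
            # Pruned: the preactivation is just the bias, so the gate is
            # constant and no matrix-vector products are computed.
            pre = b[name]
        else:
            pre = z[name] * (W[name] @ x + U[name] @ h) + b[name]
        act[name] = np.tanh(pre) if name == "cand" else sigmoid(pre)
    c_new = act["forget"] * c + act["input"] * act["cand"]
    h_new = act["output"] * np.tanh(c_new)
    return h_new, c_new, act

# Toy parameters; in practice these would come from a trained model.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_in)) for k in GATES}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in GATES}
b = {k: rng.normal(size=n_hid) for k in GATES}
z = {"input": 1.0, "forget": 0.0, "output": 1.0, "cand": 1.0}  # forget gate pruned

h0, c0 = np.zeros(n_hid), np.zeros(n_hid)
_, _, act_a = lstm_step(rng.normal(size=n_in), h0, c0, W, U, b, z)
_, _, act_b = lstm_step(rng.normal(size=n_in), h0, c0, W, U, b, z)

# The pruned forget gate is identical for both inputs; the input gate is not.
print("forget gate constant:", np.allclose(act_a["forget"], act_b["forget"]))
print("input gate constant: ", np.allclose(act_a["input"], act_b["input"]))
```

In this sketch the saving comes from skipping `W @ x` and `U @ h` for the pruned gate; which gates end up constant in a trained model would depend on the task, as the abstract notes.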

Code & Models

Repositories

tipt0p/SparseBayesianRNN (Official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory