Towards Binary-Valued Gates for Robust LSTM Training

Zhuohan Li; Di He; Fei Tian; Wei Chen; Tao Qin; Liwei Wang; Tie-Yan Liu

arXiv:1806.02988 · cs.LG · June 11, 2018 · 37 cites

TL;DR

This paper proposes a training method for LSTMs that pushes gate outputs toward 0 or 1, making gate behavior more interpretable and allowing the model to be compressed without loss of performance.

Contribution

It proposes a way to train LSTM gates toward binary values, which makes the gates easier to interpret and the model easier to compress while maintaining or improving performance.

Findings

01

Binary-valued gates improve interpretability.

02

Because gate outputs are insensitive to their inputs, the model can be compressed effectively with low-rank and low-precision approximations.

03

Performance remains comparable or better despite the restriction on gate values.
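The page does not describe the compression procedure itself, so the following is only a hypothetical illustration of why near-binary gates tolerate it. It assumes a single gate layer sigmoid(W x) thresholded at 0.5, a simple symmetric uniform quantizer for "low precision", and a rank-1 truncation via power iteration for "low rank". The weights, input, and the helper names `binary_gates`, `quantize`, and `rank1` are made up for this sketch and are not taken from the authors' code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(W, x):
    """Multiply a row-major matrix W by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def binary_gates(W, x):
    """Gate vector sigmoid(W x), thresholded at 0.5 to a 0/1 decision."""
    return [1 if sigmoid(z) > 0.5 else 0 for z in matvec(W, x)]


def quantize(W, bits):
    """Low-precision copy of W: symmetric uniform grid with 2**(bits-1) - 1
    levels per sign, scaled by the largest absolute weight."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(w) for row in W for w in row) / levels
    return [[round(w / scale) * scale for w in row] for row in W]


def rank1(W, iters=100):
    """Low-rank (rank-1) copy of W via power iteration on W^T W.

    v converges to the top right singular vector, and W ~= (W v) v^T.
    """
    n_rows, n_cols = len(W), len(W[0])
    v = [1.0] * n_cols
    for _ in range(iters):
        u = matvec(W, v)
        v = [sum(W[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(vj * vj for vj in v))
        v = [vj / norm for vj in v]
    u = matvec(W, v)  # equals sigma times the top left singular vector
    return [[ui * vj for vj in v] for ui in u]


# Hypothetical, nearly rank-1 gate weights and an input with large margins.
W = [[4.0, -2.0, 3.0, 1.0],
     [-3.0, 1.5, -2.5, -1.0],
     [2.0, -1.0, 1.5, 0.5]]
x = [1.0, 0.5, 1.0, -1.0]

print(binary_gates(W, x))               # full precision -> [1, 0, 1]
print(binary_gates(quantize(W, 3), x))  # 3-bit weights  -> [1, 0, 1]
print(binary_gates(rank1(W), x))        # rank-1 weights -> [1, 0, 1]
```

The toy pre-activations (5, -3.75, 2.5) sit far from 0, which is the regime binary-valued gates are meant to produce; in that regime, coarse perturbations of the weights leave every open/closed decision unchanged.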

Abstract

Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling. It aims to use gates to control information flow (e.g., whether to skip some information or not) in the recurrent computations, although its practical implementation based on soft gates only partially achieves this goal. In this paper, we propose a new way to train LSTMs, which pushes the output values of the gates towards 0 or 1. By doing so, we can better control the information flow: the gates are mostly open or closed, instead of in a middle state, which makes the results more interpretable. Empirical studies show that (1) although it seems that we restrict the model capacity, there is no performance drop: we achieve better or comparable performance due to better generalization ability; (2) the outputs of gates are not sensitive to their inputs: we can easily compress the…
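The abstract does not say how gate outputs are pushed toward 0 or 1. Purely as an assumption for illustration, and not as the authors' exact formulation, the sketch below uses a Gumbel-sigmoid (binary concrete) relaxation. Logistic noise, which is the difference of two Gumbel samples, is added to a gate's pre-activation, and the sum is divided by a temperature before the sigmoid. Lower temperatures produce samples closer to exactly 0 or 1. The function names `relaxed_binary_gate`, `mean_distance_from_binary`, and `open_fraction` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relaxed_binary_gate(preact, temperature, rng):
    """Sample a relaxed Bernoulli gate with P(open) = sigmoid(preact).

    Logistic noise is added to the pre-activation, then divided by the
    temperature. As temperature -> 0 the samples approach exactly 0 or 1.
    """
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return sigmoid((preact + logistic_noise) / temperature)


def mean_distance_from_binary(preact, temperature, n=10000, seed=0):
    """Average of min(g, 1 - g); 0 would mean every sample is exactly 0 or 1."""
    rng = random.Random(seed)
    samples = [relaxed_binary_gate(preact, temperature, rng) for _ in range(n)]
    return sum(min(g, 1.0 - g) for g in samples) / n


def open_fraction(preact, temperature, n=10000, seed=0):
    """Fraction of samples above 0.5 (the gate counted as 'open')."""
    rng = random.Random(seed)
    hits = sum(relaxed_binary_gate(preact, temperature, rng) > 0.5 for _ in range(n))
    return hits / n


for tau in (1.0, 0.1):
    print(tau, mean_distance_from_binary(1.0, tau), open_fraction(1.0, tau))
```

One property of this relaxation: a sample exceeds 0.5 exactly when preact plus the noise is positive, which happens with probability sigmoid(preact) at every temperature. Lowering the temperature therefore sharpens gate values toward 0 or 1 without changing how often the gate is open.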

Code & Models

Repositories

zhuohan123/g2-lstm
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques