A Unifying Framework of Bilinear LSTMs
Mohit Rajpal, Bryan Kian Hsiang Low

TL;DR
This paper introduces a unifying bilinear LSTM framework that captures nonlinear feature interactions in sequence data, improving performance without increasing model parameters.
Contribution
It proposes a flexible framework balancing expressivity and parameter efficiency, unifying linear and bilinear LSTMs for sequence learning.
Findings
Outperforms linear LSTMs in language tasks
Maintains parameter count while increasing expressivity
Demonstrates general applicability across several language-based sequence learning tasks
Abstract
This paper presents a novel unifying framework of bilinear LSTMs that can represent and utilize the nonlinear interaction of the input features present in sequence datasets for achieving superior performance over a linear LSTM and yet not incur more parameters to be learned. To realize this, our unifying framework allows the expressivity of the linear vs. bilinear terms to be balanced by correspondingly trading off between the hidden state vector size vs. approximation quality of the weight matrix in the bilinear term so as to optimize the performance of our bilinear LSTM, while not incurring more parameters to be learned. We empirically evaluate the performance of our bilinear LSTM in several language-based sequence learning tasks to demonstrate its general applicability.
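The trade-off described in the abstract can be illustrated with a small parameter-counting sketch. The abstract does not give the paper's exact formulation, so everything below is an assumption made for illustration: the bilinear term is taken to be z^T W_k z for each hidden unit k, with z the concatenation of the input and the previous hidden state, and each W_k is approximated by a rank-r sum of outer products. The function names and the choice of factorization are hypothetical and are not taken from the paper.

```python
import numpy as np


def linear_lstm_params(d, h):
    """Parameter count of a standard LSTM with input size d and hidden size h.

    Four gates, each with a weight matrix over [x; h_prev] plus a bias vector.
    """
    return 4 * (h * (d + h) + h)


def bilinear_lstm_params(d, h, r):
    """Parameter count under the assumed rank-r bilinear term (illustrative only).

    Each of the 4 gates gets, per hidden unit, two factor sets U and V of
    shape (r, d + h) on top of the usual linear weights and bias.
    """
    n = d + h
    return linear_lstm_params(d, h) + 4 * h * 2 * r * n


def low_rank_bilinear(z, U, V):
    """Compute sum_r (U[k, r] . z) * (V[k, r] . z) for every hidden unit k.

    This equals z^T W_k z with W_k = sum_r outer(U[k, r], V[k, r]),
    without ever forming the full (n x n) matrices W_k.
    U and V have shape (h, r, n); z has shape (n,).
    """
    left = np.einsum("krn,n->kr", U, z)
    right = np.einsum("krn,n->kr", V, z)
    return (left * right).sum(axis=1)


def largest_hidden_size(d, budget, r):
    """Largest hidden size whose assumed bilinear LSTM fits the parameter budget."""
    h = 0
    while bilinear_lstm_params(d, h + 1, r) <= budget:
        h += 1
    return h


# Budget set by a linear LSTM with input size 32 and hidden size 64.
budget = linear_lstm_params(32, 64)
h_bilinear = largest_hidden_size(32, budget, r=1)
```

Under these assumptions, a rank-1 bilinear LSTM with input size 32 fits the same budget as a linear LSTM of hidden size 64 only when its hidden size drops to 32. Raising the rank r improves the approximation of the bilinear weight matrix but forces a still smaller hidden state, which is the balance the abstract refers to.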
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
