Semi-tied Units for Efficient Gating in LSTM and Highway Networks
Chao Zhang, Philip Woodland

TL;DR
This paper introduces semi-tied units (STUs), a parameter-sharing technique for the gating units in LSTM and highway networks that substantially reduces their computation and storage costs while maintaining performance.
Contribution
It proposes a semi-tied unit approach in which all units in a layer share a single weight matrix, while extra per-unit scaling factors separately scale the shared output values. This makes gating considerably cheaper.
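The sharing scheme described above can be illustrated with a single LSTM time step. The following is a minimal NumPy sketch, not the authors' implementation. The function name `stu_lstm_step`, the dict-based parameter layout, the per-unit biases, and the placement of the scaling (element-wise on the shared pre-activation, before each activation function) are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def stu_lstm_step(x, h_prev, c_prev, W, scales, biases):
    """One LSTM step in which all four units share one weight matrix W.

    Hypothetical layout (an assumption, not the paper's code):
      W:      (n_cell, n_in + n_cell) shared weight matrix
      scales: dict of per-unit scaling vectors, each of shape (n_cell,)
      biases: dict of per-unit bias vectors, each of shape (n_cell,)
    """
    # The expensive matrix-vector product is computed once and shared.
    z = W @ np.concatenate([x, h_prev])
    # Each unit rescales the shared output element-wise before its activation.
    i = sigmoid(scales["i"] * z + biases["i"])  # input gate
    f = sigmoid(scales["f"] * z + biases["f"])  # forget gate
    o = sigmoid(scales["o"] * z + biases["o"])  # output gate
    g = np.tanh(scales["g"] * z + biases["g"])  # candidate cell input
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c


# Usage with small random parameters.
rng = np.random.default_rng(0)
n_in, n_cell = 3, 4
W = rng.standard_normal((n_cell, n_in + n_cell)) * 0.1
units = ["i", "f", "o", "g"]
scales = {k: rng.standard_normal(n_cell) for k in units}
biases = {k: np.zeros(n_cell) for k in units}
h, c = stu_lstm_step(np.ones(n_in), np.zeros(n_cell), np.zeros(n_cell), W, scales, biases)
print(h.shape, c.shape)  # (4,) (4,)
```

A conventional LSTM would compute four separate matrix-vector products here. In this sketch, only one is computed, and the per-unit work reduces to cheap element-wise operations.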
Findings
Reduces the calculation and storage costs of gating by up to a factor of four.
Achieves word error rates similar to those of the original models.
Effective in speech recognition tasks.
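A back-of-the-envelope weight count shows where a saving of "up to a factor of four" can come from. This sketch assumes the figure reflects an LSTM layer's four units (three gates plus the cell input) collapsing onto one shared matrix. Biases are ignored, and the layer sizes and the function name `gate_weight_count` are hypothetical.

```python
def gate_weight_count(n_in, n_cell, n_units=4, semi_tied=False):
    """Weight count for the gating/cell-input units of one LSTM layer.

    Illustrative assumptions: biases are ignored; a conventional layer has one
    n_cell x (n_in + n_cell) matrix per unit; a semi-tied layer has a single
    shared matrix of that size plus one length-n_cell scaling vector per unit.
    """
    matrix = n_cell * (n_in + n_cell)
    if semi_tied:
        return matrix + n_units * n_cell
    return n_units * matrix


full = gate_weight_count(1024, 1024)
tied = gate_weight_count(1024, 1024, semi_tied=True)
print(full, tied, round(full / tied, 3))  # 8388608 2101248 3.992
```

The scaling vectors add only a small overhead, so the ratio approaches four but does not quite reach it. The per-step multiply-adds for the matrix-vector products shrink by the same ratio, because one product replaces four.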
Abstract
Gating is a key technique used for integrating information from multiple sources by long short-term memory (LSTM) models and has recently also been applied to other models such as the highway network. Although gating is powerful, it is rather expensive in terms of both computation and storage as each gating unit uses a separate full weight matrix. This issue can be severe since several gates can be used together in e.g. an LSTM cell. This paper proposes a semi-tied unit (STU) approach to solve this efficiency issue, which uses one shared weight matrix to replace those in all the units in the same layer. The approach is termed "semi-tied" since extra parameters are used to separately scale each of the shared output values. These extra scaling factors are associated with the network activation functions and result in the use of parameterised sigmoid, hyperbolic tangent, and rectified…
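The name "semi-tied" can be understood through a simple identity. Scaling the shared output $W x$ element-wise by a unit's factors $\gamma$ gives exactly $(\mathrm{diag}(\gamma) W) x$. Each unit therefore behaves like an untied unit whose weight matrix is constrained to be a row-wise rescaling of the shared one. The sketch below checks this numerically for parameterised sigmoid and tanh functions. The forms `p_sigmoid(z, gamma) = sigmoid(gamma * z)` and `p_tanh(z, gamma) = tanh(gamma * z)` are assumptions; the exact definitions used in the paper are not given in this excerpt.

```python
import numpy as np


def p_sigmoid(z, gamma):
    # Parameterised sigmoid (assumed form): sigmoid of the scaled input.
    return 1.0 / (1.0 + np.exp(-gamma * z))


def p_tanh(z, gamma):
    # Parameterised tanh (assumed form): tanh of the scaled input.
    return np.tanh(gamma * z)


rng = np.random.default_rng(1)
W = rng.standard_normal((5, 3))  # shared weight matrix
x = rng.standard_normal(3)
gamma = rng.standard_normal(5)   # one unit's scaling factors

shared = W @ x                          # computed once per layer
via_scaling = p_sigmoid(shared, gamma)  # semi-tied unit output
# Equivalent untied unit with its own, row-rescaled weight matrix.
via_own_matrix = 1.0 / (1.0 + np.exp(-(np.diag(gamma) @ W) @ x))

print(np.allclose(via_scaling, via_own_matrix))  # True
```

The semi-tied form stores only `gamma` per unit rather than a full matrix. In exchange, each unit's effective matrix cannot vary independently of the shared one.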
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Highway networks · Sigmoid Activation · Tanh Activation · Long Short-Term Memory
