Lightweight and Efficient End-to-End Speech Recognition Using Low-Rank Transformer
Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, Pascale Fung

TL;DR
This paper introduces the low-rank transformer (LRT), a memory-efficient neural network architecture that reduces parameters and inference time for end-to-end speech recognition while improving accuracy.
Contribution
The paper presents a novel low-rank transformer architecture that significantly decreases model size and speeds up training and inference, without relying on an external language model or additional acoustic data.
Findings
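Reduces the number of parameters by more than 50%
Speeds up inference by around 1.35x compared to the baseline transformer
Achieves lower error rates than the uncompressed transformer on both validation and test sets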
Abstract
Highly performing deep neural networks come at the cost of computational complexity that limits their practicality for deployment on portable devices. We propose the low-rank transformer (LRT), a memory-efficient and fast neural architecture that significantly reduces the parameters and boosts the speed of training and inference for end-to-end speech recognition. Our approach reduces the number of parameters of the network by more than 50% and speeds up the inference time by around 1.35x compared to the baseline transformer model. The experiments show that our LRT model generalizes better and yields lower error rates on both validation and test sets compared to an uncompressed transformer model. The LRT model outperforms those from existing works on several datasets in an end-to-end setting without using an external language model or acoustic data.
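The abstract does not spell out how the parameter reduction is obtained. As a minimal sketch, assuming the usual low-rank idea of replacing a dense weight matrix of shape (d_in, d_out) with the product of two thin matrices of shapes (d_in, rank) and (rank, d_out), the saving can be illustrated as follows. The class name `LowRankLinear`, the helper functions, and the sizes 512, 2048, and 100 are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np


def dense_params(d_in, d_out):
    """Weight count of an ordinary linear layer (bias ignored)."""
    return d_in * d_out


def low_rank_params(d_in, d_out, rank):
    """Weight count when W is replaced by E (d_in x rank) times D (rank x d_out)."""
    return rank * (d_in + d_out)


class LowRankLinear:
    """Illustrative factorized linear layer: y = (x @ E) @ D + b.

    This is an assumed, simplified stand-in for a low-rank layer,
    not the authors' implementation.
    """

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = np.random.default_rng(seed)
        # Two thin factors instead of one (d_in x d_out) matrix.
        self.E = rng.standard_normal((d_in, rank)) / np.sqrt(d_in)
        self.D = rng.standard_normal((rank, d_out)) / np.sqrt(rank)
        self.b = np.zeros(d_out)

    def __call__(self, x):
        # Project down to `rank` dimensions first, then back up to d_out.
        return (x @ self.E) @ self.D + self.b


# Hypothetical sizes chosen only to make the arithmetic concrete.
d_in, d_out, rank = 512, 2048, 100
layer = LowRankLinear(d_in, d_out, rank)
x = np.ones((4, d_in))
y = layer(x)

full = dense_params(d_in, d_out)
compact = low_rank_params(d_in, d_out, rank)
print(y.shape, full, compact, round(compact / full, 3))
```

The factorized layer uses fewer weights than the dense one whenever rank < (d_in * d_out) / (d_in + d_out), and multiplying by the two thin factors in sequence also costs fewer multiply-adds per input vector. The rank is a tunable trade-off between size and capacity.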
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing
