Trace norm regularization and faster inference for embedded speech recognition RNNs
Markus Kliegl, Siddharth Goyal, Kexin Zhao, Kavya Srinet, Mohammad Shoeybi

TL;DR
This paper introduces a trace norm regularization technique for training low-rank factored matrix multiplications, together with ARM kernels optimized for small batch sizes, for embedded speech recognition RNNs. The kernels run 3x to 7x faster than the widely used gemmlowp library.
Contribution
It presents a trace norm regularization technique for training low-rank factored versions of matrix multiplications, and open-sourced kernels for ARM processors that are optimized for small batch sizes. Together these improve compression and inference speed in embedded speech recognition models.
Findings
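Achieved 3x to 7x inference speedups over the gemmlowp library on ARM processors at small batch sizes.
Obtained good accuracy versus number-of-parameters trade-offs compared to standard low-rank training.
Showed that the trace norm regularization method can also be used to speed up training of large models.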
Abstract
We propose and evaluate new techniques for compressing and speeding up dense matrix multiplications as found in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, we introduce and study a trace norm regularization technique for training low rank factored versions of matrix multiplications. Compared to standard low rank training, we show that our method leads to good accuracy versus number of parameter trade-offs and can be used to speed up training of large models. For speedup, we enable faster inference on ARM processors through new open sourced kernels optimized for small batch sizes, resulting in 3x to 7x speed ups over the widely used gemmlowp library. Beyond LVCSR, we expect our techniques and kernels to be more generally applicable to embedded neural networks with large fully connected…
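The link between the trace norm and low-rank factored matrices can be illustrated with a small NumPy sketch. It is not the authors' training code, and the function names are hypothetical. It relies on two standard facts: the trace (nuclear) norm of W is the sum of its singular values, and for any factorization W = UV the quantity (||U||_F^2 + ||V||_F^2) / 2 is at least the trace norm, with equality when the singular values are split evenly between U and V. Whether the paper's training procedure uses exactly this form is not stated in the abstract above.

```python
import numpy as np


def trace_norm(w):
    """Trace (nuclear) norm: the sum of the singular values of w."""
    return float(np.linalg.svd(w, compute_uv=False).sum())


def balanced_factors(w, rank):
    """Return rank-`rank` factors (u, v) from a truncated SVD of w.

    The square root of each kept singular value is assigned to both factors,
    so u @ v is the best rank-`rank` approximation of w.
    """
    a, s, bt = np.linalg.svd(w, full_matrices=False)
    root = np.sqrt(s[:rank])
    u = a[:, :rank] * root           # shape (m, rank)
    v = root[:, None] * bt[:rank]    # shape (rank, n)
    return u, v


def factored_penalty(u, v):
    """Half the sum of squared Frobenius norms of the two factors."""
    return 0.5 * float(np.sum(u * u) + np.sum(v * v))


# Illustrative data (an assumption): a 64 x 48 matrix of exact rank 8.
rng = np.random.default_rng(0)
m, n, true_rank = 64, 48, 8
w = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n))

u, v = balanced_factors(w, rank=true_rank)

# Parameter counts for the dense layer versus its factored form.
full_params = m * n                     # dense weight matrix
low_rank_params = true_rank * (m + n)   # u and v together

print("trace norm of w:     ", trace_norm(w))
print("penalty on (u, v):   ", factored_penalty(u, v))
print("dense parameters:    ", full_params)
print("factored parameters: ", low_rank_params)
```

For this balanced factorization the two printed norm values agree up to floating-point error, and the factored form stores r(m + n) values instead of mn. Penalizing the factor norms during training is therefore one way to push a layer toward a low-rank solution that can later be truncated.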
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
