Trace norm regularization and faster inference for embedded speech recognition RNNs

Markus Kliegl; Siddharth Goyal; Kexin Zhao; Kavya Srinet; Mohammad Shoeybi

arXiv:1710.09026 · cs.LG · February 7, 2018 · 6 cites

TL;DR

This paper introduces a trace norm regularization technique for training low rank factored matrix multiplications, along with open-sourced ARM kernels optimized for small batch sizes that run 3x to 7x faster than the gemmlowp library, targeting embedded speech recognition RNNs.

Contribution

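It presents a trace norm regularization technique for training low rank factored versions of the matrix multiplications in fully connected and recurrent layers, together with new open-sourced kernels for ARM processors optimized for small batch sizes, improving compression and inference speed in embedded speech recognition models.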

Findings

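01. The new ARM kernels, optimized for small batch sizes, achieve 3x to 7x speedups over the widely used gemmlowp library.

02. Compared to standard low rank training, trace norm regularization gives good trade-offs between accuracy and number of parameters.

03. The method can also be used to speed up training of large models, and the kernels speed up inference for large fully connected and recurrent layers.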
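gemmlowp is a low-precision (8-bit) matrix multiplication library, and the findings above concern multiplications with small batch sizes. This summary does not describe the internals of the new kernels, so the following NumPy sketch is only a generic illustration of 8-bit affine-quantized multiplication with 32-bit accumulation. The function names, matrix sizes, and the batch size of 4 are assumptions chosen for the example, not taken from the paper or from either library.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Affine quantization of a float array to unsigned integers.

    Returns (q, scale, zero_point) with x approximately scale * (q - zero_point).
    """
    qmax = 2 ** num_bits - 1
    lo = min(float(x.min()), 0.0)   # make sure 0.0 is representable
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.uint8)
    return q, scale, zero_point

def quantized_matmul(a_q, b_q):
    """Multiply two quantized operands, accumulating in int32."""
    qa, sa, za = a_q
    qb, sb, zb = b_q
    acc = (qa.astype(np.int32) - za) @ (qb.astype(np.int32) - zb)
    return acc * (sa * sb)          # rescale the integer result to float

rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256))      # hypothetical layer weights
activations = rng.standard_normal((256, 4))    # hypothetical batch of 4

reference = weights @ activations
result = quantized_matmul(quantize(weights), quantize(activations))
rel_err = np.linalg.norm(result - reference) / np.linalg.norm(reference)
print(f"relative error of 8-bit product: {rel_err:.4f}")
```

With a batch this small the work is dominated by reading the weight matrix once, which is presumably why kernels tuned for small batch sizes can differ from general-purpose ones.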

Abstract

We propose and evaluate new techniques for compressing and speeding up dense matrix multiplications as found in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, we introduce and study a trace norm regularization technique for training low rank factored versions of matrix multiplications. Compared to standard low rank training, we show that our method leads to good accuracy versus number of parameter trade-offs and can be used to speed up training of large models. For speedup, we enable faster inference on ARM processors through new open sourced kernels optimized for small batch sizes, resulting in 3x to 7x speed ups over the widely used gemmlowp library. Beyond LVCSR, we expect our techniques and kernels to be more generally applicable to embedded neural networks with large fully connected…
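As a rough illustration of the compression idea in the abstract, the sketch below relies on a standard identity: the trace norm of W (the sum of its singular values) equals the minimum of 0.5 * (||U||_F^2 + ||V||_F^2) over all factorizations W = UV. Penalizing the Frobenius norms of the factors is therefore one common way to encourage low rank. The abstract does not spell out the training procedure, so the function names, matrix sizes, and rank used here are assumptions for the example only.

```python
import numpy as np

def trace_norm(w):
    """Trace (nuclear) norm: the sum of the singular values of w."""
    return np.linalg.svd(w, compute_uv=False).sum()

def balanced_factors(w):
    """Factor w = u @ v, splitting each singular value evenly between u and v."""
    left, s, right_t = np.linalg.svd(w, full_matrices=False)
    root = np.sqrt(s)
    return left * root, root[:, None] * right_t

def factor_penalty(u, v):
    """Surrogate penalty 0.5 * (||u||_F^2 + ||v||_F^2) on the two factors."""
    return 0.5 * (np.sum(u ** 2) + np.sum(v ** 2))

rng = np.random.default_rng(0)
# Hypothetical 64 x 48 weight matrix that is close to rank 8.
w = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 48))
w += 0.01 * rng.standard_normal(w.shape)

u, v = balanced_factors(w)

# The balanced factorization attains the trace norm exactly ...
print(trace_norm(w), factor_penalty(u, v))
# ... while an unbalanced factorization of the same matrix is penalized more.
print(factor_penalty(2.0 * u, 0.5 * v))

# Keep only the top `rank` components to get the compressed layer.
rank = 8
u_r, v_r = u[:, :rank], v[:rank, :]
rel_err = np.linalg.norm(w - u_r @ v_r) / np.linalg.norm(w)

dense_params = w.size                    # 64 * 48
factored_params = u_r.size + v_r.size    # 8 * (64 + 48)
print(f"rank-{rank} relative error: {rel_err:.4f}")
print(f"parameters: {dense_params} dense vs {factored_params} factored")
```

The factored layer computes U_r (V_r x) instead of W x, so both the parameter count and the multiply-accumulate count fall from m * n to r * (m + n).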

Code & Models

Repositories

kexinzhao/farm

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings