Tied & Reduced RNN-T Decoder

Rami Botros (1), Tara N. Sainath (1), Robert David (1), Emmanuel Guzman (1), Wei Li (1), Yanzhang He (1) ((1) Google Inc. USA)

arXiv:2109.07513 · cs.CL · September 17, 2021

TL;DR

This paper introduces a simplified, smaller RNN-T decoder that combines weight tying with EMBR training. It cuts decoder parameters by roughly 90% (23M to 2M) without loss in recognition accuracy, which makes it well suited to on-device speech recognition.

Contribution

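The paper proposes a lightweight RNN-T decoder whose prediction network computes a weighted average of the input embeddings and shares its embedding matrix with the joint network's output layer (weight tying). Combined with EMBR training, this design sharply reduces decoder size while maintaining recognition accuracy.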

Findings

01 Decoder size reduced from 23M to 2M parameters

02 Recognition accuracy remains unchanged with the new design

03 Efficient on-device speech recognition enabled by the smaller model

Abstract

Previous work on Recurrent Neural Network-Transducer (RNN-T) models has shown that, under some conditions, it is possible to simplify their prediction network with little or no loss in recognition accuracy (arXiv:2003.07705 [eess.AS], [2], arXiv:2012.06749 [cs.CL]). This is done by limiting the context size of previous labels and/or using a simpler architecture for its layers instead of LSTMs. The benefits of such changes include a reduction in model size, faster inference, and power savings, all of which are useful for on-device applications. In this work, we study ways to make the RNN-T decoder (prediction network + joint network) smaller and faster without degradation in recognition performance. Our prediction network performs a simple weighted averaging of the input embeddings, and shares its embedding matrix weights with the joint network's output layer (a.k.a. weight tying,…
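
The two ideas named in the abstract, a prediction network that takes a weighted average of a few previous label embeddings and an output layer that reuses the same embedding matrix, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the sizes `VOCAB`, `DIM`, and `CONTEXT`, the softmax-normalized `position_logits`, and the additive `tanh` combination in `joint_logits` are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration; none come from the paper.
VOCAB = 16    # number of output labels
DIM = 8       # embedding dimension
CONTEXT = 2   # number of previous labels the prediction network sees

# A single embedding matrix, used both to embed previous labels and
# as the joint network's output projection (weight tying).
embedding = rng.normal(size=(VOCAB, DIM))

# Assumed parameterization: one learnable scalar per context position,
# turned into averaging weights with a softmax.
position_logits = rng.normal(size=(CONTEXT,))


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def prediction_network(prev_labels):
    """Weighted average of the embeddings of the last CONTEXT labels."""
    if len(prev_labels) < CONTEXT:
        raise ValueError("need at least CONTEXT previous labels")
    ids = np.asarray(prev_labels[-CONTEXT:])
    weights = softmax(position_logits)   # non-negative, sums to 1
    return weights @ embedding[ids]      # shape (DIM,)


def joint_logits(encoder_frame, prev_labels):
    """Combine encoder and prediction outputs, then project to labels.

    The additive tanh combination is an assumption for this sketch.
    The projection reuses `embedding`, so no separate output matrix exists.
    """
    hidden = np.tanh(encoder_frame + prediction_network(prev_labels))
    return hidden @ embedding.T          # shape (VOCAB,)


# Parameter counts for this toy decoder, with and without tying.
tied_params = embedding.size + position_logits.size
untied_params = 2 * embedding.size + position_logits.size
```

In this toy setup, tying removes one full `VOCAB x DIM` matrix from the decoder, and the prediction network itself adds only `CONTEXT` scalars. Calling `joint_logits(np.zeros(DIM), [3, 5])` returns one logit per label.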
