Deep Shallow Fusion for RNN-T Personalization

Duc Le; Gil Keren; Julian Chan; Jay Mahadeokar; Christian Fuegen; Michael L. Seltzer

arXiv:2011.07754 · cs.CL · November 17, 2020 · 1 citation

TL;DR

This paper introduces deep fusion and related techniques that make RNN-T speech recognition models easier to personalize, especially for rare words and entity names, yielding a 15.4%-34.5% relative Word Error Rate (WER) improvement over baseline models.

Contribution

The work presents new methods for RNN-T personalization: better modeling of rare WordPieces, infusing extra information into the encoder, supporting alternative graphemic pronunciations, and deep fusion with personalized language models for more robust biasing.

Findings

01

Achieved 15.4%-34.5% relative WER reduction.

02

Enhanced recognition of rare words and entities.

03

Closed the gap with hybrid systems on biasing tasks.

Abstract

End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in particular, have gained significant traction in the automatic speech recognition community in the last few years due to their simplicity, compactness, and excellent performance on generic transcription tasks. However, these models are more challenging to personalize compared to traditional hybrid systems due to the lack of external language models and difficulties in recognizing rare long-tail words, specifically entity names. In this work, we present novel techniques to improve RNN-T's ability to model rare WordPieces, infuse extra information into the encoder, enable the use of alternative graphemic pronunciations, and perform deep fusion with personalized language models for more robust biasing. We show that these combined techniques result in 15.4%-34.5% relative Word Error Rate improvement compared to…
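The "relative Word Error Rate improvement" quoted in the abstract is a ratio against the baseline WER, not a difference in percentage points. A minimal sketch of that arithmetic follows; the function name `relative_wer_reduction` and the input values are hypothetical and chosen only for illustration, not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical values for illustration only; these are NOT results from the paper.
# A baseline WER of 10.0% improved to 8.0% is a 2-point absolute drop,
# which corresponds to a 20% relative reduction.
example = relative_wer_reduction(baseline_wer=10.0, new_wer=8.0)
print(f"{example:.1f}% relative WER reduction")
```

Read this way, a 15.4%-34.5% relative improvement means the error rate fell by that fraction of the baseline's own error rate, so the absolute change depends on how high the baseline WER was.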

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling