Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition

Tsendsuren Munkhdalai; Khe Chai Sim; Angad Chandorkar; Fan Gao; Mason Chua; Trevor Strohman; Françoise Beaufays

arXiv:2110.02220·eess.AS·October 8, 2021

Open Access

TL;DR

This paper introduces a decoder-agnostic, end-to-end contextual adaptation approach based on neural associative memory for fast on-device personalized speech recognition. In on-device simulation experiments it outperforms traditional re-scoring by 12% relative WER and 15.7% in entity-mention F1-score.

Contribution

The work presents a novel neural associative memory model enabling rapid, decoder-agnostic contextual adaptation for on-device speech recognition personalization.

Findings

01. 12% relative WER reduction

02. 15.7% entity mention F1-score improvement

03. Effective on-device personalization simulation

Abstract

Fast contextual adaptation has been shown to be effective in improving Automatic Speech Recognition (ASR) of rare words, and when combined with on-device personalized training it can yield even better recognition results. However, the traditional re-scoring approaches based on an external language model are prone to diverge during personalized training. In this work, we introduce a model-based end-to-end contextual adaptation approach that is decoder-agnostic and amenable to on-device personalization. Our on-device simulation experiments demonstrate that the proposed approach outperforms the traditional re-scoring technique by 12% relative WER and 15.7% entity-mention-specific F1-score in a continual personalization scenario.
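
The headline WER figure is a relative reduction, which is easy to misread as an absolute one. The following is a minimal sketch of how such a figure is typically computed, assuming the standard word-level edit-distance definition of WER. The function names, the example utterance, and the baseline and proposed WER values are illustrative assumptions and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, proposed: float) -> float:
    """Fraction of the baseline error removed by the proposed system."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline


# Hypothetical utterance: a rare name misrecognized as a common word.
wer = word_error_rate("call tsendsuren now", "call sender now")

# Hypothetical WER values (in percent), chosen only to illustrate a 12%
# relative reduction; the paper's absolute WERs are not given in this listing.
reduction = relative_reduction(baseline=10.0, proposed=8.8)
```

With these made-up values, a drop from 10.0% to 8.8% WER is a 1.2-point absolute change but a 12% relative reduction, which is the kind of quantity the abstract reports.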

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems