End-to-end contextual ASR based on posterior distribution adaptation for hybrid CTC/attention system

Zhengyi Zhang; Pan Zhou

arXiv:2202.09003 · cs.CL · February 21, 2022 · 1 citation


Open Access

TL;DR

This paper introduces a contextual bias attention (CBA) module for end-to-end speech recognition models that improves recognition of infrequent proper nouns while causing only minor degradation on general test sets.

Contribution

It proposes a contextual bias attention (CBA) module that adapts the posterior distributions of the CTC and attention decoders according to preloaded bias phrases, improving the recognition of contextual phrases.
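The contribution statement does not spell out the adaptation rule, so the following is only a minimal sketch under stated assumptions. It boosts the next token of the attended bias phrase in one decoding step's posterior by a hypothetical log-domain bonus `boost`, renormalizes, and then combines the CTC and attention scores with a hypothetical `ctc_weight`. Treating the CTC output as a per-step token posterior is a simplification (hybrid decoders usually rely on a CTC prefix score), and `adapt_posterior` and `joint_log_score` are illustrative names, not the authors' code.

```python
import math


def adapt_posterior(probs, attended_phrase, matched_len, boost=math.log(2.0)):
    """Return a re-weighted copy of one decoding step's token posterior.

    probs: dict mapping token -> probability (CTC or attention output).
    attended_phrase: token list of the bias phrase selected by CBA,
        or None when no bias phrase was attended (assumption).
    matched_len: number of phrase tokens the hypothesis has already emitted.
    boost: hypothetical log-domain bonus for the phrase's next token.
    """
    if attended_phrase is None or matched_len >= len(attended_phrase):
        return dict(probs)  # nothing left to bias toward
    target = attended_phrase[matched_len]
    # Adding `boost` to a log-probability == multiplying the probability by exp(boost).
    scaled = {tok: p * math.exp(boost) if tok == target else p
              for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: p / total for tok, p in scaled.items()}  # renormalize to sum to 1


def joint_log_score(p_ctc, p_att, token, ctc_weight=0.3):
    """Hybrid CTC/attention step score: weighted sum of the two log-probabilities."""
    return (ctc_weight * math.log(p_ctc[token])
            + (1 - ctc_weight) * math.log(p_att[token]))


ctc_step = {"a": 0.5, "b": 0.3, "c": 0.2}
att_step = {"a": 0.6, "b": 0.2, "c": 0.2}
phrase = ["b", "c"]  # bias phrase attended by CBA; no tokens matched yet

# The same adaptation is applied to both decoders before they are combined.
ctc_biased = adapt_posterior(ctc_step, phrase, matched_len=0)
att_biased = adapt_posterior(att_step, phrase, matched_len=0)
print(joint_log_score(ctc_step, att_step, "b"),
      joint_log_score(ctc_biased, att_biased, "b"))
```

Applying the boost to both distributions before combining them keeps the CTC and attention branches from disagreeing about the biased token; a real system would also need to track `matched_len` per hypothesis during beam search.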

Findings

01

15% to 28% improvement in bias phrase recall

02

Only 1.7% performance degradation on general test sets

03

Effective recognition of infrequent proper nouns

Abstract

End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model. Although this simplifies the ASR system, it introduces a contextual ASR drawback: the E2E model performs worse on utterances containing infrequent proper nouns. In this work, we propose adding a contextual bias attention (CBA) module to the attention-based encoder-decoder (AED) model to improve its ability to recognize contextual phrases. Specifically, CBA uses the context vector of the source attention in the decoder to attend to a specific bias embedding. Jointly learned with the basic AED parameters, CBA can tell the model when and where to bias its output probability distribution. At the inference stage, a list of bias phrases is preloaded and we adapt the posterior distributions of both the CTC and attention decoders according to the attended bias phrase of…
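The abstract says CBA uses the decoder's source-attention context vector to attend to a bias embedding, but it does not give the scoring function. This minimal sketch assumes scaled dot-product attention, bias embeddings with the same dimension as the context vector (no learned projection), and a hypothetical "no-bias" entry at index 0 so the model can choose not to bias at a given step. The function `bias_attention` and all shapes are illustrative assumptions, not the paper's implementation.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bias_attention(context, bias_embeddings):
    """Attend from a decoder context vector to a list of bias-phrase embeddings.

    Assumes index 0 of bias_embeddings is a hypothetical "no-bias" entry.
    Returns (attention weights, index of the most-attended entry,
    weighted sum of the bias embeddings).
    """
    d = len(context)
    # Scaled dot-product score between the context vector and each bias embedding.
    scores = [sum(c * b for c, b in zip(context, emb)) / math.sqrt(d)
              for emb in bias_embeddings]
    weights = softmax(scores)
    attended = max(range(len(weights)), key=weights.__getitem__)
    # Weighted summary of the bias embeddings that a decoder could consume.
    bias_vector = [sum(w * emb[k] for w, emb in zip(weights, bias_embeddings))
                   for k in range(d)]
    return weights, attended, bias_vector


context = [1.0, 0.0, 1.0, 0.0]
bias_embeddings = [
    [0.0, 0.0, 0.0, 0.0],  # index 0: hypothetical "no-bias" entry
    [1.0, 0.0, 1.0, 0.0],  # bias phrase 1
    [0.0, 1.0, 0.0, 1.0],  # bias phrase 2
]
weights, attended, bias_vector = bias_attention(context, bias_embeddings)
print(attended)  # → 1
```

The index of the most-attended entry is what would select which preloaded phrase drives the posterior adaptation at inference; attending to index 0 would leave both decoders' distributions unchanged.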

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling