Integrating Source-channel and Attention-based Sequence-to-sequence Models for Speech Recognition

Qiujia Li; Chao Zhang; Philip C. Woodland

arXiv:1909.06614 · eess.AS · October 2, 2019 · 1 citation

TL;DR

This paper introduces ISCA, a speech recognition framework that combines traditional source-channel models with attention-based sequence-to-sequence models, yielding up to 21% relative word error rate (WER) reduction over individual systems on the AMI meeting corpus.

Contribution

The paper presents a new integrated framework that combines source-channel and attention-based models for speech recognition, enhancing performance by leveraging their complementary strengths.

Findings

01. Achieves up to 21% relative WER reduction over individual systems.

02. Outperforms an alternative method that also combines CTC and attention-based models by 13% relative WER.

03. Demonstrates effectiveness on the AMI meeting corpus.
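
The percentages above are relative, not absolute, reductions in WER. As a minimal sketch of how such a figure is computed, the helper below divides the WER improvement by the baseline WER. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER.

    Both arguments are word error rates in the same unit (for example, percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical illustration only: these WER values are NOT results from the paper.
example = relative_wer_reduction(baseline_wer=20.0, new_wer=15.8)
print(f"relative reduction: {example:.1%}")
```

Under this definition, a drop from a hypothetical 20.0% WER to 15.8% WER is an absolute reduction of 4.2 points but a relative reduction of 21%.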

Abstract

This paper proposes a novel automatic speech recognition (ASR) framework called Integrated Source-Channel and Attention (ISCA) that combines the advantages of traditional systems based on the noisy source-channel model (SC) and end-to-end style systems using attention-based sequence-to-sequence models. The traditional SC system framework includes hidden Markov models and connectionist temporal classification (CTC) based acoustic models, language models (LMs), and a decoding procedure based on a lexicon, whereas the end-to-end style attention-based system jointly models the whole process with a single model. By rescoring the hypotheses produced by traditional systems using end-to-end style systems based on an extended noisy source-channel model, ISCA allows structured knowledge to be easily incorporated via the SC-based model while exploiting the complementarity of the attention-based…
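
The rescoring idea in the abstract can be illustrated with a small sketch: a traditional source-channel system proposes an N-best list, an attention-based model scores each hypothesis, and the two scores are combined to pick the final output. The code below assumes a simple log-linear interpolation with a single weight; that combination rule, the `Hypothesis` class, the `rescore_nbest` function, and all scores are illustrative assumptions, not the paper's exact formulation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    words: List[str]
    sc_score: float   # log score from the source-channel system (hypothetical value)
    att_score: float  # log-probability from the attention-based model (hypothetical value)


def rescore_nbest(nbest: List[Hypothesis], att_weight: float = 1.0) -> Hypothesis:
    """Pick the hypothesis with the highest combined log score.

    Assumption: scores are combined as sc_score + att_weight * att_score.
    """
    if not nbest:
        raise ValueError("nbest list must not be empty")
    return max(nbest, key=lambda h: h.sc_score + att_weight * h.att_score)


# Hypothetical N-best list; the scores are made up for illustration.
nbest = [
    Hypothesis(["the", "cat", "sat"], sc_score=-10.0, att_score=-6.0),
    Hypothesis(["the", "cat", "sad"], sc_score=-9.5, att_score=-8.0),
]

first_pass_best = rescore_nbest(nbest, att_weight=0.0)  # source-channel score only
rescored_best = rescore_nbest(nbest, att_weight=1.0)    # with attention-based score
print(" ".join(first_pass_best.words))
print(" ".join(rescored_best.words))
```

With `att_weight=0.0` the source-channel ranking is kept unchanged, so the weight controls how much the attention-based model is allowed to reorder the first-pass hypotheses. In practice such a weight would typically be tuned on held-out data.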

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing