Integrating Source-channel and Attention-based Sequence-to-sequence Models for Speech Recognition
Qiujia Li, Chao Zhang, Philip C. Woodland

TL;DR
This paper introduces ISCA (Integrated Source-Channel and Attention), a speech recognition framework that combines traditional source-channel models with attention-based sequence-to-sequence models, reducing word error rate (WER) by up to 21% relative over the individual systems.
Contribution
The paper presents a new integrated framework that combines source-channel and attention-based models for speech recognition, enhancing performance by leveraging their complementary strengths.
Findings
Achieves up to 21% relative WER reduction over individual systems.
Outperforms an alternative method that combines CTC and attention-based models by 13%.
Demonstrates effectiveness on the AMI meeting corpus.
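For readers unfamiliar with the metric, a relative WER reduction is the drop in WER divided by the baseline WER. The sketch below shows that arithmetic; the function name and the WER values are made up for illustration and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, NOT taken from the paper:
# a baseline at 20.0% WER and a combined system at 15.8% WER.
reduction = relative_wer_reduction(20.0, 15.8)
print(f"{reduction:.0%} relative WER reduction")
```

With these illustrative inputs, an absolute drop of 4.2 points corresponds to a relative reduction of about 21%.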
Abstract
This paper proposes a novel automatic speech recognition (ASR) framework called Integrated Source-Channel and Attention (ISCA) that combines the advantages of traditional systems based on the noisy source-channel model (SC) and end-to-end style systems using attention-based sequence-to-sequence models. The traditional SC system framework includes hidden Markov models and connectionist temporal classification (CTC) based acoustic models, language models (LMs), and a decoding procedure based on a lexicon, whereas the end-to-end style attention-based system jointly models the whole process with a single model. By rescoring the hypotheses produced by traditional systems using end-to-end style systems based on an extended noisy source-channel model, ISCA allows structured knowledge to be easily incorporated via the SC-based model while exploiting the complementarity of the attention-based…
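The rescoring idea in the abstract can be illustrated with a generic N-best rescoring sketch: each hypothesis from the SC system carries its own score, an attention-based model assigns a second score, and the two are combined log-linearly before re-ranking. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the `Hypothesis` class, the `rescore` function, the weights, and all scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: str
    sc_logp: float   # assumed log-score from the SC system (acoustic model + LM)
    att_logp: float  # assumed log-score from the attention-based model


def rescore(nbest, sc_weight=1.0, att_weight=1.0):
    """Return hypotheses sorted by the weighted sum of log-scores, best first."""
    def combined(h):
        return sc_weight * h.sc_logp + att_weight * h.att_logp
    return sorted(nbest, key=combined, reverse=True)


# Made-up N-best list; the scores are illustrative only.
nbest = [
    Hypothesis("the cat sat", sc_logp=-12.0, att_logp=-9.5),
    Hypothesis("the cat sad", sc_logp=-11.5, att_logp=-13.0),
    Hypothesis("a cat sat", sc_logp=-13.0, att_logp=-10.0),
]

best = rescore(nbest)[0]
print(best.words)
```

In this toy list the SC system alone would prefer "the cat sad", while the combined score prefers "the cat sat", which is the kind of complementarity the framework aims to exploit. In practice the interpolation weights would be tuned on held-out data.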
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
