Combining Frame-Synchronous and Label-Synchronous Systems for Speech Recognition

Qiujia Li; Chao Zhang; Philip C. Woodland

arXiv:2107.00764·eess.AS·July 5, 2021

TL;DR

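This paper introduces a two-pass speech recognition system in which a label-synchronous model rescores the output of a first-pass frame-synchronous model. By exploiting the complementary strengths of the two paradigms, it achieves up to 29% relative WER reduction on AMI and up to 33% on Switchboard and RT03, without additional training data.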

Contribution

It proposes rescoring the N-best hypotheses or lattices produced by a first-pass frame-synchronous ASR system with a second-pass label-synchronous system, improving accuracy without additional training data.

Findings

01

Achieves up to a 29% relative WER reduction on the AMI dataset.

02

Attains up to a 33% relative WER reduction on the Switchboard and RT03 datasets.

03

Demonstrates the effectiveness of combining different ASR paradigms for improved performance.
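
The relative reductions quoted in the findings can be read with the usual definition of relative WER reduction: the absolute drop in WER divided by the baseline WER. The sketch below assumes that standard definition; the function name `relative_wer_reduction` and the example numbers are illustrative only and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers for illustration (not results from the paper):
# a baseline at 20.0% WER improved to 15.0% WER.
reduction = relative_wer_reduction(20.0, 15.0)
print(f"{reduction:.0%} relative WER reduction")
```

Note that a relative reduction depends on the baseline it is measured against, so the same combined system can show different relative gains against different reference systems.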

Abstract

Commonly used automatic speech recognition (ASR) systems can be classified into frame-synchronous and label-synchronous categories, based on whether the speech is decoded on a per-frame or per-label basis. Frame-synchronous systems, such as traditional hidden Markov model systems, can easily incorporate existing knowledge and can support streaming ASR applications. Label-synchronous systems, based on attention-based encoder-decoder models, can jointly learn the acoustic and language information with a single model and can be regarded as audio-grounded language models. In this paper, we propose rescoring the N-best hypotheses or lattices produced by a first-pass frame-synchronous system with a label-synchronous system in a second pass. By exploiting the complementary modelling of the different approaches, the combined two-pass systems achieve competitive performance without using any…
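
The N-best variant of the two-pass idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each first-pass hypothesis carries a log-domain score, that the second-pass model exposes a function returning a log-domain score for a token sequence, and that the two scores are combined by simple linear interpolation. The names `Hypothesis`, `rescore_nbest`, and the `weight` parameter are hypothetical, and lattice rescoring is not covered.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Hypothesis:
    tokens: List[str]
    first_pass_score: float  # log-domain score from the frame-synchronous system


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    second_pass_scorer: Callable[[List[str]], float],
    weight: float = 0.5,
) -> Hypothesis:
    """Pick the hypothesis with the best interpolated score.

    `second_pass_scorer` stands in for a label-synchronous model that
    returns a log-domain score for a token sequence. The interpolation
    weight is an assumption; in practice it would be tuned on held-out data.
    """
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")

    def combined(hyp: Hypothesis) -> float:
        second = second_pass_scorer(hyp.tokens)
        return (1.0 - weight) * hyp.first_pass_score + weight * second

    return max(nbest, key=combined)


# Toy usage with a stand-in scorer (scores are made up for illustration).
toy_scores = {("a", "b"): -3.0, ("a", "c"): -0.5}


def toy_scorer(tokens: List[str]) -> float:
    return toy_scores[tuple(tokens)]


nbest = [
    Hypothesis(tokens=["a", "b"], first_pass_score=-1.0),
    Hypothesis(tokens=["a", "c"], first_pass_score=-1.5),
]
best = rescore_nbest(nbest, toy_scorer, weight=0.5)
print(" ".join(best.tokens))
```

With `weight=0.0` the function reduces to the first-pass ranking, which makes it easy to check how much the second pass changes the selected hypothesis.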

Code & Models

Repositories

qiujiali/lattice-rescore
none · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing