An Asynchronous WFST-Based Decoder For Automatic Speech Recognition

Hang Lv; Zhehuai Chen; Hainan Xu; Daniel Povey; Lei Xie; Sanjeev Khudanpur

arXiv:2103.09063·cs.SD·March 17, 2021

Open Access

TL;DR

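This paper presents an asynchronous WFST-based decoder for large vocabulary continuous speech recognition. The decoder alternates between an "exploration" front and a "backfill" front, which prunes more effectively than standard one-pass decoding with on-the-fly composition and speeds up decoding, with the speedup growing as data complexity increases.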

Contribution

The paper introduces an asynchronous dynamic decoder with a two-front design (exploration and backfill) that decodes more efficiently than standard one-pass decoding with on-the-fly composition.

Findings

01 Faster decoding than standard one-pass decoding with on-the-fly composition

02 More effective pruning during decoding

03 Speedup grows as data complexity increases

Abstract

We introduce an asynchronous dynamic decoder, which adopts an efficient A* algorithm to incorporate big language models into one-pass decoding for large vocabulary continuous speech recognition. Unlike standard one-pass decoding with an on-the-fly composition decoder, which can incur significant computational overhead, the asynchronous dynamic decoder has a novel two-front design: one front performs "exploration" and the other performs "backfill". The computation of the two fronts alternates during decoding, resulting in more effective pruning than standard one-pass decoding with an on-the-fly composition decoder. Experiments show that the proposed decoder is notably faster than standard one-pass decoding with an on-the-fly composition decoder, and that the speedup becomes more pronounced as data complexity increases.
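The abstract does not give the decoder's details, so the following is only a generic illustration of the A* idea it mentions, not the authors' algorithm. It assumes a toy graph with two cost models: a cheap pass computes lower-bound costs to the goal, and a second pass searches under the full costs using those bounds as its heuristic. The names `backward_costs`, `a_star`, `cheap`, and `full` are hypothetical and chosen for this sketch; a real decoder would operate on WFSTs and acoustic frames rather than a hand-written dictionary.

```python
import heapq

INF = float("inf")

def backward_costs(edges, goal):
    """Cheapest cost from every node to `goal` (Dijkstra on the reversed graph)."""
    rev = {}
    for u, outs in edges.items():
        for v, w in outs:
            rev.setdefault(v, []).append((u, w))
    dist = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, INF):
            continue  # stale queue entry
        for u, w in rev.get(v, []):
            nd = d + w
            if nd < dist.get(u, INF):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

def a_star(edges, start, goal, heuristic):
    """A* search; returns (cost, path, number of expanded nodes) or None."""
    heap = [(heuristic.get(start, INF), 0.0, start, [start])]
    best = {}
    expanded = 0
    while heap:
        f, g, u, path = heapq.heappop(heap)
        if u == goal:
            return g, path, expanded
        if best.get(u, INF) <= g:
            continue  # already expanded with an equal or better cost
        best[u] = g
        expanded += 1
        for v, w in edges.get(u, []):
            h = heuristic.get(v)
            if h is None:
                continue  # the cheap pass found no route to the goal from v
            heapq.heappush(heap, (g + w + h, g + w, v, path + [v]))
    return None

# Assumption for this toy: every cheap weight is a lower bound on the full
# weight of the same edge, so the heuristic never overestimates.
cheap = {"s": [("a", 1.0), ("b", 1.0)], "a": [("g", 1.0)], "b": [("g", 4.0)]}
full = {"s": [("a", 1.5), ("b", 1.0)], "a": [("g", 1.5)], "b": [("g", 5.0)]}

h = backward_costs(cheap, "g")
cost, path, expanded = a_star(full, "s", "g", h)
print(cost, path, expanded)
```

In this toy the search under the full costs never expands node `b`, because the cheap pass already shows that any path through it is expensive. That is the general sense in which a heuristic can make pruning more effective; how the paper's exploration and backfill fronts actually share information is not stated in the abstract.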

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing