ASR Error Correction with Constrained Decoding on Operation Prediction

Jingyuan Yang; Rongjun Li; Wei Peng

arXiv:2208.04641·cs.CL·August 10, 2022

TL;DR

This paper introduces a constrained decoding method for ASR error correction that predicts correction operations to reduce latency and improve inference speed without sacrificing accuracy, supported by experiments on public datasets.

Contribution

It proposes a novel operation prediction-based correction method with a predictor module, significantly reducing decoding latency while maintaining accuracy, and releases a benchmark dataset for ASR correction.

Findings

01 Inference speed increased by at least three times (3.4 and 5.7 times).

02 WER reduced by up to 1.69%.

03 Effective on three public datasets.
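
For readers unfamiliar with the metric behind the WER finding, word error rate is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a generic illustration of that definition, not the paper's evaluation script; the function name `word_error_rate` and the example sentences are made up for this page. The summary does not say whether the reported reduction is absolute or relative, so the example makes no claim about that.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # reference word missing from hypothesis
                curr[j - 1] + 1,                      # extra word in hypothesis
                prev[j - 1] + (ref_word != hyp_word), # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Two substituted words out of four reference words.
print(word_error_rate("turn off the lights", "turn of the light"))
```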

Abstract

Error correction techniques remain effective to refine outputs from automatic speech recognition (ASR) models. Existing end-to-end error correction methods based on an encoder-decoder architecture process all tokens in the decoding phase, creating undesirable latency. In this paper, we propose an ASR error correction method utilizing the predictions of correction operations. More specifically, we construct a predictor between the encoder and the decoder to learn if a token should be kept ("K"), deleted ("D"), or changed ("C") to restrict decoding to only part of the input sequence embeddings (the "C" tokens) for fast inference. Experiments on three public datasets demonstrate the effectiveness of the proposed approach in reducing the latency of the decoding process in ASR correction. It enhances the inference speed by at least three times (3.4 and 5.7 times) while maintaining the same…
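
The keep/delete/change scheme in the abstract can be illustrated with a small, purely symbolic sketch. It is not the authors' implementation: the paper uses a learned predictor and a neural decoder, whereas this example derives oracle labels by aligning a hypothesis with its reference using Python's `difflib`, and replaces the decoder with a lookup. The names `label_operations`, `apply_operations`, and `decode_change` are hypothetical, and folding inserted words into a neighboring "C" token is an assumption of this sketch, since the summary does not say how insertions are handled.

```python
from difflib import SequenceMatcher


def label_operations(hyp, ref):
    """Label each hypothesis token K (keep), D (delete), or C (change).

    Returns (labels, targets); targets[i] lists the reference tokens that
    replace hyp[i] when labels[i] == "C".
    """
    labels = ["K"] * len(hyp)
    targets = [[] for _ in hyp]
    matcher = SequenceMatcher(a=hyp, b=ref, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            for i in range(i1, i2):
                labels[i] = "D"
        elif tag == "replace":
            # The first token of the span carries the whole replacement;
            # any remaining hypothesis tokens in the span are dropped.
            labels[i1] = "C"
            targets[i1] = list(ref[j1:j2])
            for i in range(i1 + 1, i2):
                labels[i] = "D"
        elif tag == "insert" and hyp:
            # Assumption: attach inserted words to a neighboring kept token.
            k = i1 - 1 if i1 > 0 else 0
            if labels[k] == "K":
                labels[k] = "C"
                targets[k] = [hyp[k]]
            inserted = list(ref[j1:j2])
            targets[k] = targets[k] + inserted if i1 > 0 else inserted + targets[k]
    return labels, targets


def apply_operations(hyp, labels, decode_change):
    """Rebuild the corrected sequence, calling the decoder only for C tokens."""
    out = []
    for i, (token, label) in enumerate(zip(hyp, labels)):
        if label == "K":
            out.append(token)
        elif label == "C":
            out.extend(decode_change(i))
        # "D" tokens are dropped without any decoding.
    return out


hyp = "please play the the song yellow sub marine".split()
ref = "please play the song yellow submarine".split()
labels, targets = label_operations(hyp, ref)

# Stand-in for the decoder: look up the oracle replacement.
corrected = apply_operations(hyp, labels, lambda i: targets[i])
print("".join(labels))
print(" ".join(corrected))
print(f"tokens sent to the decoder: {labels.count('C')} of {len(hyp)}")
```

In this toy example only one of the eight hypothesis tokens reaches the decoder, which is the intuition behind the latency reduction the abstract describes: kept and deleted tokens need no decoding steps.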

Code & Models

Repositories

yangjingyuan/constdecoder
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings