Improving Distinction between ASR Errors and Speech Disfluencies with Feature Space Interpolation

Seongmin Park; Dongchan Shin; Sangyoun Paik; Subong Choi; Alena Kazakova; Jihwa Lee

arXiv:2108.01812 · cs.CL · August 5, 2021 · 1 citation

TL;DR

This paper introduces a feature space interpolation method to enhance ASR error detection by reducing confusion caused by speech disfluencies, improving detection accuracy across multiple languages and systems.

Contribution

It proposes a novel mixup-based approach in feature space to improve error detection and robustness of language models against disfluencies in ASR post-processing.

Findings

01. Improves ASR error detection F1 scores

02. Reduces false positives on disfluencies

03. Effective across multiple languages and ASR systems

Abstract

Fine-tuning pretrained language models (LMs) is a popular approach to automatic speech recognition (ASR) error detection during post-processing. While error detection systems often take advantage of statistical language archetypes captured by LMs, at times the pretrained knowledge can hinder error detection performance. For instance, presence of speech disfluencies might confuse the post-processing system into tagging disfluent but accurate transcriptions as ASR errors. Such confusion occurs because both error detection and disfluency detection tasks attempt to identify tokens at statistically unlikely positions. This paper proposes a scheme to improve existing LM-based ASR error detection systems, both in terms of detection scores and resilience to such distracting auxiliary tasks. Our approach adopts the popular mixup method in text feature space and can be utilized with any black-box…
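
To make the idea of mixup in feature space concrete, here is a minimal sketch of the generic mixup interpolation applied to feature vectors and their soft labels. The abstract above is truncated, so this is not the paper's exact scheme: the function name `mixup`, the `alpha` default, and the use of plain Python lists for features are all assumptions made for illustration.

```python
import random


def mixup(feat_a, feat_b, label_a, label_b, alpha=0.2, rng=None):
    """Interpolate two feature vectors and their label vectors (generic mixup).

    feat_*  : equal-length lists of floats (hypothetical LM features)
    label_* : equal-length lists of floats (one-hot or soft labels)
    alpha   : Beta(alpha, alpha) concentration; an assumed default
    """
    rng = rng or random.Random()
    # Mixing coefficient drawn from a symmetric Beta distribution.
    lam = rng.betavariate(alpha, alpha)

    def blend(u, v):
        # Convex combination: lam * u + (1 - lam) * v, element-wise.
        return [lam * x + (1.0 - lam) * y for x, y in zip(u, v)]

    return blend(feat_a, feat_b), blend(label_a, label_b), lam


# Example: mix a hypothetical "ASR error" example with a "disfluency" example.
features, labels, lam = mixup(
    [1.0, 0.0], [0.0, 1.0],   # toy feature vectors
    [1.0, 0.0], [0.0, 1.0],   # toy one-hot labels
    rng=random.Random(0),
)
```

Because the same coefficient is applied to features and labels, the mixed label stays a valid distribution whenever the two inputs are. How the paper chooses which examples to pair, and at which layer the features are taken, is not specified in the text shown here.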

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling

Methods: Mixup