Enhanced ASR Robustness to Packet Loss with a Front-End Adaptation Network

Yehoshua Dissen; Shiry Yonash; Israel Cohen; Joseph Keshet

arXiv:2406.18928 · cs.SD · June 28, 2024

TL;DR

This paper introduces a front-end adaptation network that makes ASR systems such as Whisper more robust to packet loss, notably reducing word error rates with minimal effect on the model's original performance.

Contribution

It presents a novel front-end adaptation network trained to recover corrupted speech inputs, improving ASR robustness in noisy and packet-loss conditions.

Findings

01. Reduces word error rate across multiple domains and languages.

02. Maintains original ASR model performance with minimal impact.

03. Demonstrates effectiveness in challenging acoustic environments.

Abstract

In the realm of automatic speech recognition (ASR), robustness in noisy environments remains a significant challenge. Recent ASR models, such as Whisper, have shown promise, but their efficacy in noisy conditions can be further enhanced. This study focuses on recovering from packet loss to improve the word error rate (WER) of ASR models. We propose using a front-end adaptation network connected to a frozen ASR model. The adaptation network is trained to modify the corrupted input spectrum by minimizing the criteria of the ASR model in addition to an enhancement loss function. Our experiments demonstrate that the adaptation network, trained on Whisper's criteria, notably reduces word error rates across domains and languages in packet-loss scenarios. This improvement is achieved with minimal effect on the Whisper model's foundational performance, underscoring our method's practicality and…
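
The training setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that packet loss is simulated by zero-filling fixed-length packets, that the enhancement loss is a mean squared error, and that the two loss terms are combined as a weighted sum. The names `drop_packets`, `combined_loss`, and `weight` are hypothetical, and the ASR criterion is passed in as a plain number because the frozen ASR model is out of scope here.

```python
import random


def drop_packets(samples, packet_len, loss_rate, seed=0):
    """Simulate packet loss by zero-filling whole packets.

    Assumption: each packet of `packet_len` samples is lost independently
    with probability `loss_rate`. The paper may use a different loss model.
    Returns the corrupted copy and the indices of the lost packets.
    """
    rng = random.Random(seed)
    corrupted = list(samples)
    lost = []
    for start in range(0, len(corrupted), packet_len):
        if rng.random() < loss_rate:
            end = min(start + packet_len, len(corrupted))
            corrupted[start:end] = [0.0] * (end - start)
            lost.append(start // packet_len)
    return corrupted, lost


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(asr_loss, adapted, clean, weight=1.0):
    """Hypothetical training objective for the adaptation network.

    `asr_loss` stands in for the frozen ASR model's criterion evaluated on
    the adapted input; the MSE term plays the role of the enhancement loss.
    The weighted-sum form and the `weight` value are assumptions.
    """
    return asr_loss + weight * mse(adapted, clean)
```

In a real training loop, only the adaptation network's parameters would be updated from this objective, while the ASR model's weights stay frozen, as the abstract states.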

Code & Models

Repositories

mlspeech/whisperdenoiser
PyTorch · Official

Taxonomy

Topics

Fault Detection and Control Systems