Enhanced ASR Robustness to Packet Loss with a Front-End Adaptation Network
Yehoshua Dissen, Shiry Yonash, Israel Cohen, Joseph Keshet

TL;DR
This paper introduces a front-end adaptation network that makes ASR systems such as Whisper more robust to packet loss, reducing word error rates with minimal effect on the model's original performance.
Contribution
It presents a novel front-end adaptation network trained to recover corrupted speech inputs, improving ASR robustness in noisy and packet-loss conditions.
Findings
Reduces word error rate across multiple domains and languages.
Preserves the original ASR model's performance, with only minimal impact.
Demonstrates effectiveness in challenging acoustic environments.
Abstract
In the realm of automatic speech recognition (ASR), robustness in noisy environments remains a significant challenge. Recent ASR models, such as Whisper, have shown promise, but their efficacy in noisy conditions can be further enhanced. This study focuses on recovering from packet loss to improve the word error rate (WER) of ASR models. We propose using a front-end adaptation network connected to a frozen ASR model. The adaptation network is trained to modify the corrupted input spectrum by minimizing the criteria of the ASR model in addition to an enhancement loss function. Our experiments demonstrate that the adaptation network, trained on Whisper's criteria, notably reduces word error rates across domains and languages in packet-loss scenarios. This improvement is achieved with minimal effect on the Whisper model's foundational performance, underscoring our method's practicality and…
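The training setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the zero-filling packet-loss model, the mean-squared-error enhancement term, and the names `drop_packets`, `mse`, `combined_loss`, and `weight` are all assumptions made for illustration. The frozen ASR model's criterion is represented by a placeholder scalar `asr_loss`.

```python
import random

def drop_packets(frames, loss_rate, packet_len, seed=0):
    """Simulate packet loss: zero out whole packets (runs of packet_len
    spectrogram frames), each with probability loss_rate.
    Zero-filling is an assumed loss model, not taken from the paper."""
    rng = random.Random(seed)
    out = [list(f) for f in frames]  # copy so the clean input is untouched
    for start in range(0, len(out), packet_len):
        if rng.random() < loss_rate:
            for i in range(start, min(start + packet_len, len(out))):
                out[i] = [0.0] * len(out[i])
    return out

def mse(a, b):
    """Mean squared error between two equally shaped spectrograms."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def combined_loss(asr_loss, adapted, clean, weight=1.0):
    """ASR criterion plus a weighted enhancement term.
    `asr_loss` stands in for the frozen ASR model's loss on the adapted input;
    MSE and `weight` are hypothetical choices for the enhancement loss."""
    return asr_loss + weight * mse(adapted, clean)

# Toy usage: 4 frames of 2 frequency bins, packets of 2 frames each.
clean = [[1.0, 2.0] for _ in range(4)]
corrupted = drop_packets(clean, loss_rate=0.5, packet_len=2)
loss = combined_loss(asr_loss=0.5, adapted=corrupted, clean=clean, weight=2.0)
```

In an actual training loop, `adapted` would be the adaptation network's output for the corrupted spectrum, and only that network's parameters would be updated while the ASR model stays frozen.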
Taxonomy
Topics: Fault Detection and Control Systems
