Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech

Guan-Ting Lin; Wei-Ping Huang; Hung-yi Lee

arXiv:2406.11064·eess.AS·October 4, 2024

TL;DR

This paper introduces a continual test-time adaptation framework for end-to-end speech recognition that effectively handles domain shifts and noisy data, outperforming existing methods without needing domain boundary labels.

Contribution

It proposes a novel Fast-slow TTA framework and a dynamic reset strategy, advancing continual TTA for ASR to improve robustness on multi-domain noisy speech.

Findings

01. Outperforms non-continual and continual TTA baselines on noisy ASR datasets.
02. Effectively detects domain shifts and resets models without domain boundary labels.
03. Enhances robustness to multi-domain noisy speech in real-world scenarios.

Abstract

Deep Learning-based end-to-end Automatic Speech Recognition (ASR) has made significant strides but still struggles with performance on out-of-domain samples due to domain shifts in real-world scenarios. Test-Time Adaptation (TTA) methods address this issue by adapting models using test samples at inference time. However, current ASR TTA methods have largely focused on non-continual TTA, which limits cross-sample knowledge learning compared to continual TTA. In this work, we first propose a Fast-slow TTA framework for ASR that leverages the advantage of continual and non-continual TTA. Following this framework, we introduce Dynamic SUTA (DSUTA), an entropy-minimization-based continual TTA method for ASR. To enhance DSUTA robustness for time-varying data, we design a dynamic reset strategy to automatically detect domain shifts and reset the model, making it more effective at handling…
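The entropy-minimization idea named in the abstract can be illustrated with a toy sketch. This is not DSUTA, the Fast-slow framework, or the dynamic reset strategy, none of which the abstract specifies in detail. It assumes per-frame logits over a small vocabulary and, for simplicity, applies the gradient step directly to the logits, whereas a real TTA method would update model parameters. All function names here (`mean_frame_entropy`, `entropy_grad`, `adapt_step`) are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one frame's logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy (in nats) of one probability distribution.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_frame_entropy(frame_logits):
    # Unsupervised objective: average prediction entropy over frames.
    return sum(entropy(softmax(f)) for f in frame_logits) / len(frame_logits)

def entropy_grad(logits):
    # Analytic gradient of one frame's entropy w.r.t. its logits:
    # dH/dz_k = -p_k * (log p_k + H).
    probs = softmax(logits)
    h = entropy(probs)
    return [-p * (math.log(p) + h) if p > 0.0 else 0.0 for p in probs]

def adapt_step(frame_logits, lr=0.5):
    # One gradient-descent step per frame, applied to the logits themselves
    # (an assumption made to keep the sketch free of any model code).
    return [
        [z - lr * g for z, g in zip(f, entropy_grad(f))]
        for f in frame_logits
    ]

# Two hypothetical frames over a three-token vocabulary.
frames = [[2.0, 1.0, 0.5], [0.2, 0.1, 0.0]]
before = mean_frame_entropy(frames)
after = mean_frame_entropy(adapt_step(frames))
print(f"mean entropy before: {before:.4f}, after: {after:.4f}")
```

The step sharpens each frame's distribution, so the mean entropy drops without any transcript labels, which is what makes this kind of objective usable at test time.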

Code & Models

Repositories

hhhaaahhhaa/asr-tta
pytorch

Videos

Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech · underline

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing