Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
Guan-Ting Lin, Wei-Ping Huang, Hung-yi Lee

TL;DR
This paper introduces a continual test-time adaptation framework for end-to-end speech recognition that effectively handles domain shifts and noisy data, outperforming existing methods without needing domain boundary labels.
Contribution
The paper proposes a Fast-slow TTA framework, an entropy-minimization-based continual TTA method called Dynamic SUTA (DSUTA), and a dynamic reset strategy, with the aim of making end-to-end ASR more robust on multi-domain noisy speech.
Findings
Outperforms non-continual and continual TTA baselines on noisy ASR datasets.
Effectively detects domain shifts and resets models without domain boundary labels.
Enhances robustness to multi-domain noisy speech in real-world scenarios.
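The second finding, detecting domain shifts without boundary labels, can be illustrated with a small sketch. This is not the paper's criterion, which this summary does not spell out. It assumes a hypothetical per-utterance score (for example, an adaptation loss) and flags a shift when several consecutive scores sit far above a reference window. The class name `ShiftDetector` and the parameters `ref_size`, `patience`, and `z_threshold` are illustrative assumptions.

```python
from statistics import mean, pstdev


class ShiftDetector:
    """Hypothetical shift detector (an assumption, not the paper's method).

    Collects a reference window of scores, then reports a shift once
    `patience` consecutive scores exceed the reference mean by more than
    `z_threshold` reference standard deviations.
    """

    def __init__(self, ref_size=5, patience=2, z_threshold=2.0):
        self.ref_size = ref_size
        self.patience = patience
        self.z_threshold = z_threshold
        self.reference = []  # scores assumed to come from the current domain
        self.strikes = 0     # consecutive out-of-range scores seen so far

    def update(self, score):
        """Feed one score; return True when the caller should reset the model."""
        # Fill the reference window before making any decision.
        if len(self.reference) < self.ref_size:
            self.reference.append(score)
            return False

        mu = mean(self.reference)
        sigma = pstdev(self.reference) or 1e-8  # guard against zero spread
        if (score - mu) / sigma > self.z_threshold:
            self.strikes += 1
        else:
            self.strikes = 0

        if self.strikes >= self.patience:
            # Start a fresh reference window for the new domain.
            self.reference = []
            self.strikes = 0
            return True
        return False
```

In a streaming loop, a `True` return would be the point at which adapted parameters are restored to the source model. Requiring several consecutive outliers is one simple way to avoid resetting on a single noisy utterance.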
Abstract
Deep Learning-based end-to-end Automatic Speech Recognition (ASR) has made significant strides but still struggles on out-of-domain samples because of domain shifts in real-world scenarios. Test-Time Adaptation (TTA) methods address this issue by adapting models using test samples at inference time. However, current ASR TTA methods have largely focused on non-continual TTA, which limits cross-sample knowledge learning compared to continual TTA. In this work, we first propose a Fast-slow TTA framework for ASR that combines the advantages of continual and non-continual TTA. Following this framework, we introduce Dynamic SUTA (DSUTA), an entropy-minimization-based continual TTA method for ASR. To enhance the robustness of DSUTA on time-varying data, we design a dynamic reset strategy to automatically detect domain shifts and reset the model, making it more effective at handling…
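To make "entropy-minimization-based" concrete, the following is a minimal sketch of the kind of quantity such methods drive down: the average Shannon entropy of the model's per-frame output distribution. It is not DSUTA's implementation; the function names and the plain-list representation of logits are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_frame_entropy(frame_logits):
    """Average Shannon entropy (in nats) over frames.

    `frame_logits` is a list of frames, each a list of per-token logits.
    A lower value means the model is more confident in its predictions.
    """
    total = 0.0
    for logits in frame_logits:
        probs = softmax(logits)
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(frame_logits)


# A uniform frame has maximal entropy; a peaked frame has much less.
uniform = mean_frame_entropy([[0.0, 0.0, 0.0, 0.0]])
peaked = mean_frame_entropy([[10.0, 0.0, 0.0, 0.0]])
```

An entropy-minimization TTA method would treat a value like this as an unsupervised loss and take gradient steps on selected model parameters for each test utterance, which requires no transcripts.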
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
