E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models
Jiaheng Dong, Hong Jia, Soumyajit Chatterjee, Abhirup Ghosh, James Bailey, Ting Dang

TL;DR
E-BATS introduces an efficient, backpropagation-free test-time adaptation framework for speech models, significantly improving accuracy and reducing memory usage in noisy, real-world acoustic conditions.
Contribution
It is the first to tailor backpropagation-free TTA specifically for speech models, combining lightweight prompts, multi-scale loss, and stable adaptation mechanisms.
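To make "backpropagation-free" concrete, here is a minimal sketch of adapting a prompt with forward passes only. It is not the E-BATS algorithm. The frozen linear classifier, the additive prompt vector, the greedy random search, and the entropy objective (a stand-in for the paper's multi-scale loss) are all assumptions chosen for illustration, as are the names `mean_entropy` and `adapt_prompt`.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_entropy(W, x, prompt):
    # Forward pass only: frozen weights W, inputs x shifted by the prompt.
    # Entropy is an assumed stand-in objective, not the paper's loss.
    p = softmax((x + prompt) @ W)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def adapt_prompt(W, x, steps=50, pop=16, sigma=0.1, seed=0):
    # Greedy random search over the prompt (hypothetical optimizer choice).
    # No gradients are computed and W is never modified.
    rng = np.random.default_rng(seed)
    prompt = np.zeros(x.shape[1])
    best = mean_entropy(W, x, prompt)
    for _ in range(steps):
        # Sample a population of perturbed prompts around the current one.
        candidates = prompt + sigma * rng.standard_normal((pop, x.shape[1]))
        losses = [mean_entropy(W, x, c) for c in candidates]
        i = int(np.argmin(losses))
        # Accept the best candidate only if it lowers the loss.
        if losses[i] < best:
            prompt, best = candidates[i], losses[i]
    return prompt, best

# Toy data: 32 unlabeled "test" inputs with 8 features, 4 output classes.
rng = np.random.default_rng(1)
W = rng.standard_normal((8, 4))
x = rng.standard_normal((32, 8))

baseline = mean_entropy(W, x, np.zeros(8))
prompt, adapted = adapt_prompt(W, x)
print(f"entropy before: {baseline:.4f}, after: {adapted:.4f}")
```

Because every update needs only forward evaluations, no activations have to be stored for a backward pass, which is the general source of the memory savings that backpropagation-free methods aim for.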
Findings
Improves accuracy by 4.1%-13.5% over baselines.
Uses 2.0-6.4 times less GPU memory than backpropagation-based methods.
Demonstrates robustness across diverse noisy speech datasets.
Abstract
Speech Foundation Models encounter significant performance degradation when deployed in real-world scenarios involving acoustic domain shifts, such as background noise and speaker accents. Test-time adaptation (TTA) has recently emerged as a viable strategy to address such domain shifts at inference time without requiring access to source data or labels. However, existing TTA approaches, particularly those relying on backpropagation, are memory-intensive, limiting their applicability in speech tasks and resource-constrained settings. Although backpropagation-free methods offer improved efficiency, existing ones exhibit poor accuracy. This is because they are predominantly developed for vision tasks, which differ fundamentally from speech in task formulation, noise characteristics, and model architecture, posing unique transferability challenges. In this paper, we introduce E-BATS, the…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research
