E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models

Jiaheng Dong; Hong Jia; Soumyajit Chatterjee; Abhirup Ghosh; James Bailey; Ting Dang

arXiv:2506.07078 · cs.LG · February 24, 2026

TL;DR

E-BATS introduces an efficient, backpropagation-free test-time adaptation framework for speech models, significantly improving accuracy and reducing memory usage in noisy, real-world acoustic conditions.

Contribution

E-BATS is the first framework to tailor backpropagation-free TTA specifically to speech models, combining lightweight prompts, a multi-scale loss, and stable adaptation mechanisms.
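To make the idea of backpropagation-free prompt adaptation concrete, here is a minimal sketch of forward-only adaptation. It is not the paper's method: the frozen linear "model", the additive prompt, the entropy objective, and the random-perturbation search are all stand-in assumptions, and the names `forward` and `adapt_prompt` are hypothetical. It only illustrates how a small prompt can be tuned on an unlabeled test input using forward passes alone, with no gradients.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def forward(weights, features, prompt):
    """Frozen toy model (assumption): softmax(W @ (features + prompt))."""
    shifted = [f + p for f, p in zip(features, prompt)]
    logits = [sum(w * x for w, x in zip(row, shifted)) for row in weights]
    return softmax(logits)


def adapt_prompt(weights, features, steps=50, population=8, sigma=0.1, seed=0):
    """Forward-only random search for a prompt that lowers prediction entropy.

    Only the prompt changes; the model weights stay frozen and no gradient
    is ever computed.
    """
    rng = random.Random(seed)
    prompt = [0.0] * len(features)
    best = entropy(forward(weights, features, prompt))
    for _ in range(steps):
        for _ in range(population):
            # Perturb the current prompt and score it with one forward pass.
            cand = [p + rng.gauss(0.0, sigma) for p in prompt]
            score = entropy(forward(weights, features, cand))
            if score < best:  # keep a candidate only if it improves the objective
                prompt, best = cand, score
    return prompt, best


# Toy usage: 2 classes, 3 input features, one unlabeled test sample.
weights = [[1.0, -0.5, 0.2], [-0.3, 0.8, 0.1]]
features = [0.1, 0.2, -0.1]
initial = entropy(forward(weights, features, [0.0] * 3))
prompt, adapted = adapt_prompt(weights, features)
```

Because a candidate is accepted only when it lowers the objective, `adapted` can never exceed `initial`. Memory use stays at the cost of a forward pass, which is the general reason forward-only schemes are cheaper than backpropagation; the specific savings reported for E-BATS come from the paper, not from this sketch.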

Findings

01. Achieves accuracy improvements of 4.1% to 13.5% over baselines.

02. Uses 2.0 to 6.4 times less GPU memory than backpropagation-based methods.

03. Demonstrates robustness across diverse noisy speech datasets.

Abstract

Speech Foundation Models encounter significant performance degradation when deployed in real-world scenarios involving acoustic domain shifts, such as background noise and speaker accents. Test-time adaptation (TTA) has recently emerged as a viable strategy to address such domain shifts at inference time without requiring access to source data or labels. However, existing TTA approaches, particularly those relying on backpropagation, are memory-intensive, limiting their applicability in speech tasks and resource-constrained settings. Although backpropagation-free methods offer improved efficiency, existing ones exhibit poor accuracy. This is because they are predominantly developed for vision tasks, which differ fundamentally from speech in task formulation, noise characteristics, and model architecture, posing unique transferability challenges. In this paper, we introduce E-BATS, the…

Videos

E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models (SlidesLive)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research