LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition
Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo

TL;DR
This paper introduces LI-TTA, a novel test-time adaptation method for automatic speech recognition that incorporates linguistic insights via an external language model, improving performance under domain shifts.
Contribution
LI-TTA is the first approach to integrate linguistic information into TTA for ASR, combining acoustic and linguistic cues to enhance adaptation.
Findings
LI-TTA improves ASR performance across various domain shifts.
Incorporating linguistic corrections yields better adaptation than acoustic-only methods.
Extensive experiments validate the effectiveness of LI-TTA.
Abstract
Test-Time Adaptation (TTA) has emerged as a crucial solution to the domain shift challenge, wherein the target environment diverges from the original training environment. A prime exemplification is TTA for Automatic Speech Recognition (ASR), which enhances model performance by leveraging output prediction entropy minimization as a self-supervision signal. However, a key limitation of this self-supervision lies in its primary focus on acoustic features, with minimal attention to the linguistic properties of the input. To address this gap, we propose Language Informed Test-Time Adaptation (LI-TTA), which incorporates linguistic insights during TTA for ASR. LI-TTA integrates corrections from an external language model to merge linguistic with acoustic information by minimizing the CTC loss from the correction alongside the standard TTA loss. With extensive experiments, we show that LI-TTA…
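The objective described in the abstract, an entropy-minimization term combined with a CTC loss computed against a language-model correction, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `ctc_weight` balancing factor, and the choice of plain Python are assumptions made for illustration, and the step that obtains the corrected transcript from the external language model is left out entirely. The sketch only evaluates the loss value; in practice an automatic-differentiation framework would be needed to update the model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one frame of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_entropy(probs):
    """Average per-frame Shannon entropy (the acoustic self-supervision term)."""
    total = 0.0
    for frame in probs:
        total -= sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(probs)

def ctc_nll(probs, target, blank=0):
    """CTC negative log-likelihood of `target`, via the forward algorithm.

    Works in probability space, so it is only suitable for short inputs;
    a real implementation would work in log space to avoid underflow.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for tok in target:
        ext += [tok, blank]
    size = len(ext)

    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new = [0.0] * size
        for s in range(size):
            acc = alpha[s]
            if s >= 1:
                acc += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc += alpha[s - 2]
            new[s] = acc * frame[ext[s]]
        alpha = new

    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0.0 else math.inf

def li_tta_style_loss(logits, corrected_tokens, ctc_weight=1.0):
    """Entropy term plus weighted CTC term against the LM-corrected tokens.

    `ctc_weight` is a hypothetical hyperparameter, not taken from the paper.
    """
    probs = [softmax(frame) for frame in logits]
    return mean_entropy(probs) + ctc_weight * ctc_nll(probs, corrected_tokens)
```

For example, `li_tta_style_loss([[0.0, 0.0], [0.0, 0.0]], [1])` scores two uniform frames over a two-symbol vocabulary (index 0 is the blank) against the corrected token sequence `[1]`. Keeping the two terms separate makes it easy to see how the linguistic signal supplements the purely acoustic entropy term.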
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need · Connectionist Temporal Classification Loss · Focus
