Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition
Yuang Li, Yu Wu, Jinyu Li, Shujie Liu

TL;DR
This paper introduces two zero-shot domain adaptation methods for speech recognition that use a large language model and a single domain-specific prompt to reduce word error rates without target-domain training text.
Contribution
It proposes novel zero-shot domain adaptation techniques using LLaMA, a large language model, for speech recognition, avoiding the need for extensive target domain data.
Findings
Both methods reduce word error rates on out-of-domain datasets.
Deep LLM-fusion better recalls entities and OOV words.
A single domain prompt is enough for effective zero-shot adaptation.
Abstract
The integration of Language Models (LMs) has proven to be an effective way to address domain shifts in speech recognition. However, these approaches usually require a significant amount of target domain text data for the training of LMs. Different from these methods, in this work, with only a domain-specific text prompt, we propose two zero-shot ASR domain adaptation methods using LLaMA, a 7-billion-parameter large language model (LLM). LLM is used in two ways: 1) second-pass rescoring: reranking N-best hypotheses of a given ASR system with LLaMA; 2) deep LLM-fusion: incorporating LLM into the decoder of an encoder-decoder based ASR system. Experiments show that, with only one domain prompt, both methods can effectively reduce word error rates (WER) on out-of-domain TedLium-2 and SPGISpeech datasets. Especially, the deep LLM-fusion has the advantage of better recall of entity and…
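The second-pass rescoring idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: the function names, the linear interpolation with `lm_weight`, and `toy_lm_logprob`, a stand-in that replaces LLaMA with a trivial word-overlap scorer so the example runs without any model.

```python
import math
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_logprob: Callable[[str, str], float],
    domain_prompt: str,
    lm_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rerank (hypothesis, asr_log_score) pairs, best first.

    The combined score is asr_score + lm_weight * log P_LM(hyp | prompt).
    This interpolation rule and its weight are assumptions, not taken
    from the paper.
    """
    rescored = [
        (text, asr_score + lm_weight * lm_logprob(domain_prompt, text))
        for text, asr_score in nbest
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_lm_logprob(prompt: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an LLM conditioned on a domain prompt.

    Words that also occur in the prompt get a higher probability (0.5)
    than all other words (0.01). A real system would sum the LLM's
    token log-probabilities for the hypothesis given the prompt.
    """
    prompt_words = set(prompt.lower().split())
    return sum(
        math.log(0.5) if word in prompt_words else math.log(0.01)
        for word in hypothesis.lower().split()
    )


# Two ASR hypotheses with made-up first-pass log scores.
nbest = [
    ("the quarterly earning call", -1.0),
    ("the quarterly earnings call", -1.2),
]
prompt = "Transcript of a company earnings call"

best_text, best_score = rescore_nbest(nbest, toy_lm_logprob, prompt)[0]
print(best_text)
```

In this toy case the first-pass system prefers "earning", but the prompt-conditioned score promotes the hypothesis containing "earnings". Passing the scorer as a callable keeps the reranking logic independent of whichever language model supplies the log-probabilities.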
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
