Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words
Kento Nozawa, Takashi Masuko, Toru Taniguchi

TL;DR
This paper presents a novel LLM-based speech recognition system that uses contextual keywords in prompts to improve transcription accuracy for rare and ambiguous words, leveraging a decoder-only architecture with a pre-trained Whisper encoder.
Contribution
It introduces a method to incorporate contextual keywords into an LLM-based ASR system without architectural changes, enhancing recognition of challenging words.
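As a rough illustration of what supplying keywords through the prompt could look like, the sketch below assembles a text prompt from a keyword list. The summary does not give the paper's actual prompt wording, so the template string, the `build_prompt` helper, and the default instruction are all hypothetical.

```python
def build_prompt(keywords, instruction="Transcribe the audio."):
    """Build a text prompt that carries contextual keywords.

    Hypothetical template: the wording is an assumption for illustration,
    not the prompt format used in the paper.
    """
    # Drop empty entries and duplicates while keeping the original order.
    seen = set()
    cleaned = []
    for kw in keywords:
        kw = kw.strip()
        if kw and kw not in seen:
            seen.add(kw)
            cleaned.append(kw)

    # Without keywords, fall back to the plain instruction.
    if not cleaned:
        return instruction
    return f"{instruction} Keywords that may appear: {', '.join(cleaned)}."


prompt = build_prompt(["PLaMo", "Whisper", "PLaMo", " "])
print(prompt)
```

Because the keywords travel only in the prompt text, changing them between utterances needs no retraining and no change to the model, which is the point the contribution makes.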
Findings
Significant improvement in recognizing rare words.
Effective use of prompts for contextualization.
No need for model architecture modifications.
Abstract
We develop a large language model (LLM) based automatic speech recognition (ASR) system that can be contextualized by providing keywords as prior information in text prompts. We adopt a decoder-only architecture and use our in-house LLM, PLaMo-100B, pre-trained from scratch using datasets dominated by Japanese and English texts, as the decoder. We adopt a pre-trained Whisper encoder as the audio encoder; its audio embeddings are projected into the text embedding space by an adapter layer and concatenated with the text embeddings converted from text prompts to form the inputs to the decoder. By providing keywords as prior information in the text prompts, we can contextualize our LLM-based ASR system, without modifying the model architecture, so that it accurately transcribes ambiguous words in the input audio. Experimental results demonstrate that providing keywords to the decoder can…
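The data flow described in the abstract (audio embeddings, adapter projection, concatenation with prompt embeddings) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the adapter is modeled as a single linear layer, the dimensions are tiny placeholders, the embeddings are random numbers standing in for Whisper and PLaMo-100B outputs, and placing the audio before the prompt is an arbitrary choice, since the abstract does not state the order.

```python
import random

random.seed(0)

D_AUDIO, D_TEXT = 4, 3          # placeholder embedding sizes (assumed)
N_FRAMES, N_PROMPT_TOKENS = 5, 2


def random_matrix(rows, cols):
    """Stand-in for real embeddings or weights: uniform random values."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(frames, weight, bias):
    """Assumed linear adapter: map each frame from D_AUDIO to D_TEXT dims.

    `weight` has one row of length D_AUDIO per output dimension.
    """
    projected = []
    for frame in frames:
        row = [
            sum(x * w for x, w in zip(frame, w_row)) + b
            for w_row, b in zip(weight, bias)
        ]
        projected.append(row)
    return projected


# Stand-ins for the audio encoder output and the embedded text prompt.
audio_embeddings = random_matrix(N_FRAMES, D_AUDIO)
prompt_embeddings = random_matrix(N_PROMPT_TOKENS, D_TEXT)

# Adapter parameters (these would be learned in a real system).
adapter_weight = random_matrix(D_TEXT, D_AUDIO)
adapter_bias = [0.0] * D_TEXT

# Project audio into the text embedding space, then concatenate along the
# sequence axis to form the decoder input (ordering is an assumption).
audio_in_text_space = project(audio_embeddings, adapter_weight, adapter_bias)
decoder_inputs = audio_in_text_space + prompt_embeddings

print(len(decoder_inputs), len(decoder_inputs[0]))
```

After projection, every row has the text embedding width, so audio frames and prompt tokens can share one input sequence and the decoder needs no structural change.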
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Adapter
