CTC-Assisted LLM-Based Contextual ASR
Guanrou Yang, Ziyang Ma, Zhifu Gao, Shiliang Zhang, Xie Chen

TL;DR
This paper introduces a CTC-assisted, LLM-based contextual ASR model with an efficient hotword-filtering algorithm, substantially improving the recognition of rare and long-tail words.
Contribution
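The paper proposes a CTC-assisted approach for LLM-based ASR: coarse CTC decoding results are used to filter a biasing list down to potentially relevant hotwords, which are then included in the LLM prompt to improve rare-word recognition.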
Findings
Achieves WER/B-WER (B-WER: error rate on the biasing words) of 1.27%/3.67% on Librispeech test-clean and 2.72%/8.02% on test-other
Maintains high performance even with a biasing list of 2,000 words
Surpasses existing models in recognizing rare and long-tail words
Abstract
Contextual ASR, or hotword customization, has substantial practical value. Despite the impressive performance of current end-to-end (E2E) automatic speech recognition (ASR) systems, they often struggle to recognize rare words accurately. Typical E2E contextual ASR models feature complex architectures and decoding mechanisms, yet their performance remains limited and they are susceptible to interference from distractor words. With large language model (LLM)-based ASR models emerging as the new mainstream, we propose a CTC-Assisted LLM-Based Contextual ASR model with an efficient filtering algorithm. By using coarse CTC decoding results to filter potentially relevant hotwords and incorporating them into the LLM prompt input, our model attains WER/B-WER of 1.27%/3.67% and 2.72%/8.02% on the Librispeech test-clean and test-other sets, respectively, when targeting rare long-tail words, demonstrating…
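The filtering idea described in the abstract (decode a coarse CTC hypothesis, shortlist hotwords that resemble spans of it, and place them in the LLM prompt) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: greedy (best-path) CTC decoding is the standard technique, but the character-level fuzzy matching with `difflib.SequenceMatcher`, the `threshold=0.7` and `top_k=5` values, the toy character vocabulary `ID_TO_TOKEN`, and the prompt template in `build_prompt` are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical toy vocabulary: index 0 is the CTC blank, then a-z, then space.
ID_TO_TOKEN = ["<blank>"] + list("abcdefghijklmnopqrstuvwxyz") + [" "]
BLANK = 0


def ctc_greedy_decode(frame_ids, id_to_token, blank=BLANK):
    """Standard CTC best-path decoding: merge repeated ids, then drop blanks."""
    tokens = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank:
            tokens.append(id_to_token[i])
        prev = i
    return "".join(tokens)


def filter_hotwords(hypothesis, biasing_list, threshold=0.7, top_k=5):
    """Keep hotwords that fuzzily match some word n-gram of the CTC hypothesis.

    Assumed scoring: for a hotword of n words, take the best character-level
    SequenceMatcher ratio against every n-word span of the hypothesis.
    """
    words = hypothesis.lower().split()
    scored = []
    for hw in biasing_list:
        n = len(hw.split())
        best = 0.0
        for start in range(max(len(words) - n + 1, 0)):
            span = " ".join(words[start:start + n])
            best = max(best, SequenceMatcher(None, hw.lower(), span).ratio())
        if best >= threshold:
            scored.append((best, hw))
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [hw for _, hw in scored[:top_k]]


def build_prompt(hotwords):
    """Hypothetical prompt template that exposes the shortlisted hotwords to the LLM."""
    hint = ", ".join(hotwords) if hotwords else "none"
    return f"Transcribe the speech. Possibly relevant hotwords: {hint}."


if __name__ == "__main__":
    hyp = "doctor kaminski flew to new orlins"  # a misspelled coarse CTC output
    shortlist = filter_hotwords(hyp, ["Kaminsky", "New Orleans", "quixotic", "zanzibar"])
    print(shortlist)
    print(build_prompt(shortlist))
```

Matching against n-grams of the CTC output, rather than requiring exact matches, lets misspelled hypotheses such as "orlins" still retrieve the intended hotword, while unrelated distractors in a large biasing list are discarded before they reach the prompt.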
Taxonomy
Topics: Natural Language Processing Techniques · Advanced Computational Techniques and Applications · Speech Recognition and Synthesis
