CTC-Assisted LLM-Based Contextual ASR

Guanrou Yang; Ziyang Ma; Zhifu Gao; Shiliang Zhang; Xie Chen

arXiv:2411.06437 · eess.AS · November 12, 2024

TL;DR

This paper introduces a CTC-assisted LLM-based contextual ASR model with an efficient hotword filtering algorithm, improving recognition accuracy on rare, long-tail words.

Contribution

It proposes using coarse CTC decoding results to filter a biasing list down to potentially relevant hotwords, which are then added to the LLM prompt to improve rare-word recognition in LLM-based ASR.

Findings

01 Achieves WER/B-WER of 1.27%/3.67% on Librispeech test-clean

02 Maintains high performance with 2000 biasing words

03 Surpasses existing models in recognizing rare and long-tail words

Abstract

Contextual ASR, or hotword customization, holds substantial practical value. Despite the impressive performance of current end-to-end (E2E) automatic speech recognition (ASR) systems, they often struggle to recognize rare words accurately. Typical E2E contextual ASR models commonly feature complex architectures and decoding mechanisms, are limited in performance, and are susceptible to interference from distractor words. With large language model (LLM)-based ASR models emerging as the new mainstream, we propose a CTC-Assisted LLM-Based Contextual ASR model with an efficient filtering algorithm. By using coarse CTC decoding results to filter potentially relevant hotwords and incorporating them into the LLM prompt input, our model attains WER/B-WER of 1.27%/3.67% and 2.72%/8.02% on the Librispeech test-clean and test-other sets, targeting the recognition of rare long-tail words, demonstrating…
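The filter-then-prompt idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the matching rule or the prompt wording, so everything here is an assumption for illustration: the function names `filter_hotwords` and `build_prompt`, the character-level similarity from Python's `difflib`, the 0.8 threshold, and the prompt text are all hypothetical and are not taken from the paper or from SLAM-LLM.

```python
from difflib import SequenceMatcher


def filter_hotwords(ctc_hypothesis, biasing_list, threshold=0.8):
    """Keep biasing words that approximately occur in a coarse CTC hypothesis.

    Assumption: similarity is a character-level ratio from difflib;
    the paper's actual filtering criterion may differ.
    """
    hyp_words = ctc_hypothesis.lower().split()
    kept = []
    for hotword in biasing_list:
        target = hotword.lower()
        n = len(target.split())
        # Compare the hotword against every n-word window of the hypothesis.
        windows = [
            " ".join(hyp_words[i:i + n])
            for i in range(max(len(hyp_words) - n + 1, 0))
        ]
        best = max(
            (SequenceMatcher(None, target, w).ratio() for w in windows),
            default=0.0,
        )
        if best >= threshold:
            kept.append(hotword)
    return kept


def build_prompt(hotwords):
    """Build a text prompt for the LLM (hypothetical wording)."""
    base = "Transcribe the speech to text."
    if not hotwords:
        return base
    return base + " Some hotwords might help: " + " ".join(hotwords)


if __name__ == "__main__":
    # The coarse CTC output misspells the rare word; fuzzy matching still finds it.
    hypothesis = "the patient was given acetominophen yesterday"
    biasing = ["acetaminophen", "ibuprofen", "yesterday"]
    print(build_prompt(filter_hotwords(hypothesis, biasing)))
```

The point of the sketch is that a large biasing list is reduced to a handful of candidates before it reaches the prompt, which keeps the prompt short and limits interference from distractor words.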

Code & Models

Repositories

X-LANCE/SLAM-LLM
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Advanced Computational Techniques and Applications · Speech Recognition and Synthesis