Contextual Biasing for LLM-Based ASR with Hotword Retrieval and Reinforcement Learning
YuXiang Kong, JunFeng Hou, Jian Tang, Bingqing Zhu, Jicheng Zhang, Shaofei Xue

TL;DR
This paper introduces a scalable two-stage framework that enhances large language model-based ASR by retrieving hotwords and fine-tuning with reinforcement learning, significantly improving hotword recognition without sacrificing overall transcription quality.
Contribution
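The work combines hotword retrieval with LLM-ASR adaptation: a contrastive audio-text model narrows a large hotword vocabulary to a short candidate list, and reinforcement learning teaches the LLM-ASR model to use those candidates. Together, these address contextual biasing when the hotword vocabulary is large.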
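The retrieval step works like a nearest-neighbour search in a shared audio-text embedding space, and its output becomes a text prompt. The sketch below rests on several assumptions: embeddings are precomputed by a GLCLAP-style contrastive encoder (toy vectors stand in for them here), and candidates are ranked by cosine similarity. The robustness-aware augmentation and fuzzy matching described in the paper are not shown. The names `retrieve_top_k` and `build_prompt` and the prompt template are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(
    audio_emb: Sequence[float],
    hotword_embs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Hypothetical retriever: score every hotword against the utterance
    embedding and keep the k most similar ones."""
    scored = [(word, cosine(audio_emb, emb)) for word, emb in hotword_embs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(candidates: List[Tuple[str, float]]) -> str:
    """Hypothetical template that injects the retrieved hotwords as text."""
    words = ", ".join(word for word, _ in candidates)
    return f"Hotwords: {words}. Transcribe the audio."


if __name__ == "__main__":
    vocab = {
        "Kubernetes": [1.0, 0.0, 0.0],
        "Shaofei": [0.0, 1.0, 0.0],
        "GLCLAP": [0.9, 0.1, 0.0],
    }
    top = retrieve_top_k([1.0, 0.0, 0.0], vocab, k=2)
    print(build_prompt(top))
```

Keeping k small matters. The LLM then sees only a compact candidate list, so prompt length does not grow with the size of the full hotword vocabulary.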
Findings
Substantial reduction in keyword error rate on hotword-focused test sets.
Maintains sentence accuracy on general ASR benchmarks.
Demonstrates effective contextual biasing over large hotword vocabularies.
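This page does not define keyword error rate. The sketch below assumes a simplified, recall-style definition: the share of hotword occurrences in the reference that the hypothesis fails to reproduce. The paper's exact metric may differ. For example, it may also count falsely inserted hotwords. The function name `keyword_error_rate` is hypothetical.

```python
from typing import Iterable, Sequence


def keyword_error_rate(
    refs: Sequence[str], hyps: Sequence[str], hotwords: Iterable[str]
) -> float:
    """Assumed definition: missed hotword occurrences / reference hotword occurrences.

    Occurrences are counted as substrings per utterance, so multi-word hotwords
    and unsegmented scripts work, but false insertions are ignored.
    """
    hotwords = list(hotwords)
    total = missed = 0
    for ref, hyp in zip(refs, hyps):
        for word in hotwords:
            n_ref = ref.count(word)
            n_hyp = hyp.count(word)
            total += n_ref
            missed += max(0, n_ref - n_hyp)
    return missed / total if total else 0.0


if __name__ == "__main__":
    refs = ["deploy kubernetes on a kubernetes cluster"]
    hyps = ["deploy cooper netties on a kubernetes cluster"]
    print(keyword_error_rate(refs, hyps, ["kubernetes"]))
```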
Abstract
Large language model (LLM)-based automatic speech recognition (ASR) has recently achieved strong performance across diverse tasks, yet contextual biasing for named entities and hotwords under large vocabularies remains challenging. In this work, we propose a scalable two-stage framework that integrates hotword retrieval with LLM-ASR adaptation. First, we extend the Global-Local Contrastive Language-Audio pre-trained model (GLCLAP) to retrieve a compact top-k set of hotword candidates from a large vocabulary via robustness-aware data augmentation and fuzzy matching. Second, we inject the retrieved candidates as textual prompts into an LLM-ASR model and fine-tune it with Generative Rejection-Based Policy Optimization (GRPO), using a task-driven reward that jointly optimizes hotword recognition and overall transcription accuracy. Experiments on hotword-focused test sets show substantial…
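The abstract says only that the reward "jointly optimizes hotword recognition and overall transcription accuracy". The sketch below fills that in with assumptions. The reward is a hypothetical weighted mix of hotword recall and (1 - WER), with weight `alpha`. Advantages are normalized within each group of sampled transcripts for the same utterance, as in GRPO-style training. The paper's actual reward terms, weights, and advantage computation are not given here, and all function names are illustrative.

```python
import statistics
from typing import List, Sequence


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            )
        prev = cur
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    ref_words = ref.split()
    return word_errors(ref_words, hyp.split()) / max(len(ref_words), 1)


def hotword_recall(ref: str, hyp: str, hotwords: Sequence[str]) -> float:
    """Fraction of hotwords present in the reference that also appear in the hypothesis."""
    targets = [w for w in hotwords if w in ref]
    if not targets:
        return 1.0
    return sum(w in hyp for w in targets) / len(targets)


def task_reward(ref: str, hyp: str, hotwords: Sequence[str], alpha: float = 0.5) -> float:
    """Hypothetical joint reward: hotword recall plus clipped transcription accuracy."""
    accuracy = 1.0 - min(wer(ref, hyp), 1.0)
    return alpha * hotword_recall(ref, hyp, hotwords) + (1.0 - alpha) * accuracy


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within a group of samples drawn for the same input."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    ref = "deploy kubernetes now"
    samples = ["deploy kubernetes now", "deploy cooper netties now"]
    rewards = [task_reward(ref, s, ["kubernetes"]) for s in samples]
    print(rewards, group_advantages(rewards))
```

Because advantages are relative within the group, a sample is pushed up only when it recognizes the hotword and transcribes the rest at least as well as its siblings. This is one way a joint reward can avoid trading general accuracy for hotword recall.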
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
