Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction

Sashi Novitasari; Takashi Fukuda; Kurata Gakuto; George Saon

arXiv:2604.12398 · eess.AS · April 15, 2026

TL;DR

This paper improves speech recognition accuracy for rare bias words by leveraging acoustic cues and bias word position prediction in speech-aware LLMs, without requiring phonetic expertise or G2P tools.

Contribution

It introduces a phoneme-free contextual biasing method using acoustic cues and a multi-output bias word position predictor, enhancing robustness and accuracy.

Findings

01. Reduces bias word recognition errors by 16.3%

02. Improves out-of-domain recognition accuracy

03. Eliminates need for phonetic knowledge or G2P tools

Abstract

Speech-aware LLMs (SLLMs) have recently achieved state-of-the-art ASR performance; however, they still fail to accurately transcribe bias words that appear rarely or never in the training data. Contextual biasing mechanisms are commonly implemented by introducing a predefined bias word list into the model via a text prompt or an additional module. For further improvement, predefined bias words can be paired with their phoneme representations as pronunciation cues. Typically, phoneme sequences are generated by a G2P system that covers the target languages and domains of the bias words. When a compatible G2P system is unavailable, phoneme-assisted contextual biasing therefore becomes difficult to perform. Moreover, manually adding accurate phoneme sequences requires advanced phonetic knowledge. In this paper, we explore contextual biasing in SLLMs based on acoustic cues associated with…
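The prompt-based biasing setup described in the abstract, a predefined bias word list optionally paired with pronunciation cues, can be pictured with a small sketch. This is only an illustration under stated assumptions: the function name `build_biasing_prompt`, the prompt wording, and the cue strings are all hypothetical and are not taken from the paper, whose actual prompt format is not given here.

```python
from typing import Dict, List, Optional


def build_biasing_prompt(
    bias_words: List[str],
    cues: Optional[Dict[str, str]] = None,
) -> str:
    """Format a bias word list as a text prompt (hypothetical format).

    Each bias word may carry an optional pronunciation cue, shown in
    parentheses after the word. Words without a cue are listed bare.
    """
    cues = cues or {}
    entries = []
    for word in bias_words:
        cue = cues.get(word)
        entries.append(f"{word} ({cue})" if cue else word)
    # The instruction wording below is an assumption, not the paper's prompt.
    return "Transcribe the audio. Possible rare words: " + ", ".join(entries) + "."


# Plain bias list, no pronunciation cues.
prompt_plain = build_biasing_prompt(["Novitasari", "Kurata"])

# Same list, with an illustrative placeholder cue for one word.
# The cue string is NOT a verified phoneme transcription.
prompt_cued = build_biasing_prompt(
    ["Novitasari", "Kurata"],
    {"Kurata": "k u r a t a"},
)

print(prompt_plain)
print(prompt_cued)
```

In a phoneme-assisted setup the cue strings would typically come from a G2P system; the abstract's point is that such a system is not always available, which is why the cue argument is optional in this sketch.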
