Contextual Biasing for Streaming ASR via CTC-based Word Spotting

Kai-Chen Tsai, Tien-Hong Lo, Yun-Ting Sun, and Berlin Chen

arXiv:2605.18222·eess.AS·May 20, 2026

TL;DR

This paper introduces a novel streaming extension of CTC-based word spotting for real-time contextual biasing in ASR, enabling detection of domain-specific words with low latency without retraining models.

Contribution

The paper proposes a stateful token passing algorithm and an incremental commitment mechanism that adapt offline CTC-WS to streaming ASR, improving real-time keyword detection and recognition accuracy.

Findings

01  Reduces overall WER in streaming ASR

02  Improves keyword F-score for domain-specific words

03  Maintains low latency with incremental emission strategy

Abstract

Contextual biasing is essential to improving the recognition of rare and domain-specific words in an automatic speech recognition (ASR) system. While numerous methods have been proposed in recent years, most of them focus on offline settings and do not explicitly address the challenges of streaming ASR. For example, CTC-based word spotting (CTC-WS) has demonstrated strong performance by directly detecting keywords from CTC log-probabilities, but it is limited to offline processing and requires access to the full utterance. In this work, we present a streaming extension of CTC-WS for real-time contextual biasing. Our method maintains active keyword paths across audio chunks using a stateful token passing algorithm, enabling the detection of keywords that span multiple chunks. To ensure low latency and stable output, we introduce an incremental commitment mechanism that only emits…
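
To make the idea of keeping keyword paths alive across chunks concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single keyword, a Viterbi-style best-path score over the keyword's tokens with optional blanks between them, a fresh path allowed to start at every frame, and a hypothetical average log-probability threshold for acceptance. The class name `StreamingKeywordSpotter`, the helper `peaked`, and the `threshold` parameter are all illustrative assumptions. The incremental commitment mechanism is not sketched here.

```python
import math

NEG_INF = float("-inf")


class StreamingKeywordSpotter:
    """Illustrative only: carries best-path scores for one keyword across chunks."""

    def __init__(self, tokens, blank=0, threshold=-1.0):
        self.blank = blank
        self.threshold = threshold  # hypothetical average log-prob cutoff
        # Expanded state sequence: t1, blank, t2, blank, ..., tn
        self.ext = []
        for i, tok in enumerate(tokens):
            if i > 0:
                self.ext.append(blank)
            self.ext.append(tok)
        self.frame = 0  # global frame index, never reset between chunks
        self._reset()

    def _reset(self):
        # Best log-score and start frame of the best path ending in each state.
        self.scores = [NEG_INF] * len(self.ext)
        self.starts = [-1] * len(self.ext)

    def process_chunk(self, log_probs):
        """log_probs: list of frames, each a list of per-token CTC log-probs."""
        detections = []
        for frame_lp in log_probs:
            prev_scores, prev_starts = self.scores, self.starts
            new_scores = [NEG_INF] * len(self.ext)
            new_starts = [-1] * len(self.ext)
            for s, tok in enumerate(self.ext):
                cands = [(prev_scores[s], prev_starts[s])]  # stay in state
                if s == 0:
                    cands.append((0.0, self.frame))  # start a new path here
                if s >= 1:
                    cands.append((prev_scores[s - 1], prev_starts[s - 1]))
                # Skip the blank unless it separates two identical tokens.
                if s >= 2 and tok != self.blank and tok != self.ext[s - 2]:
                    cands.append((prev_scores[s - 2], prev_starts[s - 2]))
                best_score, best_start = max(cands, key=lambda c: c[0])
                if best_score > NEG_INF:
                    new_scores[s] = best_score + frame_lp[tok]
                    new_starts[s] = best_start
            self.scores, self.starts = new_scores, new_starts
            final = self.scores[-1]
            if final > NEG_INF:
                length = self.frame - self.starts[-1] + 1
                if final / length >= self.threshold:
                    detections.append((self.starts[-1], self.frame, final))
                    self._reset()  # simplification: drop all paths after a hit
            self.frame += 1
        return detections


def peaked(tok, vocab_size=3, p=0.9):
    """Toy frame of log-probs with mass p on tok, the rest spread evenly."""
    rest = (1.0 - p) / (vocab_size - 1)
    return [math.log(p if v == tok else rest) for v in range(vocab_size)]


# Keyword = tokens [1, 2]; its first token falls in chunk 1, its second in chunk 2.
spotter = StreamingKeywordSpotter([1, 2])
first = spotter.process_chunk([peaked(0), peaked(1)])
second = spotter.process_chunk([peaked(2), peaked(0)])
print(first, second)
```

Because `scores`, `starts`, and `frame` live on the object rather than inside `process_chunk`, a path opened in one chunk can complete in the next, which is the property the abstract attributes to stateful token passing. A real system would track many keywords at once, typically through a shared context graph, and would use a more careful scoring rule than the fixed threshold assumed here.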
